JP3579843B2 - Digital signal processor

Info

Publication number: JP3579843B2
Application number: JP28442594A
Authority: JP
Inventors: 実高木; 義則松下
Original assignee: 日本テキサス・インスツルメンツ株式会社
Priority date: 1994-10-24
Filing date: 1994-10-24
Publication date: 2004-10-20
Anticipated expiration: 2019-10-20
Also published as: US5822613A; JPH08123682A

Description

[0001]
[Field of Industrial Application]
The present invention relates to a pipelined digital signal processing device.
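[0002]
[Prior Art]
DSPs (Digital Signal Processors) have long been used for digital signal processing that involves many multiply-accumulate operations, such as digital filtering, automatic digital equalization, and the fast Fourier transform (FFT). To achieve high-speed multiply-accumulate processing, a DSP is generally built as a microprogram-controlled or PLA-controlled microprocessor that integrates a high-speed multiplier, an adder, program memory, data memory, and so on, and that can perform pipeline processing. A DSP also has input/output functions; when a large amount of data must be stored, it keeps the data in external auxiliary memory through an input/output interface.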
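[0003]
So that the DSP can access such external memory at any time, a system configuration like the one shown in FIG. 8 is generally used. In this DSP system, the arithmetic control unit 102 in the DSP 100 supplies address information directly to the external memory 104. On a write, write data from the arithmetic control unit 102 is sent to the external memory 104 via the data bus 106 and the input/output port 108; on a read, read data from the external memory 104 is sent to the arithmetic control unit 102 via the input/output port 108 and the data bus 106.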
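[0004]
This scheme has the drawback that no part of the arithmetic control unit 102 can proceed to the next step until the data transfer completes, so pipeline processing is held for that period. In practice, an access to the external memory 104 usually takes longer than an access by the arithmetic control unit 102 to the internal memory 110.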
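[0005]
For example, in digital audio signal processing such as sound field reproduction and sound field compensation, audio data from a CD (Compact Disc) is 16 bits long. Exchanging such data with the external memory 104 at the full 16-bit width would increase the number of I/O pins and the number of external memory devices, raising hardware cost considerably. This is especially pronounced when large amounts of delay data are handled, as in the processing of reflected sound. The overall hardware cost is therefore lowered by reducing the number of bits transferred per access to the external memory 104 and increasing the number of transfers. The price is a longer access time to the external memory 104, a correspondingly longer pipeline hold, and lower DSP processing efficiency.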
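[0006]
FIG. 9 shows the configuration of a DSP system conventionally used to solve this problem. Here the DSP 100' contains an external memory controller 112 that can write data to and read data from the external memory 104. The external memory controller 112 has an address register 112a that temporarily holds address information and a data register 112b that temporarily holds data.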
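[0007]
When the arithmetic control unit 102 writes data to the external memory 104, it only has to transfer the address information and the data to the external memory controller 112 via the data bus 106, after which each part of the arithmetic control unit 102 can proceed to the next step. The external memory controller 112, for its part, holds the address information and the data from the arithmetic control unit 102 in the address register 112a and the data register 112b, respectively, accesses the external memory 104, and writes the data to the memory location specified by the address information. This write is performed in a predefined cycle.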
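[0008]
When the arithmetic control unit 102 reads data from the external memory 104, it transfers address information to the external memory controller 112 via the data bus 106. A read by the external memory controller 112 from the external memory 104 likewise requires a predefined cycle. At the moment the address information from the arithmetic control unit 102 is loaded into the address register 112a, the data register 112b holds the data read from the external memory 104 in the previous memory cycle. The arithmetic control unit 102 can therefore receive the previous data from the external memory controller 112 at the same time as it sends the current address information, and each part of the arithmetic control unit 102 can proceed to the next step immediately.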
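[0009]
Because the external memory controller 112 performs the writes to and reads from the external memory 104, the arithmetic control unit 102 only needs to exchange address information and data with the external memory controller 112 via the data bus 106, and pipeline processing need not be held.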
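[0010]
[Problems to Be Solved by the Invention]
The DSP system of FIG. 9 preserves high-speed arithmetic processing to the extent that the whole pipeline need not be held during an access to the external memory 104. However, the data bus 106 is busy while the arithmetic control unit 102 accesses the external memory controller 112, so no arithmetic processing can take place in the arithmetic control unit 102 during that time. In other words, each access to the external memory 104 costs one arithmetic operation in the arithmetic control unit 102. In digital audio signal processing such as the sound field reproduction mentioned above, DSP performance is determined by how many multiply-accumulate operations can be executed within the fixed time defined by the sampling frequency of the audio signal. In this conventional system, executing instructions that read data from the external memory 104 reduces the number of arithmetic operations in the arithmetic control unit 102, so the performance of the DSP cannot be fully exploited.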
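[0011]
The present invention was made in view of this problem. Its object is to provide a digital signal processing device that can read data from external memory without impairing arithmetic efficiency, thereby keeping pipeline processing fast while maximizing the number of arithmetic operations per unit time and improving processing capability.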
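[0012]
[Means for Solving the Problems]
To achieve this object, the digital signal processing device of the present invention processes digital signals by executing a series of instructions in a pipelined manner and comprises: first, second, and third buses arranged so that different data can be transferred on them simultaneously; a first internal memory connected to the first bus; a second internal memory connected to the second bus; arithmetic means connected to the first and second buses; a third internal memory connected to the third bus and to at least one of the first and second buses; and input/output interface means that is connected to the third bus and to at least one of the first and second buses and that can write data to and read data from an external memory. The device is configured to execute, in parallel during one given instruction execution cycle, a first instruction that uses the first and second buses and a second instruction that uses the third bus. In that instruction execution cycle, for the first instruction, data is read from each of the first and second internal memories, the data read is transferred to the arithmetic means via the first and second buses, and the arithmetic means then performs a predetermined operation on the two data items; for the second instruction, predetermined address information is sent to the input/output interface means via the third bus, and data previously read from the external memory into the input/output interface means is then transferred to the third internal memory via the third bus.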
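[0013]
In the digital signal processing device of the present invention, preferably, a first memory area for storing the address information used to access the external memory and a second memory area for storing data transferred from the external memory are set up in the third internal memory, and, for each corresponding pair of address information and data, the memory location in the first memory area and the memory location in the second memory area where they are respectively stored may be separated by a fixed offset.
[0014]
Further, in the digital signal processing device of the present invention, at least one of the first internal memory and the second internal memory may be a dual-port memory.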
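[0016]
[Operation]
In the digital signal processing device of the present invention, a third bus for data transfer, connected to the input/output interface means, is provided in addition to the first and second buses used mainly for arithmetic processing. Within the instruction execution cycle of one parallel-type instruction, an arithmetic instruction can thus be executed using the first and second buses while data from the external memory is brought into internal memory using the third bus.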
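[0017]
[Embodiment]
An embodiment of the present invention is described below with reference to FIGS. 1 to 7.
[0018]
FIG. 1 shows the system configuration of a DSP for digital audio signal processing according to one embodiment of the present invention. This DSP system has three mutually independent data buses (C-BUS 10, D-BUS 12, G-BUS 14), to which the individual units are connected as illustrated.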
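[0019]
Connected to the C-BUS 10 are the coefficient memory (C-MEM) 16, the general-purpose memory (G-MEM) 20, the arithmetic logic unit (ALU) 26, the multiply-accumulate unit (MAC) 28, the program memory (P-MEM) 32, and the host interface circuit (HOST-I/O) 34.
[0020]
Connected to the D-BUS 12 are the data memory (D-MEM) 18, the general-purpose memory (G-MEM) 20, the external memory input/output interface circuit (EX-I/O) 22, the audio interface circuit (AU-I/O) 24, the arithmetic logic unit (ALU) 26, the multiply-accumulate unit (MAC) 28, and the host interface circuit (HOST-I/O) 34.
[0021]
Connected to the G-BUS 14 are the general-purpose memory (G-MEM) 20, the external memory input/output interface circuit (EX-I/O) 22, and the arithmetic logic unit (ALU) 26.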
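[0022]
The C-MEM 16, D-MEM 18, and G-MEM 20 are each RAM (Random Access Memory). The C-MEM 16 mainly stores coefficient data for multiply-accumulate operations, and also stores address information for accessing the external memory (not shown) connected to the EX-I/O 22. The D-MEM 18 stores data used in multiply-accumulate and other operations (mainly audio data) as well as operation results.
[0023]
The G-MEM 20 is normally used as extension memory for the D-MEM 18. When large amounts of delay data are handled, as in sound field reproduction, delay data that does not fit in the D-MEM 18 is accumulated in an external memory consisting of RAM and, when needed, is brought from the external memory into the G-MEM 20 by a dedicated instruction (the BRDE instruction) described later. In that case the G-MEM 20 also stores the address information for accessing the external memory. The G-MEM 20 can also serve as extension memory for the C-MEM 16 and may store coefficient data as required.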
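[0024]
The C-MEM 16, D-MEM 18, and G-MEM 20 are provided with addressing units 17, 19, and 21, respectively, which perform address calculation.
[0025]
The EX-I/O 22 is also connected to the external memory that stores the delay data and has a memory control function for accessing that memory to write or read data. It contains an address register that holds the address information for a memory access and a data register that holds write or read data.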
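[0026]
The AU-I/O 24 is an interface circuit for exchanging data between this DSP and external digital audio circuits; it is connected, for example, to a CD playback circuit in the preceding stage and to a digital filter or D/A converter in the following stage. When an audio signal (data) is input from an external circuit, an interrupt is raised to the control unit 30 (described later) as soon as one complete data item is present in the register of the AU-I/O 24, and the interrupt handler stores that data in the D-MEM 18 via the D-BUS 12.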
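[0027]
The ALU 26 is an arithmetic unit that performs arbitrary arithmetic and logic operations and contains an accumulator. The MAC 28 is an arithmetic unit dedicated to multiply-accumulate operations and contains a multiplier and an accumulator. Because two arithmetic units (ALU 26 and MAC 28) are provided, parallel processing is possible, for example performing an addition in the ALU 26 while the MAC 28 performs a convolution.
[0028]
The P-MEM 32 is RAM (Random Access Memory) and stores the program that defines the processing of this DSP. The control unit 30 reads instructions from the P-MEM 32 one after another and, using PLA (Programmable Logic Array) control, controls the registers and gates (not shown) in the system so that each unit executes the instruction. For convenience of explanation, the control bus is omitted from FIG. 1.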
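[0029]
The HOST-I/O 34 is an interface circuit for exchanging programs and data between this DSP and a host controller (not shown); it is connected to the C-BUS 10 by a parallel port and to the host controller by a serial port. The program stored in the P-MEM 32, the coefficient data and address information stored in the C-MEM 16, and the address information stored in the G-MEM 20 are supplied by the host controller and downloaded from the HOST-I/O 34 to each memory via the C-BUS 10. The address information may also be changed by the program in the P-MEM 32.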
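[0030]
As described above, the DSP of this embodiment has three data buses (C-BUS 10, D-BUS 12, G-BUS 14), on which different address information or data can be transferred in parallel.
[0031]
On the C-BUS 10, in addition to the programs, address information, and data downloaded from the host controller to the memories as described above, the following are transferred one at a time: address information supplied from the C-MEM 16 to the D-MEM 18 or the EX-I/O 22, coefficient data supplied from the C-MEM 16 to the ALU 26 or the MAC 28, and so on.
[0032]
On the D-BUS 12, the following are transferred one at a time: input/output audio data exchanged between the AU-I/O 24 and the D-MEM 18, delayed audio data exchanged between the D-MEM 18 and the EX-I/O 22, operation data exchanged between the D-MEM 18 and the ALU 26 or the MAC 28, and so on.
[0033]
On the G-BUS 14, address information and delayed audio data exchanged between the G-MEM 20 and the EX-I/O 22, operation data supplied from the G-MEM 20 to the ALU 26, and so on are transferred.
[0034]
Because different address information or data can be transferred simultaneously on the three data buses (C-BUS 10, D-BUS 12, G-BUS 14), two instructions can be processed in parallel in one instruction execution cycle, as described later.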
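[0035]
FIG. 2 shows the field layout of an instruction word in the DSP of this embodiment. FIG. 2(A) shows the general format. Two instructions (a primary instruction and a secondary instruction) can be specified in a single instruction word, which is, for example, 32 bits long: bits [29 to 22] hold the opcode of the primary instruction and bits [21 to 14] hold the opcode of the secondary instruction. Bits [31, 30] specify the combination format (mode) of the two instructions. Bits [13 to 0] are used for operand addressing.
[0036]
A single instruction word may also specify only a primary instruction or only a secondary instruction. FIGS. 2(B) and 2(C) show the field layouts when a primary instruction or a secondary instruction, respectively, is specified alone.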
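[0037]
In the DSP of this embodiment, a background external memory read instruction (BRDE) is available for transferring data from the external memory (delayed audio data and the like) to the G-MEM 20 over the G-BUS 14. When a BRDE instruction is specified alone in an instruction word, the layout is as shown in FIG. 2(D), with the BRDE opcode in the secondary instruction field. Because the address used by the BRDE instruction is generated by the addressing unit 21 of the G-MEM 20 from a base address, no operand is needed and the addressing field is left unused. When a BRDE instruction is specified together with another (primary) instruction in one instruction word, the layout is a combination of FIGS. 2(D) and 2(B), and the addressing field is used for the operand of the primary instruction.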
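[0038]
Next, the instruction execution cycles of several instructions in the DSP of this embodiment are described with reference to FIGS. 3 to 7.
[0039]
FIG. 3 shows the instruction execution cycle of "MAC SS, D(xx), *C0, M0", one of the MAC instructions that perform a multiply-accumulate operation using the MAC 28. This instruction means: "Multiply the content of the D-MEM 18 location specified by address (xx) by the content of the C-MEM 16 location specified by the content (address information) of the C0 register (in the addressing unit 17), add the product to the content of the M0 register (in the MAC 28), and store the sum in the M0 register."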
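[0040]
The instruction is executed as follows. First, in the fetch cycle (Fetch), the memory read section of the control unit 30 reads the instruction word from the P-MEM 32 ((1)). Next, in the decode cycle (Decode), the decoder section of the control unit 30 decodes the instruction ((2)). Based on the decoded result, the microprogram control section of the control unit 30 operates the required registers and gates and causes the required units to perform operand processing (Operand) and execution processing (Execution).
[0041]
In the operand processing cycle (Operand), the control unit 30 supplies address information to the C-MEM 16 and the D-MEM 18 via the addressing units 17 and 19, respectively. The data read from the C-MEM 16 and the D-MEM 18 is sent to the MAC 28 via the C-BUS 10 and the D-BUS 12, respectively ((3)). In the execution processing cycle (Execution), the MAC 28 performs the multiplication and then the addition, and the final result is stored in register M0 ((4)).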
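[0042]
Because the DSP of this embodiment executes a series of instructions in a pipelined manner, the instruction execution cycles of consecutive instructions are offset from each other by one phase. For example, while the decode cycle (Decode) of one instruction is in progress, the operand processing cycle (Operand) of the preceding instruction, the execution processing cycle (Execution) of the instruction before that, and the fetch cycle (Fetch) of the following instruction are in progress at the same time.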
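[0043]
FIG. 4 shows the instruction execution cycle of the external memory read instruction "RDE". This instruction means: "Store the content of the C-MEM 16 location specified by address (cma) in the address register EXA of the EX-I/O 22, as the address for accessing the external memory."
[0044]
In the execution cycle of the RDE instruction, the control unit 30 supplies the address information (cma) to the C-MEM 16 via the addressing unit 17, after which the external memory access address read from the C-MEM 16 is supplied to the EX-I/O 22 via the C-BUS 10.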
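[0045]
When the RDE instruction is executed, the EX-I/O 22 accesses the external memory on the basis of the address information received from the C-MEM 16, reads the content of the external memory location specified by that address information, and stores the data read in the read data register EXR. Through this memory access function of the EX-I/O 22, the target data is ready in the data register EXR of the EX-I/O 22 once a predetermined number of machine cycles have elapsed after execution of the RDE instruction.
[0046]
FIG. 5 shows the instruction execution cycle of the data transfer instruction "MOV EXR, dma", which is used in conjunction with the RDE instruction. This instruction means: "Store the content of the data register EXR in the EX-I/O 22 in the D-MEM 18 location specified by address (dma)." As noted above, the target data is ready in the data register EXR once a predetermined number of machine cycles have elapsed after the RDE instruction, so executing "MOV EXR, dma" next brings that data into the D-MEM 18.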
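The RDE / MOV EXR sequence described in [0043] to [0046] can be sketched as a toy model. This is only an illustration: the class name `ExIo`, the `tick` method, and the latency value are hypothetical, since the text says only that the data is ready after "a predetermined number of machine cycles".

```python
class ExIo:
    """Toy model of EX-I/O 22 with address register EXA and read data register EXR."""

    def __init__(self, external_memory, latency_cycles):
        if latency_cycles < 1:
            raise ValueError("latency must be at least one machine cycle")
        self.external_memory = external_memory
        self.latency_cycles = latency_cycles  # assumed value, not given in the text
        self.exa = None
        self.exr = None
        self._countdown = None

    def load_exa(self, address):
        """Latch an external address and start the (modelled) external read."""
        self.exa = address
        self._countdown = self.latency_cycles

    def tick(self):
        """Advance one machine cycle; EXR becomes valid when the countdown expires."""
        if self._countdown is not None:
            self._countdown -= 1
            if self._countdown == 0:
                self.exr = self.external_memory[self.exa]
                self._countdown = None


def rde(ex_io, c_mem, cma):
    """RDE: C-MEM[cma] holds the external address; send it to EXA."""
    ex_io.load_exa(c_mem[cma])


def mov_exr(ex_io, d_mem, dma):
    """MOV EXR, dma: copy EXR into D-MEM[dma]."""
    d_mem[dma] = ex_io.exr
```

In this model the program has to place enough other instructions between `rde` and `mov_exr` to cover the latency; issuing `mov_exr` too early would copy a stale EXR value.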
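[0047]
The DSP of this embodiment also defines an external memory write instruction, "WRE". It is normally written "WRE cma, dma", which means: "Write the content (data) of the D-MEM 18 location specified by address (dma) to the external memory location specified by the content (address) of the C-MEM 16 location specified by address (cma)."
[0048]
When the WRE instruction is executed, the items read on the basis of the address information are likewise transferred via the C-BUS 10 and the D-BUS 12 to the address register EXA and the write data register EXW in the EX-I/O 22, respectively.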
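[0049]
As described above, audio data from a playback circuit such as a CD player arrives at fixed intervals and is stored in the D-MEM 18 from the AU-I/O 24 by interrupt processing. The D-MEM 18 operates in FIFO fashion: incoming audio data is stored while delay data is pushed out, oldest first. Digital signal processing that uses large amounts of delay data, such as sound field reproduction and reverberation reproduction, may need delay data from as long as several seconds earlier, so the delay data pushed out of the D-MEM 18 is accumulated in the external memory by the WRE instruction. When the delay data is needed for a filter operation, it is read from the external memory by the background external memory read instruction (BRDE) described below.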
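[0050]
FIG. 6 shows the instruction execution cycle of the background external memory read instruction "BRDE" according to this embodiment. The BRDE instruction means: "Transfer the content of the G-MEM 20 location specified by the content of the GB register (in the addressing unit 21) to the address register EXA in the EX-I/O 22 as the address information for external memory access, and store the content of the read data register EXR in the EX-I/O 22 in the G-MEM 20 location specified by the value obtained by adding 80H (binary 10000000) to the content of the GB register."
[0051]
In the instruction execution cycle of the BRDE instruction, during the operand processing cycle (Operand), the address information for external memory access, which is the content of the G-MEM 20 location indicated by the GB register, is transferred to the EX-I/O 22 via the G-BUS 14. During the execution processing cycle (Execution), the data from the EX-I/O 22 is transferred to the G-MEM 20 via the G-BUS 14 and, at the same time, the content of the GB register is incremented by one in the addressing unit 21.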
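[0052]
The data transferred from the read data register EXR by a BRDE instruction is the data that the EX-I/O 22 read from the external memory in response to the previous BRDE instruction, that is, the data corresponding to the address information transferred from the G-MEM 20 to the address register EXA by the previous BRDE instruction. The data corresponding to the address information stored in EXA by the current BRDE instruction is read from the external memory within the predetermined machine cycles, held in EXR, and transferred to the G-MEM 20 by the next BRDE instruction.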
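The one-behind behaviour of consecutive BRDE instructions can be illustrated with a small simulation. It is a sketch under stated assumptions: the function name `run_brde` is hypothetical, G-MEM is modelled as a plain list, and the external read is assumed to finish before the next BRDE is issued (the text only says it takes "predetermined machine cycles").

```python
OFFSET = 0x80  # the 80H offset named in the description of BRDE


def run_brde(g_mem, external_memory, gb, count, exr=None):
    """Execute `count` consecutive BRDE instructions and return (gb, exr)."""
    for _ in range(count):
        exa = g_mem[gb]              # Operand cycle: G-MEM[GB] goes to EXA over the G-BUS
        g_mem[gb + OFFSET] = exr     # Execution cycle: EXR (result of the previous BRDE)
                                     # is stored at G-MEM[GB + 80H]
        gb += 1                      # GB is incremented by one
        exr = external_memory[exa]   # assumption: the external read completes
                                     # before the next BRDE executes
    return gb, exr
```

Read literally, this means the datum requested from table slot k is stored by the following BRDE, so in this model it lands at 80H + k + 1, and one extra BRDE is needed to flush the last read. Whether the real device compensates for this is not stated in the text.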
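[0053]
The address (memory location) at which a BRDE instruction stores data in the G-MEM 20 is obtained by adding 80H (binary 10000000) to the content of the GB register (the location of the address information for external memory access), in other words simply by changing the most significant bit of that address from 0 to 1. Address calculation is therefore simple, and the addressing unit 21 is correspondingly simpler.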
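A minimal sketch of why the 80H offset is cheap to compute: as long as the most significant bit of GB is 0, adding 80H is the same as setting that bit, so no adder is needed. The 8-bit address width and the name `data_slot` are assumptions made for the illustration.

```python
def data_slot(gb):
    """Return the G-MEM data location for a given GB value (8-bit address assumed)."""
    if not 0 <= gb < 0x80:
        raise ValueError("GB must have its most significant bit clear")
    return gb | 0x80  # setting bit 7 equals adding 80H when bit 7 was 0


# The bitwise form and the additive form agree over the whole lower half.
assert all(data_slot(gb) == gb + 0x80 for gb in range(0x80))
```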
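[0054]
Thus the BRDE instruction uses the dedicated G-MEM 20, and address information and data are transferred between the G-MEM 20 and the EX-I/O 22. The other buses (C-BUS 10, D-BUS 12) are not used and the other memories (C-MEM 16, D-MEM 18) are not involved. An arithmetic instruction that uses the C-BUS 10 and the D-BUS 12 can therefore be processed (executed) in parallel with the BRDE instruction within one instruction execution cycle.
[0055]
Data brought from the external memory into the G-MEM 20 (delayed audio data and the like) is read out of the G-MEM 20 and transferred to the MAC 28 or the ALU 26 when it is needed for a filter operation or the like. Instructions for this are also defined, but they are similar to the C-MEM 16 and D-MEM 18 instructions that use the C-BUS 10 and the D-BUS 12, so their description is omitted.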
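[0056]
FIG. 7 shows the instruction execution cycle of "MAC SS, D(xx), *C0, M0 / BRDE", one of the parallel-processing instructions that include the BRDE instruction. This parallel-processing instruction superimposes the MAC instruction of FIG. 3 and the BRDE instruction of FIG. 6. The instruction word has the format of FIG. 2(A), with the MAC opcode in the primary instruction field and the BRDE opcode in the secondary instruction field.
[0057]
The instruction is executed as follows. First, in the fetch cycle (Fetch), the memory read section of the control unit 30 reads the word of this parallel-processing instruction from the P-MEM 32 ((1)). Next, in the decode cycle (Decode), the decoder section of the control unit 30 decodes the MAC instruction and the BRDE instruction contained in it in parallel, that is, simultaneously ((2)).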
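[0058]
In this case, the control signals from the PLA control section of the control unit 30 are output for the MAC instruction and the BRDE instruction in parallel, that is, in OR form. As a result, in the operand processing cycle (Operand), address information from the addressing units 17 and 19 is supplied to the C-MEM 16 and the D-MEM 18, respectively, and the contents (data) of the target locations are read from the C-MEM 16 and the D-MEM 18 (operand processing of the MAC instruction); at the same time, the address information for external memory access read from the G-MEM 20 is transferred to the EX-I/O 22 via the G-BUS 14 (transfer processing of the BRDE instruction).
[0059]
Then, in the execution processing cycle (Execution), the MAC 28 performs the multiply-accumulate operation (execution of the MAC instruction) while, at the same time, the data from the EX-I/O 22 is transferred to the G-MEM 20 via the G-BUS 14 and the content of the GB register is incremented by one in the addressing unit 21 (execution of the BRDE instruction).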
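[0060]
The sampling frequency in ordinary digital audio signal processing is 44.1 kHz, so digital audio signals arrive from an external circuit such as a CD player at intervals of about 22 µs. DSP performance is determined by how many multiply-accumulate operations can be executed within this time (about 22 µs). The number of instruction execution cycles that can be pipelined within this time is fixed, for example at 512 steps, so one can also say that DSP performance is determined by how many of those steps can be devoted to arithmetic. On the other hand, when a filter operation uses large amounts of delay data, as in sound field reproduction, the delay data accumulated in the external memory must be read frequently.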
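The timing budget above can be checked with a short calculation. The 44.1 kHz rate and the 512-step budget come from the text; the count of 64 external reads per sample is a made-up figure used only to show the trade-off, and the function name is hypothetical.

```python
SAMPLE_RATE_HZ = 44_100     # from the text
STEPS_PER_SAMPLE = 512      # example step budget from the text

# One sample period, which the text rounds to "about 22 µs".
sample_period_us = 1e6 / SAMPLE_RATE_HZ                      # roughly 22.68 µs
# Time available per pipeline step.
step_time_ns = 1e9 / (SAMPLE_RATE_HZ * STEPS_PER_SAMPLE)     # roughly 44.3 ns


def arithmetic_steps(external_reads, reads_overlap_arithmetic):
    """Steps left for arithmetic in one sample period.

    Without overlap, each external read is assumed to occupy one step of its own
    (the conventional case). With overlap, reads ride along in the same instruction
    word as an arithmetic instruction (the BRDE case) and cost no extra steps.
    """
    if not 0 <= external_reads <= STEPS_PER_SAMPLE:
        raise ValueError("external_reads must fit in the step budget")
    if reads_overlap_arithmetic:
        return STEPS_PER_SAMPLE
    return STEPS_PER_SAMPLE - external_reads
```

With the hypothetical 64 reads per sample, the conventional scheme would leave 448 steps for arithmetic, whereas pairing each read with an arithmetic instruction keeps all 512.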
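[0061]
In the DSP of this embodiment, as described above, delay data can be read from the external memory by the BRDE instruction using the G-MEM 20 and the G-BUS 14, independently of arithmetic instructions (multiply-accumulate instructions in particular), under the parallel instruction scheme. While BRDE instructions that read data from the external memory are executed within the fixed time, as many steps as possible can therefore be devoted to arithmetic, and the processing capability of the DSP can be exploited to the full.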
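[0062]
In the embodiment above, a general-purpose G-MEM 20 is provided separately from the C-MEM 16, which stores coefficient data, and the D-MEM 18, which stores audio data, and delay data from the external memory is read into the G-MEM 20. However, if the C-MEM 16 and/or the D-MEM 18 is built as a dual-port memory with two ports and connected to the G-BUS 14, the BRDE instruction can be executed without providing the G-MEM 20.
[0063]
Although the DSP of the above embodiment concerns digital audio signal processing, a DSP according to the present invention is applicable to any digital signal processing.
[0064]
Although the above embodiment was described on the assumption that audio data is stored in the external memory, other kinds of data, such as coefficient data, may also be stored there.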
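[Effects of the Invention]
As described above, according to the digital signal processing device of the present invention, within one given instruction execution cycle an arithmetic instruction is executed using the first and second buses while data from the external memory is brought into internal memory using the third bus. Pipeline processing therefore remains fast while the number of arithmetic operations per unit time is maximized, and processing capability is improved.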
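[Brief Description of the Drawings]
[FIG. 1] Block diagram showing the configuration of a DSP for digital audio signal processing according to one embodiment of the present invention.
[FIG. 2] Field layout of an instruction word in the DSP of the embodiment.
[FIG. 3] Instruction execution cycle of a representative multiply-accumulate instruction in the embodiment.
[FIG. 4] Instruction execution cycle of the external memory read instruction in the embodiment.
[FIG. 5] Instruction execution cycle of the data transfer instruction associated with the external memory read instruction in the embodiment.
[FIG. 6] Instruction execution cycle of the background external memory read instruction "BRDE" in the embodiment.
[FIG. 7] Instruction execution cycle of a representative parallel-processing instruction that includes the BRDE instruction in the embodiment.
[FIG. 8] Block diagram showing the main part of a typical conventional DSP system.
[FIG. 9] Block diagram showing the main part of another conventional DSP system.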
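[Explanation of Reference Numerals]
10 C-BUS (data bus)
12 D-BUS (data bus)
14 G-BUS (data bus)
16 C-MEM (coefficient memory)
18 D-MEM (data memory)
20 G-MEM (general-purpose memory)
17, 19, 21 Addressing units
22 EX-I/O (external memory input/output interface circuit)
24 AU-I/O (audio interface circuit)
26 ALU (arithmetic logic unit)
28 MAC (multiply-accumulate unit)
30 Control unit
32 P-MEM (program memory)
34 HOST-I/O (host interface circuit)
[0001]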
[Industrial applications]
The present invention relates to a pipeline type digital signal processing device.
[0002]
[Prior art]
Conventionally, a DSP (Digital Signal Processor) has been used for digital signal processing that handles a large number of product-sum operations, such as digital filtering, digital automatic equalization, and the fast Fourier transform (FFT). To realize high-speed product-sum operations, a DSP generally incorporates a high-speed multiplier, an adder, a program memory, a data memory, and the like, and is configured as a microprogram-controlled or PLA-controlled microprocessor capable of pipeline processing. The DSP also has an input/output function and, when there is much data to be stored, stores the data in an external auxiliary memory via the input/output interface.
[0003]
In order to enable the DSP to access the external memory as needed, a system configuration as shown in FIG. 8 is generally employed. In this DSP system, the arithmetic control unit 102 in the DSP 100 directly supplies address information to the external memory 104, and at the time of writing, write data from the arithmetic control unit 102 is transmitted to the external memory 104 via the data bus 106 and the input / output port 108. At the time of reading, read data from the external memory 104 is sent to the arithmetic and control unit 102 via the input / output port 108 and the data bus 106.
[0004]
However, this method has the disadvantage that each unit in the arithmetic control unit 102 cannot move to the next step until the data transfer is completed, so the pipeline processing is held during that time. In fact, the time required to access the external memory 104 is generally longer than the time the arithmetic control unit 102 needs to access the internal memory 110.
[0005]
For example, in digital audio signal processing such as sound field reproduction and sound field compensation, audio data from a CD (Compact Disc) has a data length of 16 bits. If such audio data were input to and output from the external memory 104 at the full 16-bit data length, the number of input/output pins and the number of external memory devices would increase, and the hardware cost would become quite high. This is particularly noticeable when a large amount of delay data is handled, as when processing reflected sounds. Therefore, the overall hardware cost is reduced by reducing the number of input/output bits to and from the external memory 104 and increasing the number of input/output operations. The price, however, is that the access time to the external memory 104 becomes longer, the pipeline is held for correspondingly longer, and the processing efficiency of the DSP falls.
[0006]
FIG. 9 shows a configuration of a DSP system conventionally adopted as a method for solving the above-mentioned problem. In this method, an external memory controller 112 capable of writing and reading data to and from the external memory 104 is provided in the DSP 100 '. The external memory controller 112 has an address register 112a for temporarily storing address information and a data register 112b for temporarily storing data.
[0007]
When the arithmetic control unit 102 writes data to the external memory 104, it only needs to transfer the address information and the data to the external memory controller 112 via the data bus 106; each unit of the arithmetic control unit 102 can then move on to the next step. The external memory controller 112, for its part, holds the address information and the data from the arithmetic control unit 102 in the address register 112a and the data register 112b, respectively, accesses the external memory 104, and writes the data to the memory address designated by the address information. This writing is performed in a predetermined cycle.
[0008]
When the arithmetic control unit 102 reads data from the external memory 104, the arithmetic control unit 102 transfers the address information to the external memory controller 112 via the data bus 106. When the external memory controller 112 reads data from the external memory 104, a predetermined cycle is required. In the external memory controller 112, when the address information from the arithmetic control unit 102 is loaded into the address register 112a, the data read from the external memory 104 in the previous memory cycle is stored in the data register 112b. Therefore, the arithmetic control unit 102 can send the current address information to the external memory controller 112 and receive the previous data from the external memory controller 112 at the same time, and each unit of the arithmetic control unit 102 can immediately proceed to the next step.
[0009]
As described above, since the external memory controller 112 writes and reads data to and from the external memory 104, the arithmetic control unit 102 only needs to transfer address information and data to and from the external memory controller 112 via the data bus 106, and there is no need to hold the pipeline processing.
[0010]
[Problems to be solved by the invention]
The DSP system shown in FIG. 9 assures high-speed arithmetic processing as long as the entire pipeline processing need not be held when accessing the external memory 104. However, while the arithmetic control unit 102 accesses the external memory controller 112, the data bus 106 is in use, so that arithmetic processing cannot be performed in the arithmetic control unit 102. That is, by accessing the external memory 104 once, the number of calculation processes in the calculation control unit 102 is reduced by one. In audio / digital signal processing such as sound field reproduction as described above, the DSP performance is determined by how many multiply-accumulate operations can be performed within a fixed time defined by the sampling frequency of the audio signal. In this conventional system, the execution of an instruction to read data from the external memory 104 reduces the number of arithmetic processing operations in the arithmetic control unit 102, so that there is a problem that the performance of the DSP cannot be sufficiently brought out.
[0011]
The present invention has been made in view of this problem. Its object is to provide a digital signal processing device that can read data from an external memory without degrading arithmetic processing efficiency, thereby ensuring high-speed pipeline processing while making the number of arithmetic operations per unit time as large as possible and improving processing capability.
[0012]
[Means for Solving the Problems]
In order to achieve the above object, the digital signal processing device according to the present invention is a digital signal processing device that executes a series of instructions in a pipeline system to process a digital signal, and comprises: first, second, and third buses arranged so that different data can be transferred simultaneously; a first internal memory connected to the first bus; a second internal memory connected to the second bus; computing means connected to the first and second buses; a third internal memory connected to the third bus and to at least one of the first and second buses; and input/output interface means connected to the third bus and to at least one of the first and second buses and capable of writing and reading data to and from an external memory. The device is configured so that, during one predetermined instruction execution cycle, a first instruction using the first and second buses and a second instruction using the third bus are executed in parallel. In that instruction execution cycle, for the first instruction, data is read from the first and second internal memories, the read data is transferred to the computing means via the first and second buses, and a predetermined operation is then performed on both data items by the computing means; for the second instruction, predetermined address information is sent to the input/output interface means via the third bus, and data read in advance from the external memory into the input/output interface means is then transferred to the third internal memory via the third bus.
[0013]
In the digital signal processing device of the present invention, preferably, the third internal memory has a first memory area for storing the address information for accessing the external memory and a second memory area for storing data transferred from the external memory, and each item of address information and its corresponding data may have a fixed offset between the memory address in the first memory area and the memory address in the second memory area at which they are respectively stored.
[0014]
Further, in the digital signal processing device of the present invention, at least one of the first internal memory and the second internal memory may be a dual-port type memory.
[0016]
[Action]
In the digital signal processing device of the present invention, a third bus for data transfer connected to the input/output interface means is provided in addition to the first and second buses mainly used for arithmetic processing. As a result, within the instruction execution cycle of one parallel instruction, an operation instruction is executed using the first and second buses and, at the same time, data from the external memory can be taken into the internal memory using the third bus.
[0017]
[Embodiment]
Hereinafter, an embodiment of the present invention will be described with reference to FIGS. 1 to 7.
[0018]
FIG. 1 shows a system configuration of an audio / digital signal processing DSP according to an embodiment of the present invention. This DSP system has three independent data buses (C-BUS10, D-BUS12, G-BUS14), and each part is connected to these buses as shown in the figure.
[0019]
Connected to the C-BUS 10 are a coefficient memory (C-MEM) 16, a general-purpose memory (G-MEM) 20, an arithmetic and logic unit (ALU) 26, a product-sum operation unit (MAC) 28, a program memory (P-MEM) 32, and a host interface circuit (HOST-I/O) 34.
[0020]
Connected to the D-BUS 12 are a data memory (D-MEM) 18, the general-purpose memory (G-MEM) 20, an external memory input/output interface circuit (EX-I/O) 22, an audio interface circuit (AU-I/O) 24, the arithmetic and logic unit (ALU) 26, the product-sum operation unit (MAC) 28, and the host interface circuit (HOST-I/O) 34.
[0021]
The G-BUS 14 is connected to the general-purpose memory (G-MEM) 20, the external memory input/output interface circuit (EX-I/O) 22, and the arithmetic and logic unit (ALU) 26.
[0022]
Each of the C-MEM 16, the D-MEM 18, and the G-MEM 20 is composed of a RAM (Random Access Memory). The C-MEM 16 mainly stores coefficient data for the product-sum operation, and also stores address information for accessing an external memory (not shown) connected to the EX-I/O 22. The D-MEM 18 stores data (mainly audio data) used for the product-sum operation and other operations, as well as operation results.
[0023]
The G-MEM 20 is normally used as an extension memory of the D-MEM 18. When handling a large amount of delay data such as sound field reproduction, delay data that cannot be accommodated in the D-MEM 18 is stored in an external memory such as a RAM, and when necessary, a predetermined instruction (BRDE instruction) described later is used to store the external data. The delay data is taken into the G-MEM 20 from the memory. In this case, the G-MEM 20 also stores address information for accessing an external memory. The G-MEM 20 can also be used as an extension memory of the C-MEM 16, and may store coefficient data as needed.
[0024]
The C-MEM 16, the D-MEM 18, and the G-MEM 20 have addressing units 17, 19, and 21 for performing address calculation, respectively.
[0025]
The EX-I/O 22 is also connected to the external memory for storing the delay data and has a memory control function of accessing the external memory and writing or reading data. It incorporates an address register that holds the address information for a memory access and a data register that holds write or read data.
[0026]
The AU-I/O 24 is an interface circuit for exchanging data between the DSP and an external digital audio circuit, and is connected to, for example, a preceding-stage CD reproducing circuit and a next-stage digital filter or D/A converter. When an audio signal (data) is input from an external circuit, the control device 30 described later is interrupted once one data item is complete in a register in the AU-I/O 24, and the data is stored in the D-MEM 18 via the D-BUS 12 in the interrupt processing.
[0027]
The ALU 26 is an arithmetic unit that performs arbitrary arithmetic and logical operations, and includes an accumulator. The MAC 28 is an arithmetic unit that exclusively performs product-sum operations, and includes a multiplier and an accumulator. Since two arithmetic units (ALU 26 and MAC 28) are provided in this manner, parallel processing is possible, such as performing a convolution in the MAC 28 while performing an addition in the ALU 26.
[0028]
The P-MEM 32 is composed of a RAM (Random Access Memory) and stores a program that defines the processing operation of the DSP. The control device 30 sequentially reads instructions from the P-MEM 32, controls registers and gates (not shown) in the system by a PLA (Programmable Logic Array) control method, and causes each unit to execute the instruction. In FIG. 1, the control bus is not shown for convenience of explanation.
[0029]
The HOST-I/O 34 is an interface circuit for exchanging programs and data between the DSP and a host controller (not shown). It is connected to the C-BUS 10 via a parallel port and to the host controller via a serial port. The program stored in the P-MEM 32, the coefficient data and the address information stored in the C-MEM 16, and the address information stored in the G-MEM 20 are supplied by the host controller and downloaded from the HOST-I/O 34 to each memory via the C-BUS 10. The address information may also be changed by a program in the P-MEM 32.
[0030]
In the DSP of this embodiment, three data buses (C-BUS 10, D-BUS 12, and G-BUS 14) are provided as described above, and different address information or data can be transferred in parallel on these buses.
[0031]
On the C-BUS 10, in addition to the program, address information, and data downloaded from the host controller to each memory as described above, address information given from the C-MEM 16 to the D-MEM 18 or the EX-I/O 22, coefficient data given from the C-MEM 16 to the ALU 26 or the MAC 28, and the like are transferred one at a time.
[0032]
On the D-BUS 12, input/output audio data exchanged between the AU-I/O 24 and the D-MEM 18, delayed audio data exchanged between the D-MEM 18 and the EX-I/O 22, operation data exchanged between the D-MEM 18 and the ALU 26 or the MAC 28, and the like are transferred one at a time.
[0033]
On the G-BUS 14, address information and delayed audio data exchanged between the G-MEM 20 and the EX-I/O 22, operation data given from the G-MEM 20 to the ALU 26, and the like are transferred.
[0034]
As described above, different address information or data can be simultaneously transferred in parallel on the three data buses (C-BUS10, D-BUS12, G-BUS14), so that two instructions are executed in one instruction execution cycle as described later. It is possible to perform parallel processing.
[0035]
FIG. 2 is a field arrangement diagram of an instruction word in the DSP of the present embodiment. FIG. 2A shows the general format of the instruction word. For example, two instructions (a primary instruction and a secondary instruction) can be designated in one instruction word having a length of 32 bits. A field of bits [29 to 22] is assigned to an operation code of the primary instruction. The field of bits [21 to 14] is assigned to the operation code of the secondary instruction. Bits [31, 30] specify the combination format (mode) of the two instructions. Bits [13-0] are used to address the operand.
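The field layout just described can be sketched as a pair of pack and unpack helpers. The bit positions come from the text; the function names and the sample opcode values are hypothetical, since the document does not give actual opcode encodings.

```python
def encode_word(mode, primary_op, secondary_op, addressing):
    """Pack the four fields into one 32-bit instruction word."""
    if not (0 <= mode < 4 and 0 <= primary_op < 256
            and 0 <= secondary_op < 256 and 0 <= addressing < (1 << 14)):
        raise ValueError("field out of range")
    return (mode << 30) | (primary_op << 22) | (secondary_op << 14) | addressing


def decode_word(word):
    """Split a 32-bit instruction word back into its fields."""
    return {
        "mode": (word >> 30) & 0x3,           # bits [31, 30]: combination mode
        "primary_op": (word >> 22) & 0xFF,    # bits [29 to 22]: primary opcode
        "secondary_op": (word >> 14) & 0xFF,  # bits [21 to 14]: secondary opcode
        "addressing": word & 0x3FFF,          # bits [13 to 0]: operand addressing
    }


# Made-up field values, used only to show a round trip.
word = encode_word(0b01, 0xA5, 0x3C, 0x1234)
fields = decode_word(word)
```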
[0036]
It is also possible to specify either a primary instruction or a secondary instruction with one instruction word. FIGS. 2B and 2C show the field arrangement when the primary instruction and the secondary instruction are individually specified.
[0037]
In the DSP of this embodiment, a background external memory read instruction (BRDE) can be specified for transferring data from the external memory (delayed audio data and the like) to the G-MEM 20 using the G-BUS 14. When the BRDE instruction is specified alone in one instruction word, the arrangement is as shown in FIG. 2(D), and the operation code of the BRDE instruction is placed in the field of the secondary instruction. Since the address in the BRDE instruction is generated by the addressing unit 21 of the G-MEM 20 from a base address, no operand is required and the addressing field is left empty. When the BRDE instruction is specified together with another instruction (a primary instruction) in one instruction word, the arrangement is a combination of FIGS. 2(D) and 2(B), and the addressing field is used for the operand of the primary instruction.
[0038]
Next, the operation of the DSP of this embodiment in the instruction execution cycles of several instructions will be described with reference to FIGS. 3 to 7.
[0039]
FIG. 3 shows an instruction execution cycle of "MAC SS, D(xx), *C0, M0", which is one of the MAC instructions for performing a predetermined product-sum operation using the MAC 28. This instruction means: "Multiply the contents of the memory address of the D-MEM 18 specified by the address (xx) by the contents of the memory address of the C-MEM 16 specified by the contents (address information) of the C0 register (in the addressing unit 17), add the result of the multiplication to the contents of the M0 register (in the MAC 28), and store the addition result in the M0 register."
[0040]
The operation when this instruction is executed is as follows. First, in the fetch cycle (Fetch), the memory read unit of the control device 30 reads the word of this instruction from the P-MEM 32 ((1)). Next, in a decode cycle (Decode), the decoder section of the control device 30 decodes this instruction ((2)). Based on the result of the decoding, the microprogram control unit of the control device 30 is operated to activate required registers and gates, and to cause each required unit to perform operand processing (operand) and execution processing (execution).
[0041]
In the operand processing cycle (Operand), address information is supplied from the control device 30 to the C-MEM 16 and the D-MEM 18 via the addressing units 17 and 19, respectively. The data read from the C-MEM 16 and the D-MEM 18 are sent to the MAC 28 via the C-BUS 10 and the D-BUS 12 ((3)). In the execution processing cycle (Execution), multiplication and addition are sequentially performed by the MAC 28, and the final operation result is stored in the M0 register ((4)).
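As a minimal sketch of the data flow just described, the operand and execution cycles of this MAC instruction can be modeled as follows. The function name `execute_mac` and the use of Python lists for the C-MEM and the D-MEM are assumptions made for illustration; word widths, saturation and any automatic update of the C0 register are not modeled.

```python
def execute_mac(c_mem, d_mem, xx, c0, m0):
    """Model one MAC instruction; returns the new content of the M0 register."""
    # Operand cycle: both memories are read in parallel over their own buses.
    coefficient = c_mem[c0]  # C-MEM word addressed by the content of the C0 register
    sample = d_mem[xx]       # D-MEM word addressed by the direct address (xx)
    # Execution cycle: multiply, then accumulate into M0.
    return m0 + sample * coefficient
```

For instance, with a coefficient of 3, a data word of 5 and an accumulator content of 10, the function returns 25.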
[0042]
Since the DSP of this embodiment executes a series of instructions in a pipeline system, the instruction execution cycles of successive instructions are shifted from one another by one phase. For example, while a decode cycle (Decode) is being performed for a certain instruction, an operand processing cycle (Operand) for the immediately preceding instruction, an execution processing cycle (Execution) for the instruction before that, and a fetch cycle (Fetch) for the next instruction are performed at the same time.
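The one-phase shift between successive instructions can be sketched as a simple schedule. The four stage names are taken from the description; the functions `stage_at` and `active_stages`, and the numbering of instructions and machine cycles from zero, are illustrative assumptions.

```python
STAGES = ("Fetch", "Decode", "Operand", "Execution")


def stage_at(instruction: int, cycle: int):
    """Stage occupied by instruction number `instruction` in machine cycle `cycle`.

    Instruction i is assumed to enter the Fetch stage in cycle i.
    Returns None if the instruction is not in the pipeline in that cycle.
    """
    k = cycle - instruction
    return STAGES[k] if 0 <= k < len(STAGES) else None


def active_stages(cycle: int, n_instructions: int) -> dict:
    """Map each instruction that is in the pipeline in `cycle` to its stage."""
    result = {}
    for i in range(n_instructions):
        stage = stage_at(i, cycle)
        if stage is not None:
            result[i] = stage
    return result
```

In cycle 3 of this model, instruction 2 is in Decode while instruction 1 is in Operand, instruction 0 is in Execution and instruction 3 is in Fetch, which matches the example in the paragraph above.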
[0043]
FIG. 4 shows an instruction execution cycle of the external memory read instruction “RDE”. This instruction means “store the contents of the memory address in the C-MEM 16 specified by the address (cma) in the address register EXA of the EX-I / O 22 as an address for accessing the external memory”.
[0044]
In the execution cycle of the RDE instruction, after the address information (cma) is given from the control device 30 to the C-MEM 16 via the addressing unit 17, the address information for external memory access read from the C-MEM 16 is provided to the EX-I / O 22 via the C-BUS 10.
[0045]
When the RDE instruction is executed, the EX-I / O 22 accesses the external memory based on the address information received from the C-MEM 16, reads out the contents of the memory address in the external memory specified by the address information, and stores the read data in the read data register EXR. With this memory access function of the EX-I / O 22, the target data is ready in the data register EXR of the EX-I / O 22 when a predetermined number of machine cycles have elapsed after the execution of the RDE instruction.
[0046]
FIG. 5 shows an instruction execution cycle of a data transfer instruction “MOV @ EXR, dma” used in connection with the above-mentioned RDE instruction. This instruction means “store the contents of the data register EXR in the EX-I / O 22 in the memory address in the D-MEM 18 specified by the address (dma)”. As described above, when a predetermined number of machine cycles have elapsed after the execution of the RDE instruction, the target data is ready in the data register EXR of the EX-I / O 22. By then executing this instruction “MOV @ EXR, dma”, the target data can be taken into the D-MEM 18.
[0047]
In the DSP of this embodiment, an external memory write instruction “WRE” is also defined. Usually, this instruction is written as “WRE cma, dma”. It means “write the contents (data) of the memory address in the D-MEM 18 specified by the address (dma) to the memory address of the external memory specified by the contents (address) of the memory address in the C-MEM 16 specified by the address (cma).”
[0048]
When the WRE instruction is executed, the address and the data read on the basis of the respective address information are transferred to the address register EXA and the write data register EXW in the EX-I / O 22 via the C-BUS 10 and the D-BUS 12, respectively.
[0049]
As described above, audio data from a reproduction circuit such as a CD is input at regular intervals and is stored in the D-MEM 18 by the AU-I / O 24 through interrupt processing. The D-MEM 18 is used in a FIFO format: as input audio data is stored in it, the delay data is pushed out in chronological order. However, in digital signal processing that uses a large amount of delay data, such as sound field reproduction or reverberation reproduction, delay data from up to several seconds earlier may be used, so such data is stored in the external memory. Then, when the delay data becomes necessary in a filter operation, it is read from the external memory by the background external memory read instruction (BRDE) described later.
[0050]
FIG. 6 shows an instruction execution cycle of the background external memory read instruction “BRDE” according to the present embodiment. The BRDE instruction means “transfer the contents of the memory address in the G-MEM 20 specified by the contents of the GB register (in the addressing unit 21) to the address register EXA in the EX-I / O 22 as address information for external memory access, and store the contents of the read data register EXR in the EX-I / O 22 at the memory address in the G-MEM 20 specified by the value obtained by adding 80H (10000000) to the contents of the GB register.”
[0051]
In the instruction execution cycle of this BRDE instruction, during the operand processing cycle (Operand), the address information for external memory access, which is the content of the memory address of the G-MEM 20 specified by the content of the GB register, is transferred to the EX-I / O 22 via the G-BUS 14. During the execution processing cycle (Execution), the data from the EX-I / O 22 is transferred to the G-MEM 20 via the G-BUS 14, and the content of the GB register is incremented by one in the addressing unit 21.
[0052]
The data transferred from the read data register EXR in the EX-I / O 22 by the BRDE instruction is data read from the external memory by the EX-I / O 22 in response to the previous BRDE instruction. That is, the data corresponds to the address information transferred from the G-MEM 20 to the address register EXA in the EX-I / O 22 by the previous BRDE instruction. The data corresponding to the address information stored in the address register EXA by the current BRDE instruction is read out from the external memory in a predetermined machine cycle and held in the read data register EXR, and is sent to the G-MEM 20 by the next BRDE instruction. Will be transferred.
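The one-instruction lag between sending an address and receiving its data can be sketched as follows. This is a behavioral model under stated assumptions, not the circuit of the embodiment: the class `ExternalInterface`, the method `complete_read` (standing in for the background external memory access that finishes within a predetermined number of machine cycles) and the function `brde` are hypothetical names, and the external memory is represented by a Python dict.

```python
DATA_OFFSET = 0x80  # 80H offset between the address area and the data area of the G-MEM


class ExternalInterface:
    """Illustrative model of the EX-I/O with its EXA and EXR registers."""

    def __init__(self, external_memory):
        self.external_memory = external_memory
        self.exa = None  # address register EXA
        self.exr = 0     # read data register EXR

    def load_address(self, address):
        self.exa = address

    def complete_read(self):
        """Background access: fetch the word addressed by EXA into EXR."""
        if self.exa is not None:
            self.exr = self.external_memory[self.exa]


def brde(g_mem, gb, exio):
    """Model one BRDE instruction and return the incremented GB value."""
    # Operand cycle: G-MEM[GB] goes to EXA as the next external address.
    exio.load_address(g_mem[gb])
    # Execution cycle: EXR still holds the data requested by the previous
    # BRDE, and that data is stored at G-MEM[GB + 80H].
    g_mem[gb + DATA_OFFSET] = exio.exr
    return gb + 1
```

In this literal reading of the description, each BRDE stores the data requested by the preceding BRDE, so the first BRDE of a sequence stores whatever EXR held beforehand, and one further BRDE (or a MOV from EXR) is needed to collect the last word.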
[0053]
Note that the address (memory address) at which data is stored in the G-MEM 20 by the BRDE instruction is obtained by adding 80H (10000000) to the contents of the GB register (the memory address of the address information for external memory access), that is, simply by changing the most significant bit of that address from 0 to 1. Therefore, the address calculation is simple, and the configuration of the addressing unit 21 is simplified accordingly.
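The equivalence between adding 80H and setting the most significant bit can be checked with a short sketch. It assumes an 8-bit G-MEM address whose most significant bit is 0 for every address in the address-information area, which is what the binary notation 10000000 suggests; the function name `data_address` is hypothetical.

```python
DATA_OFFSET = 0x80  # 80H = 10000000 in binary


def data_address(gb: int) -> int:
    """G-MEM address of the data slot paired with address slot `gb`.

    Assumes `gb` lies in the lower half of an 8-bit address space, so that
    setting bit 7 is the same as adding 80H and no carry logic is needed.
    """
    if not 0 <= gb < DATA_OFFSET:
        raise ValueError("GB must have its most significant bit cleared")
    return gb | DATA_OFFSET
```

Because bit 7 of `gb` is always 0 under this assumption, the OR operation never generates a carry, which is why an adder is unnecessary for this address calculation.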
[0054]
Thus, in the BRDE instruction, address information and data are transferred between the G-MEM 20 and the EX-I / O 22 using the dedicated G-BUS 14. The other buses (C-BUS 10, D-BUS 12) are not used, and the other memories (C-MEM 16, D-MEM 18) are not involved. Therefore, during one instruction execution cycle, an arithmetic instruction using the C-BUS 10 and the D-BUS 12 can be processed (executed) in parallel with this BRDE instruction.
[0055]
The data (delayed audio data, etc.) taken into the G-MEM 20 from the external memory is read out from the G-MEM 20 and transferred to the MAC 28 or the ALU 26 when it becomes necessary for a filter operation or the like. An instruction for this purpose is also defined, but it is similar to the instructions that access the C-MEM 16 and the D-MEM 18 via the C-BUS 10 and the D-BUS 12, so its description is omitted.
[0056]
FIG. 7 shows an instruction execution cycle of a "MAC @ SS, D (xx), * C0, M0 / BRDE" instruction, which is one of the parallel processing instructions including the BRDE instruction. This parallel processing type instruction is obtained by superposing the MAC instruction of FIG. 3 and the BRDE instruction of FIG. 6 in parallel. The instruction word has the format shown in FIG. 2A, and the operation code of the MAC instruction is defined in the field of the primary instruction, and the operation code of the BRDE instruction is defined in the field of the secondary instruction.
[0057]
The operation when this instruction is executed is as follows. First, in a fetch cycle (Fetch), the memory read unit of the control device 30 reads the word of the parallel processing type instruction from the P-MEM 32 ((1)). Next, in a decode cycle (Decode), the decoder unit of the control device 30 decodes the MAC instruction and the BRDE instruction included in the parallel processing type instruction in parallel or simultaneously ((2)).
[0058]
In this case, the control signals from the PLA control unit of the control device 30 are output in parallel for the MAC instruction and the BRDE instruction, that is, in an OR format. Thus, in the operand processing cycle (Operand), the address information from the addressing units 17 and 19 is supplied to the C-MEM 16 and the D-MEM 18, respectively, and the contents (data) of the target memory addresses are read from the C-MEM 16 and the D-MEM 18, respectively (operand processing of the MAC instruction). At the same time, the external memory access address information read from the G-MEM 20 is transferred to the EX-I / O 22 via the G-BUS 14 (transfer processing of the BRDE instruction).
[0059]
Then, in the execution processing cycle (Execution), at the same time that the MAC 28 performs a product-sum operation (MAC instruction execution processing), data from the EX-I / O 22 is transferred to the G-MEM 20 via the G-BUS 14. At the same time, the content of the GB register is incremented by one in the addressing unit 21 (execution processing of the BRDE instruction).
[0060]
The sampling frequency of normal audio / digital signal processing is 44.1 kHz, and a digital audio signal is input from an external circuit such as a CD at time intervals of about 22 μsec. The DSP performance is determined by how many product-sum operations can be executed within this time (about 22 μsec). The number of instruction execution cycles that can be pipelined within this time is fixed, and is set to, for example, 512 steps. Therefore, it can be said that the DSP performance is determined by how many of these steps can be used for arithmetic processing. On the other hand, when a large amount of delay data is used in a filter operation, as in sound field reproduction or the like, the delay data stored in the external memory must be read frequently.
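The step budget can be worked through numerically. The sketch below derives the sample interval (1 / 44.1 kHz, about 22.7 μsec, which the text gives as about 22 μsec) and the time available per step when 512 steps are used. The function names are hypothetical. The `steps_per_read` parameter is likewise an illustrative assumption: a value of 2 models a foreground sequence of one RDE and one MOV per external read, and a value of 0 models a BRDE placed in the secondary field of an instruction word that already carries an arithmetic instruction.

```python
def sample_period_us(sampling_hz: float) -> float:
    """Interval between input samples, in microseconds."""
    return 1e6 / sampling_hz


def step_time_ns(sampling_hz: float, steps_per_sample: int) -> float:
    """Time available for one instruction step, in nanoseconds."""
    return 1e9 / (sampling_hz * steps_per_sample)


def arithmetic_steps(steps_per_sample: int, reads: int, steps_per_read: int) -> int:
    """Steps left for arithmetic after external memory reads are accounted for."""
    used = reads * steps_per_read
    if used > steps_per_sample:
        raise ValueError("external reads exceed the step budget")
    return steps_per_sample - used
```

With 512 steps per sample each step lasts roughly 44 ns. As a purely illustrative figure, 100 external reads per sample at two steps each would leave 312 steps for arithmetic, whereas reads that consume no steps of their own leave all 512.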
[0061]
In the DSP according to the present embodiment, as described above, the BRDE instruction under the parallel instruction system can read delay data from the external memory using the G-MEM 20 and the G-BUS 14, independently of arithmetic processing instructions (especially product-sum operation instructions). Therefore, while the BRDE instructions for reading data from the external memory are executed within the above-mentioned fixed time, as many steps as possible can be allocated to arithmetic processing, and the processing capability of the DSP can be fully exploited.
[0062]
In the above embodiment, a general-purpose G-MEM 20 is provided separately from the C-MEM 16 for storing coefficient data and the D-MEM 18 for storing audio data, and the delay data from the external memory is read into the G-MEM 20. However, by configuring the C-MEM 16 and / or the D-MEM 18 as a dual-port memory having two ports and connecting it to the G-BUS 14, it is also possible to execute the BRDE instruction without providing the G-MEM 20.
[0063]
Although the DSP of the above embodiment relates to audio / digital signal processing, the DSP according to the present invention is applicable to any digital signal processing.
[0064]
In the above embodiment, the external memory stores audio data. However, the external memory may store other types of data such as coefficient data in addition to audio data.
[Effect of the invention]
As described above, according to the digital signal processing device of the present invention, in one predetermined instruction execution cycle, an arithmetic instruction is executed using the first and second buses and, at the same time, data from the external memory is taken into the internal memory using the third bus. This ensures high-speed pipeline processing, increases the number of arithmetic operations per unit time as much as possible, and improves processing performance.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a configuration of an audio / digital signal processing DSP according to an embodiment of the present invention.
FIG. 2 is a diagram showing a field arrangement of an instruction word in the DSP of the embodiment.
FIG. 3 is a diagram showing an instruction execution cycle of a representative product-sum operation instruction in the embodiment.
FIG. 4 is a diagram showing an instruction execution cycle of an external memory read instruction in the embodiment.
FIG. 5 is a diagram showing an instruction execution cycle of a data transfer instruction related to an external memory read instruction in the embodiment.
FIG. 6 is a diagram showing an instruction execution cycle of a background external memory read instruction “BRDE” in the embodiment.
FIG. 7 is a diagram showing an instruction execution cycle of a representative parallel processing type instruction including a BRDE instruction in the embodiment.
FIG. 8 is a block diagram showing a configuration of a main part of a typical conventional DSP system.
FIG. 9 is a block diagram showing a configuration of a main part of another conventional DSP system.
[Explanation of symbols]
10 C-BUS (data bus)
12 D-BUS (data bus)
14 G-BUS (data bus)
16 C-MEM (coefficient memory)
18 D-MEM (data memory)
20 G-MEM (general-purpose memory)
17, 19, 21 Addressing unit
22 EX-I / O (external memory input / output interface circuit)
24 AU-I / O (audio interface circuit)
26 ALU (arithmetic logic unit)
28 MAC (product-sum operation unit)
30 Control device
32 P-MEM (program memory)
34 HOST-I / O (host interface circuit)

Claims

In a digital signal processing device that processes a digital signal by executing a series of instructions in a pipeline system,
First, second and third buses adapted to transfer different data simultaneously;
A first internal memory connected to the first bus;
A second internal memory connected to the second bus;
Computing means connected to the first and second buses;
A third internal memory connected to the third bus and connected to at least one of the first and second buses;
Input / output interface means connected to the third bus, connected to at least one of the first and second buses, and capable of writing and reading data to and from an external memory;
A first instruction using the first and second buses and a second instruction using the third bus are executed in parallel during one predetermined instruction execution cycle; and
the digital signal processing device being characterized in that, in the one predetermined instruction execution cycle, for the first instruction, data is read from each of the first and second internal memories, the read data is sent to the computing means via the first and second buses, and a predetermined operation is then performed on both data by the computing means, and, for the second instruction, predetermined address information is sent to the input / output interface means via the third bus, and data read in advance from the external memory into the input / output interface means is then transferred to the third internal memory via the third bus.

The digital signal processing device according to claim 1, wherein the third internal memory has a first memory area for storing the address information for accessing the external memory and a second memory area for storing data transferred from the external memory, and each pair of address information and corresponding data is stored such that there is a fixed offset between the memory address in the first memory area and the memory address in the second memory area at which they are respectively stored.

The digital signal processing device according to claim 1, wherein at least one of the first internal memory and the second internal memory is a dual-port type memory.