JPH04260950A

JPH04260950A - Cache memory device

Info

Publication number: JPH04260950A
Application number: JP3000467A
Authority: JP
Inventors: Fujio Itomitsu; 富士雄糸満; Yuichi Saito; 斎藤　祐一
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 1991-01-08
Filing date: 1991-01-08
Publication date: 1992-09-16
Anticipated expiration: 2012-11-17
Also published as: JP2678527B2; US5509137A

Abstract

PURPOSE: To speed up processing when write operations follow one another after a write hit. CONSTITUTION: The cache memory device is provided with first and second address registers 101 and 102, for the tag memory and the data memory respectively, a tag entry decoder 103, and a data entry decoder 105. During a write operation, the lower 6 bits of the register 101 are transferred to the register 102 through a transfer path 112, so that the tag comparison and the write of the data for the preceding comparison result are processed in parallel in the same clock.

Description

[Detailed description of the invention]

[0001]

[Field of Industrial Application] The present invention relates to a cache memory device built into a microprocessor, and more particularly to a technique for speeding up data write operations to a data cache memory.

[0002]

[Background Art] A cache memory built into a microprocessor stores frequently used data and instructions from main memory so that the CPU can access them at high speed. Cache memories built into microprocessors include data cache memories (hereinafter, data caches) and instruction cache memories (hereinafter, instruction caches). A cache memory consists of a tag memory section, which stores as tag information the main-memory addresses at which data and instructions reside, and a data memory, which stores the data and instructions corresponding to the tag information. An instruction cache operation consists of reading the tag, comparing tags and determining hit or miss, reading the instruction corresponding to the tag on a hit, and re-registering the instruction from main memory on a miss. A data cache operation consists of reading the tag, comparing tags and determining hit or miss, reading or writing the data corresponding to the tag on a hit, and re-registering the data from main memory on a miss.

[0003] The configuration of a cache memory device built into a conventional microprocessor will be described with reference to the block diagram of FIG. 8. In the figure, 101 is an address register; when a new address is input via an address bus (not shown), the address register 101 temporarily stores it. The number of rows that can be selected by the lower part of the address is called the number of entries. The lower part of the address is input to an entry decoder 103 shared by the data memory 106 and the tag memory 104. The entry decoder 103 selects an entry in the tag memory 104 and the data memory 106 by decoding the lower part of the address. The upper part of the address is supplied to the tag memory 104 or the comparator 107. The comparator 107 also receives the tag information of the selected entry in the tag memory 104 and compares it with the upper part of the input address. The comparison result is supplied to a gate 108, which determines, according to the result, whether the information output from the selected entry of the data memory 106 is passed to an arithmetic unit (not shown) as valid data, or whether data from the arithmetic unit is written to the selected entry of the data memory 106.

[0004] The conventional cache memory device configured in this way operates as follows. In synchronization with a given clock, the entry decoder 103 decodes the lower part of the address and selects an entry in the tag memory 104. Tag information is read from the selected entry of the tag memory 104 during that clock period. The comparator 107 compares the tag information read out with the upper part of the input address. If the values match, the cache is said to have hit, and a hit signal is generated. If they do not match, the cache is said to have missed. For write operations in particular, a cache hit is called a write hit and a cache miss is called a write miss.
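The lookup performed by the conventional device can be sketched in a few lines of Python. This is a minimal illustration, not the circuit itself: the 6-bit width of the lower address part, the one-word-per-entry data memory, and the names `split_address`, `ConventionalCache`, `register`, and `read` are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

INDEX_BITS = 6                      # assumed width of the lower address part
NUM_ENTRIES = 1 << INDEX_BITS       # rows selectable by the entry decoder


def split_address(addr: int) -> Tuple[int, int]:
    """Split an address into (upper part, lower part)."""
    return addr >> INDEX_BITS, addr & (NUM_ENTRIES - 1)


@dataclass
class ConventionalCache:
    # One tag and one data word per entry; None marks an empty entry.
    tag_memory: List[Optional[int]] = field(
        default_factory=lambda: [None] * NUM_ENTRIES)
    data_memory: List[Optional[int]] = field(
        default_factory=lambda: [None] * NUM_ENTRIES)

    def register(self, addr: int, value: int) -> None:
        """Register data from main memory (the miss path, simplified)."""
        tag, entry = split_address(addr)
        self.tag_memory[entry] = tag
        self.data_memory[entry] = value

    def read(self, addr: int) -> Tuple[bool, Optional[int]]:
        tag, entry = split_address(addr)          # role of entry decoder 103
        hit = self.tag_memory[entry] == tag       # role of comparator 107
        # Role of gate 108: pass the data on only when the comparison matched.
        return hit, (self.data_memory[entry] if hit else None)


cache = ConventionalCache()
cache.register(0x1A3, 42)
print(cache.read(0x1A3))   # hit: same entry, same upper part
print(cache.read(0x2A3))   # miss: same entry, different upper part
```

The two example addresses share the same lower part, so they select the same entry and differ only in the tag comparison.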

[0005] FIG. 9 is a timing chart of a write operation in a conventional data cache. When the data cache performs a write operation and the tag comparison results in a hit (a write hit), the data is written to the entry of the data memory 106 that corresponds to the tag information. That is, the tag is read and compared in synchronization with one clock and a hit signal is generated; in the next clock, the data is written through the gate 108 to the row of the data memory 106 corresponding to the tag information that hit. In the example of FIG. 8, tag information is read from the third row (third entry) of the tag memory 104 in a given clock cycle and compared with the upper part of the address by the comparator 107, and the hit signal becomes active. In the next clock cycle, the data is written to the third row (third entry) of the data memory 106. While tag information is being read from the tag memory 104, data cannot be written to the data memory 106; likewise, while data is being written to the data memory 106, tag information cannot be read from the tag memory 104.
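The two-clock cost of a conventional write hit can be made concrete with a small timeline model. This is a sketch under the assumption that every write hits; the miss path and the function name `conventional_write_timeline` are not taken from the source.

```python
from typing import List, Tuple


def conventional_write_timeline(entries: List[int]) -> List[Tuple[int, str, int]]:
    """Return (clock, activity, entry) for consecutive write hits.

    Assumes every write hits; the miss path is not modeled.
    """
    timeline = []
    clock = 0
    for entry in entries:
        # First clock: read the tag and compare it with the upper address part.
        timeline.append((clock, "tag read/compare", entry))
        clock += 1
        # Second clock: write the data; the tag memory cannot be read meanwhile.
        timeline.append((clock, "data write", entry))
        clock += 1
    return timeline


timeline = conventional_write_timeline([3, 7, 9])
total_clocks = len(timeline)   # one activity per clock, so 2 clocks per write
for row in timeline:
    print(row)
```

With a shared entry decoder only one of the two activities can occupy a clock, so three consecutive write hits take six clocks in this model.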

[0006]

[Problems to be Solved by the Invention] As described above, a write operation in a conventional built-in cache memory requires, as shown in FIG. 9, one clock to read the tag information and compare tags and one clock to rewrite the data corresponding to that tag information, for a total of two clocks. Consequently, when write hits occur consecutively, two clocks are needed every time, and data access becomes slow. A cache memory is generally used to speed up data transfer between an external storage device and a microprocessor. When the cache memory is external, the delay between the devices means that data transfer from the cache memory to the microprocessor takes many cycles compared with the microprocessor's internal processing time. Hence, even when write operations follow one another, the delay of the cache memory has little effect on overall performance. When the cache memory is built into the microprocessor, however, the delay of data transfer from the cache memory to the microprocessor is small, so the delay of the cache memory's internal processing lowers the operating speed of the microprocessor and degrades overall performance.

[0007] The present invention has been made in view of these circumstances. Its object is to provide a cache memory device in which separate entry decoders are provided for the tags and for the data so that tag comparison and data writing overlap, whereby write operations can proceed continuously when write hits occur consecutively and write operations to the cache memory are sped up.

[0008]

[Means for Solving the Problems] In the cache memory device according to the present invention, the tag memory is provided with a dedicated first entry decoder and first address register, the data memory is provided with a dedicated second entry decoder and second address register, and a path is provided for transferring part of the main-memory storage address taken into the first address register to the second address register.

[0009]

[Operation] In the present invention, during a read operation the storage address is taken into the first address register and the second address register simultaneously, the first and second entry decoders select the same entry, and, if the comparison result is a match, the information in the selected entry of the data memory is read out. During a write operation, on the other hand, the storage address is taken into the first address register only and the comparator performs the comparison. If the comparison result is a match, part of the storage address taken into the first address register is transferred to the second address register. When the next storage address is taken in, the first address register outputs part of the next storage address to the first entry decoder, and at the same time the second address register outputs the previously transferred part of the storage address to the second entry decoder. Different entries are therefore selected in the tag memory and the data memory, the data write and the tag comparison can be performed in the same cycle, and the write operation is sped up.
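The overlap described above can be sketched as a two-stage pipeline in Python. This is an illustrative model only: it assumes every write hits, treats each address register as holding just an entry number, and uses the hypothetical name `overlapped_write_timeline`.

```python
from typing import List, Optional, Tuple


def overlapped_write_timeline(
        entries: List[int]) -> List[Tuple[Optional[int], Optional[int]]]:
    """Return, per clock, (entry under tag comparison, entry being written).

    Assumes every write hits, so each compared entry is transferred to the
    second address register and written one clock later.
    """
    timeline = []
    second_register: Optional[int] = None      # data-side address register
    for next_entry in list(entries) + [None]:  # trailing None drains the pipe
        first_register = next_entry            # tag-side address register
        # Same clock: tag memory uses first_register, data memory uses
        # second_register, so the two memories select different entries.
        timeline.append((first_register, second_register))
        second_register = first_register       # transfer path on a hit
    return timeline


timeline = overlapped_write_timeline([3, 7, 9])
for clock, (tag_entry, data_entry) in enumerate(timeline):
    print(clock, "compare:", tag_entry, "write:", data_entry)
```

In this model N consecutive write hits finish in N + 1 clocks, because each data write shares a clock with the tag comparison of the following write.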

[0010]

[Embodiments] The present invention will be described in detail below with reference to the drawings showing its embodiments. FIG. 1 is a block diagram showing the configuration of a microprocessor according to the present invention. The microprocessor is broadly divided into ten blocks: an instruction fetch unit (IFU) 47, an instruction decode unit (DU) 40, a first micro-ROM unit (IROM) 43, a second micro-ROM unit (FROM) 44, an operand address calculation unit (AU) 41, a PC calculation unit (PCU) 42, an integer operation unit (IU) 45, a floating-point operation unit (FPU) 46, an operand access unit (OAU) 48, and a bus interface unit (BIU) 50. The bus interface unit 50 controls data input and output with external memory using a 32-bit instruction bus, a 64-bit data bus, and a 32-bit address bus. Each block is described in more detail below.

[0011] (1) Instruction fetch unit
The instruction fetch unit 47 contains an address translation mechanism for instruction addresses, an 8 KB built-in instruction cache, a 64-entry instruction TLB that translates logical addresses into physical addresses, two 32-byte instruction queues, and their control logic. The instruction fetch unit 47 translates the logical address of the next instruction to be fetched into a physical address, fetches the instruction code from the built-in instruction cache, and outputs it to the instruction decode unit 40. When the built-in instruction cache misses, the instruction fetch unit 47 outputs the physical address to the address input/output section in the bus interface unit 50, requests access to external memory, fetches the instruction code through the instruction input section in the bus interface unit 50, and registers it in the built-in instruction cache.

[0012] The instruction cache operates in a 16-byte × 128-entry × 4-way configuration when instruction codes are fetched from the instruction bus or when they are fetched with the data bus operating at a 32-bit width, and in a 32-byte × 128-entry × 2-way configuration when they are fetched with the data bus operating at a 64-bit width. The TLB always has a 16-entry × 4-way configuration regardless of the bus operating mode. Of the two instruction queues, one prefetches and queues the instruction codes that follow a conditional branch instruction, and the other prefetches and queues the instruction codes at the branch destination. The logical address of the instruction to be fetched is calculated by a dedicated counter. When a jump occurs, the logical address of the new instruction is transferred over the JA bus from the operand address calculation unit 41, the PC calculation unit 42, or the integer operation unit 45. When the instruction TLB misses, address translation by paging and updating of the instruction TLB are also performed by a control circuit inside the instruction fetch unit 47. While the microprocessor is in bus snoop operation, it monitors addresses on the address bus through the address input/output section and, if necessary, invalidates the corresponding entry of the built-in instruction cache.
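As a quick arithmetic check, both instruction cache configurations hold the same total capacity, and the TLB geometry gives the 64 entries stated for the instruction fetch unit. The helper name `capacity_bytes` is made up for this sketch, and 1 KB is taken as 1024 bytes.

```python
def capacity_bytes(line_bytes: int, entries: int, ways: int) -> int:
    """Total capacity of a set-associative memory in bytes."""
    return line_bytes * entries * ways


narrow_mode = capacity_bytes(16, 128, 4)   # instruction bus or 32-bit data bus
wide_mode = capacity_bytes(32, 128, 2)     # 64-bit data bus
tlb_entries = 16 * 4                       # 16 entries x 4 ways

print(narrow_mode // 1024, "KB;", wide_mode // 1024, "KB;", tlb_entries, "TLB entries")
```

Switching modes therefore changes the line size and associativity but not the capacity.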

[0013] (2) Instruction decode unit
The instruction decode unit 40 basically decodes instruction codes in 16-bit (halfword) units. This block contains a first decoder that performs the first-half decode of the opcode contained in the first halfword, an addressing mode decoder that operates simultaneously with the first decoder and decodes the information specifying the addressing mode, and a second decoder that takes the output of the first decoder, performs the second-half decode of the opcode, and outputs the entry address of the micro-ROM. The instruction decode unit 40 also has a sub-decoder that, in parallel with the main decoder (which consists of the first decoder, the addressing mode decoder, and the second decoder), decodes the instruction following the one being decoded by the main decoder. The instructions the sub-decoder can decode are register-to-register operation instructions that have no data dependency on the instruction decoded by the main decoder. An instruction decoded by the sub-decoder is executed under hardwired control at the same time as the instruction decoded by the main decoder is executed under microprogram control. The microprocessor thus adopts a superscalar architecture in which hardware automatically selects the instructions that the sub-decoder can decode.
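The pairing condition for the sub-decoder can be illustrated with a small check. The text states only that the second instruction must be a register-to-register operation with no data dependency on the first; the specific hazard tests below (read-after-write, write-after-write, write-after-read), the `Instr` record, and the function `can_pair` are assumptions made for the sake of a runnable sketch.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Instr:
    name: str
    dest: Optional[str]        # register written, if any
    srcs: Tuple[str, ...]      # registers read
    reg_to_reg: bool           # True for register-to-register operations


def can_pair(main: Instr, sub: Instr) -> bool:
    """Decide whether `sub` may be decoded alongside `main` (assumed rules)."""
    if not sub.reg_to_reg:
        return False
    if main.dest is not None and main.dest in sub.srcs:
        return False           # sub would read a value main has not produced yet
    if main.dest is not None and main.dest == sub.dest:
        return False           # both write the same register
    if sub.dest is not None and sub.dest in main.srcs:
        return False           # sub would overwrite a register main still reads
    return True


load = Instr("load r1, [r2]", dest="r1", srcs=("r2",), reg_to_reg=False)
add_indep = Instr("add r3, r4", dest="r3", srcs=("r3", "r4"), reg_to_reg=True)
add_dep = Instr("add r5, r1", dest="r5", srcs=("r5", "r1"), reg_to_reg=True)

print(can_pair(load, add_indep))   # independent register operation
print(can_pair(load, add_dep))     # depends on r1 produced by the load
```

The instruction strings are placeholders and do not follow the instruction set of the microprocessor described here.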

[0014] The instruction decode unit 40 also contains a branch prediction mechanism that predicts conditional branch instructions, and a scoreboard register that checks for data hazards during operand address calculation and interlocks the pipeline. The instruction decode unit 40 decodes 0 to 8 bytes per clock of the instruction code output from the instruction fetch unit 47. Of the decode results, information on operations in the integer operation unit 45 is output to the IROM unit 43, information on operations in the floating-point operation unit 46 to the FROM unit 44, information on operand address calculation to the operand address calculation unit 41, and information on PC calculation to the PC calculation unit 42.

[0015] (3) IROM unit
The IROM unit 43 contains a micro-ROM storing the various microprogram routines that control the integer operation unit 45, a microsequencer, a microinstruction decoder, and so on. A microinstruction is read from the micro-ROM once per clock, and one microinstruction performs one register-to-register operation, so instructions such as transfer, compare, add, subtract, and logical operations complete in one clock. Besides the sequencing needed to execute the microprograms for instruction execution, the microsequencer accepts exceptions, interrupts, and traps (collectively called EIT) and sequences the microprogram corresponding to each EIT. The IROM unit 43 also receives external interrupts, which do not depend on the instruction code, and microprogram branch conditions derived from the results of integer operations.

[0016] (4) FROM unit
The FROM unit 44 contains a micro-ROM storing the various microprogram routines that control the floating-point operation unit 46, a microsequencer, a microinstruction decoder, and so on. A microinstruction is read from the micro-ROM once per clock, and one floating-point operation completes in a minimum of two microinstructions. Besides the sequencing specified by the microprogram, the microsequencer handles exceptions arising from floating-point operations; when an unmasked floating-point exception is detected, it requests exception processing from the IROM unit 43. When a floating-point operation instruction is decoded by the microinstruction decoder, the decode result is output simultaneously to the IROM unit 43 and the FROM unit 44, and in the first microstep the integer operation unit 45 and the floating-point operation unit 46 operate together. However, because the microsequencer of the FROM unit 44 and that of the IROM unit 43 are controlled independently, from the second step onward the integer operation unit 45 and the floating-point operation unit 46 operate independently.

[0017] (5) Operand address calculation unit
The operand address calculation unit 41 is hardwired-controlled by the control information on operand address calculation output from the addressing mode decoder of the instruction decode unit 40. This block performs operand address calculation, other than the memory accesses needed for memory indirect addressing, and calculates the jump destination addresses of jump instructions. In memory indirect addressing, the fetched indirect address is transferred over the AG bus from the operand access unit 48 to the operand address calculation unit 41. The result of the operand address calculation is output to the integer operation unit 45. In pre-jump processing at the end of operand address calculation, the calculated jump destination address is output over the JA bus to the instruction fetch unit 47 and the PC calculation unit 42. Immediate operands are output to the integer operation unit 45 and the floating-point operation unit 46. The upper 32 bits of a 64-bit immediate are transferred over the AG bus. The values of general-purpose registers and the program counter needed for address calculation are transferred over the IX bus from the integer operation unit 45 and the PC calculation unit 42.

[0018] (6) PC calculation unit
The PC calculation unit 42 is hardwired-controlled by the information on PC calculation output from the instruction decode unit 40, and calculates the PC value of each instruction. The instructions of the microprocessor are variable-length, so the length of an instruction is not known until it has been decoded. The PC calculation unit 42 calculates the PC value of the next instruction by adding the instruction length output from the instruction decode unit 40 to the PC value of the instruction being decoded. The result is output, together with the decode result, as the PC value of each instruction and flows through the pipeline with the instruction. In pre-branch processing at the instruction decode stage, the address of the pre-branch destination instruction is calculated. Jump instructions to absolute addresses are also processed at the instruction decode stage. The PC calculation unit 42 also has a PC stack that holds copies of the return PC values pushed onto the stack when subroutine jump instructions are executed. For a return instruction from a subroutine, it performs pre-return processing, generating the address of the return destination instruction by reading the return PC from the PC stack. Jump destination addresses produced by a pre-branch or pre-return are transferred to the instruction fetch unit over the JA bus.
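The PC update and the PC stack can be sketched as follows. This is a simplified software model, assuming an unbounded stack and the hypothetical names `PCUnit`, `advance`, `call`, and `ret`; the depth and replacement policy of the real PC stack are not given in the text.

```python
from typing import List, Optional


class PCUnit:
    """Minimal model of next-PC calculation with a return-address stack."""

    def __init__(self, pc: int = 0) -> None:
        self.pc = pc
        self.pc_stack: List[int] = []

    def advance(self, length: int) -> int:
        """Add the decoded instruction length to obtain the next PC."""
        self.pc += length
        return self.pc

    def call(self, length: int, target: int) -> int:
        """Subroutine jump: keep a copy of the return PC, then jump."""
        self.pc_stack.append(self.pc + length)
        self.pc = target
        return self.pc

    def ret(self) -> Optional[int]:
        """Pre-return: take the return PC from the PC stack if one exists."""
        if not self.pc_stack:
            return None        # nothing recorded; no early return address
        self.pc = self.pc_stack.pop()
        return self.pc


pcu = PCUnit(0x100)
pcu.advance(2)            # 2-byte instruction
pcu.advance(6)            # 6-byte instruction
pcu.call(4, 0x400)        # 4-byte subroutine jump to 0x400
pcu.advance(2)
print(hex(pcu.ret()))     # address of the instruction after the call
```

Because instruction lengths vary, each `advance` call needs the length reported by the decoder before the next PC is known.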

[0019] (7) Integer operation unit
The integer operation unit 45 is controlled by the microprogram stored in the micro-ROM of the IROM unit, and executes the operations needed to implement each integer operation instruction using the register file and arithmetic units inside the integer operation unit 45. The register file contains general-purpose registers and working registers. The arithmetic units are divided into a main arithmetic unit, which includes a main ALU, a main barrel shifter, a priority encoder, and so on, and a sub arithmetic unit, which includes a sub ALU and a sub shifter; the main and sub arithmetic units are each connected to the register file by three buses. The sub arithmetic unit performs the operations of instructions decoded by the sub-decoder of the instruction decode unit 40, and can also be controlled by the microprogram. With this capability, the integer operation unit 45 performs two operations or two register-to-register data transfers simultaneously when executing a high-function instruction, and so executes the instruction at high speed.

[0020] When the operand of an instruction is an address or an immediate, the immediate or the calculated address is input from the operand address calculation unit 41. When the operand is data in memory, the address calculated by the operand address calculation unit 41 is output over the AA bus to the operand access unit 48, and the operand fetched from the built-in data cache or from outside is input to the integer operation unit 45 over the DD bus. When the built-in data cache or external memory must be read during an operation, the address is output over the AA bus to the operand access unit 48 as directed by the microprogram, and the target data is fetched from the DD bus. When an operation result must be stored in the built-in data cache of the present invention or in external memory, the address and the data are output to the operand access unit 48 over the AA bus and the DD bus, respectively, as directed by the microprogram. At this time, the PC calculation unit 42 holds the PC value of the instruction that performed the store in the latch corresponding to the store buffer. When the integer operation unit 45 obtains a new instruction address as a result of processing an external interrupt or an exception, it outputs the address over the JA bus to the instruction fetch unit 47 and the PC calculation unit 42.

[0021] (8) Floating-point operation unit
The floating-point operation unit 46 is controlled by the microprogram stored in the micro-ROM of the FROM unit 44, and executes the operations needed to implement each floating-point operation instruction using the register file and arithmetic units inside the floating-point operation unit. The floating-point operation unit 46 has a multiplier, which performs floating-point multiplication at high speed and also performs the multiplication for integer multiply instructions. It also has a floating-point mode control register (FMC), which sets the rounding mode for floating-point operations and enables detection of floating-point exceptions, and a floating-point status word (FSW), which consists of flags for floating-point operation results and status bits indicating which floating-point exceptions have occurred. When the operand of an instruction is an immediate, the immediate output from the operand address calculation unit 41 is transferred over the S1 bus (lower 32 bits) and the AG bus (upper 32 bits). When the operand is data in memory, the address calculated by the operand address calculation unit 41 is output over the AA bus to the operand access unit 48, and the operand fetched from the built-in data cache or external memory is first transferred from the DD bus to the integer operation unit 45 and then input to the floating-point operation unit 46 over the S1 and S2 buses.
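The split of a 64-bit immediate across two 32-bit buses amounts to a shift and a mask. The sketch below shows only that arithmetic; the function name `split_immediate64` is hypothetical and no bus timing is modeled.

```python
from typing import Tuple

MASK32 = 0xFFFFFFFF


def split_immediate64(value: int) -> Tuple[int, int]:
    """Return (upper 32 bits for the AG bus, lower 32 bits for the S1 bus)."""
    value &= (1 << 64) - 1          # keep exactly 64 bits
    return value >> 32, value & MASK32


upper, lower = split_immediate64(0x0123456789ABCDEF)
print(hex(upper), hex(lower))
```

Recombining the halves as `(upper << 32) | lower` gives back the original immediate.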

【００２２】オペランドを内蔵データキャッシュ、外部
のメモリへストアする必要があるときはマイクロプログ
ラムの指示によりＤ１バス、Ｄ３バスを介し一度整数演
算部４５にデータを転送し、整数演算部４５からＤＤバ
スを通してオペランドアクセス部４８へデータを出力す
る。ストア動作では浮動小数点演算部４６と整数演算部
４５とが協調して動作し、ＡＡバスを通してオペランド
アクセス部４８に対して整数演算部４５からアドレスが
出力され、浮動小数点演算部４６からデータが出力され
る。When an operand must be stored to the built-in data cache or external memory, the data is first transferred to the integer arithmetic unit 45 over the D1 and D3 buses under microprogram control, and the integer arithmetic unit 45 then outputs it to the operand access unit 48 over the DD bus. In a store operation the floating point arithmetic unit 46 and the integer arithmetic unit 45 cooperate: the integer arithmetic unit 45 outputs the address to the operand access unit 48 over the AA bus, and the floating point arithmetic unit 46 supplies the data.

【００２３】（９）　　　オペランドアクセス部オペラ
ンドアクセス部４８にはオペランドアドレスのアドレス
変換機構、８ＫＢの本発明の内蔵データキャッシュ、６
４エントリのデータ用ＴＬＢ　、２エントリのオペラン
ドプリフェッチキュー、３エントリのストアバッファと
それら制御部がある。内蔵データキャッシュの構成は３
２ｂｙｔｅ×６４ｅｎｔｒｙ　×４ｗａｙ　で、ＴＬＢ
　の構成は１６ｅｎｔｒｙ　×４ｗａｙ　である。デー
タのロード動作ではオペランドアドレス計算部４１や整
数演算部４５から出力されたロードすべきデータの論理
アドレスを物理アドレスに変換し、内蔵データキャッシ
ュからデータをフェッチし、整数演算部４５や浮動小数
点演算部４６へ出力する。内蔵データキャッシュがミス
した場合にはバスインターフェース部５０内のアドレス
入出力部へ物理アドレスを出力し、外部へのデータアク
セスを要求し、データ入出力部を通して入力されたデー
タを内蔵データキャッシュに登録する。(9) Operand access unit: The operand access unit 48 contains an address translation mechanism for operand addresses, the 8 KB built-in data cache of the present invention, a 64-entry data TLB, a 2-entry operand prefetch queue, a 3-entry store buffer, and their control circuits. The built-in data cache is organized as 32 bytes x 64 entries x 4 ways, and the TLB as 16 entries x 4 ways. In a data load operation, the logical address of the data to be loaded, output from the operand address calculation unit 41 or the integer arithmetic unit 45, is translated into a physical address; the data is fetched from the built-in data cache and output to the integer arithmetic unit 45 or the floating point arithmetic unit 46. If the built-in data cache misses, the physical address is output to the address input/output section of the bus interface unit 50, an external data access is requested, and the data received through the data input/output section is registered in the built-in data cache.
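The stated cache geometry can be checked with a little arithmetic. The Python sketch below confirms that 32 bytes x 64 entries x 4 ways gives 8 KB and splits an address into tag, entry index, and byte offset. The field layout (byte offset in the lowest bits, entry index directly above it) and all names are assumptions made for illustration; this paragraph does not state them.

```python
LINE_BYTES = 32   # bytes per entry (from the text)
ENTRIES = 64      # entries per way (from the text)
WAYS = 4          # number of ways (from the text)

OFFSET_BITS = LINE_BYTES.bit_length() - 1   # 5 bits select a byte in an entry
INDEX_BITS = ENTRIES.bit_length() - 1       # 6 bits select one of 64 entries


def cache_size_bytes():
    """Total data capacity implied by the geometry."""
    return LINE_BYTES * ENTRIES * WAYS


def split_address(addr):
    """Split an address into (tag, index, offset); layout is an assumption."""
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> OFFSET_BITS) & (ENTRIES - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset
```

Under this assumed layout the entry index is 6 bits wide, which matches the number of entries per way.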

【００２４】データのストア動作では整数演算部４５か
ら出力されたストアすべきデータの論理アドレスを物理
アドレスに変換し、整数演算部４５や浮動小数点演算部
４６から出力されたデータを内蔵データキャッシュにス
トアするとともに、アドレス入出力部へ物理アドレスを
出力し、ストアバッファを介してデータ入出力部からデ
ータを外部へ出力する。ストアバッファではストアすべ
きデータとそのアドレス、さらにそのストア動作を行っ
た命令のアドレスを１組にして管理する。ストアバッフ
ァでのストア動作は先入れ先だし制御方式で管理される
。データ用ＴＬＢ　がミスした場合のページングによる
アドレス変換やデータ用ＴＬＢ更新もオペランドアクセ
ス部４８の内部の制御回路により行う。また、メモリア
クセスアドレスがメモリにマップされたＩ／Ｏ　領域に
入るかどうかのチェックも行われる。また、マイクロプ
ロセッサがバススヌープ動作中はバスインターフェース
部５０を通して入力された物理アドレスがヒットする内
蔵データキャッシュのエントリを無効化する。In a data store operation, the logical address of the data to be stored, output from the integer arithmetic unit 45, is translated into a physical address, and the data output from the integer arithmetic unit 45 or the floating point arithmetic unit 46 is stored in the built-in data cache. At the same time, the physical address is output to the address input/output section, and the data is output externally from the data input/output section via the store buffer. The store buffer manages as one set the data to be stored, its address, and the address of the instruction that performed the store. Stores in the store buffer are handled first-in, first-out. When the data TLB misses, address translation by paging and the data TLB update are also performed by the control circuit inside the operand access unit 48. The unit also checks whether a memory access address falls within a memory-mapped I/O area. In addition, while the microprocessor is performing bus snooping, it invalidates any built-in data cache entry that is hit by a physical address received through the bus interface unit 50.
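The store buffer behaviour described above (a three-entry first-in, first-out queue whose entries pair the data, its address, and the address of the storing instruction) can be modelled as follows. This is a minimal software sketch, not the hardware design; the class and field names are hypothetical, and refusing a push when full stands in for whatever stall mechanism the real circuit uses.

```python
from collections import deque, namedtuple

# One set managed by the store buffer: data, its address, and the
# address of the instruction that performed the store.
StoreEntry = namedtuple("StoreEntry", ["data", "address", "instruction_address"])


class StoreBuffer:
    def __init__(self, capacity=3):          # 3 entries, per the text
        self.capacity = capacity
        self._queue = deque()

    def is_full(self):
        return len(self._queue) >= self.capacity

    def push(self, data, address, instruction_address):
        """Queue a store; returns False when the buffer is full (assumed stall)."""
        if self.is_full():
            return False
        self._queue.append(StoreEntry(data, address, instruction_address))
        return True

    def drain_one(self):
        """Send the oldest store to external memory (first-in, first-out)."""
        return self._queue.popleft() if self._queue else None
```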

【００２５】（１０）　　バスインターフェース部バス
インターフェース部５０はアドレス入出力部と命令入力
部とデータ入出力部とからなる。アドレス入出力部は命
令フェッチ部４７とオペランドアクセス部４８とから出
力されたアドレスをマイクロプロセッサの外部に出力す
る。アドレスの出力はマイクロプロセッサで定められた
バスプロトコルに従って行われる。バスプロトコルの制
御はアドレス入出力部内にある外部バス制御回路で行う
。外部バス制御回路ではバスアクセス例外、外部割込み
の受付も行う。また、このマイクロプロセッサ以外の外
部デバイスがバスマスタになっており、マイクロプロセ
ッサがバススヌープ動作中は外部デバイスがデータライ
トを実行した場合にアドレスバス上に出力されたアドレ
スを取り込み、命令フェッチ部４７とオペランドアクセ
ス部４８に転送する。アドレスの取り込み動作は、デー
タライトサイクルにバーＤＳ信号がアサートされたとき
　（クロック非同期、エッジセンス）　とバーＭＲＥＱ
信号がアサート中にバーＭＳ信号がアサートされたとき
　（クロック同期レベルセンス）　に行われる。(10) Bus interface unit: The bus interface unit 50 consists of an address input/output section, an instruction input section, and a data input/output section. The address input/output section outputs addresses from the instruction fetch unit 47 and the operand access unit 48 to the outside of the microprocessor. Addresses are output according to the bus protocol defined for the microprocessor, and the protocol is controlled by an external bus control circuit inside the address input/output section. The external bus control circuit also accepts bus access exceptions and external interrupts. When an external device other than this microprocessor is the bus master and the microprocessor is performing bus snooping, the address input/output section captures the address that the external device places on the address bus when it executes a data write, and transfers it to the instruction fetch unit 47 and the operand access unit 48. The address is captured when the bar-DS signal is asserted in a data write cycle (clock-asynchronous, edge-sensed) and when the bar-MS signal is asserted while the bar-MREQ signal is asserted (clock-synchronous, level-sensed).

【００２６】命令入力部は命令バス　（又はデータバス
）　から３２ビット　（又は６４ビット）　ごとに命令
コードをマイクロプロセッサへ入力する。命令キャッシ
ュのアクセス方法には１つのアドレスに対して１回だけ
３２ビット　（又は６４ビット）　の命令コードをフェ
ッチする標準バスサイクルと１つのアドレスに対して４
回連続で３２ビット　（又は６４ビット）　の命令コー
ドをフェッチするブロック転送バスサイクルとがある。命令入力部はフェッチした命令コードを命令フェッチ部
４７へ転送する。データ入出力部はオペランドのロード
動作のときデータバスからデータを取り込み、オペラン
ドアクセス部４８へ転送する。オペランドのストア動作
のときオペランドアクセス部４８から出力されたオペラ
ンドをデータバスへ出力する。データキャッシュなど外
部のメモリのアクセス方法には１つのアドレスに対して
６４ビットのデータをアクセスする標準バスサイクルと
１つのアドレスに対して４回連続で６４ビット又は３２
ビットのデータをアクセスするブロック転送バスサイク
ルとがあり、どちらの場合もデータ入出力部はオペラン
ドアクセス部４８と外部メモリとでやり取りするデータ
の入出力を行う。The instruction input section inputs instruction codes to the microprocessor from the instruction bus (or data bus) in units of 32 bits (or 64 bits). The instruction cache can be accessed with a standard bus cycle, which fetches a 32-bit (or 64-bit) instruction code once for one address, or with a block transfer bus cycle, which fetches a 32-bit (or 64-bit) instruction code four times in succession for one address. The instruction input section transfers the fetched instruction codes to the instruction fetch unit 47. In an operand load operation the data input/output section takes data in from the data bus and transfers it to the operand access unit 48; in an operand store operation it outputs the operand from the operand access unit 48 to the data bus. External memory, for the data cache and the like, can be accessed with a standard bus cycle, which accesses 64 bits of data for one address, or with a block transfer bus cycle, which accesses 64 or 32 bits of data four times in succession for one address. In both cases the data input/output section handles the data exchanged between the operand access unit 48 and external memory.
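The amount of data moved by one block transfer bus cycle follows directly from the figures above. The Python sketch below is only that arithmetic; the function name is hypothetical. Four 64-bit transfers move 32 bytes, which equals the 32-byte entry size given for the built-in data cache, but whether one block transfer fills exactly one cache entry is an inference, not something this passage states.

```python
def block_transfer_bytes(transfers, bits_per_transfer):
    """Bytes moved by one block transfer bus cycle."""
    return transfers * bits_per_transfer // 8


# Four consecutive accesses per address, per the text.
instruction_block_32 = block_transfer_bytes(4, 32)   # 32-bit instruction bus
data_block_64 = block_transfer_bytes(4, 64)          # 64-bit data bus
data_block_32 = block_transfer_bytes(4, 32)          # 32-bit data accesses
```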

【００２７】次にマイクロプロセッサのパイプライン処
理機構について説明する。このマイクロプロセッサは各
種のバッファ記憶と、命令バスやデータバスを使用した
メモリとの効率的アクセスにより、命令をパイプライン
処理して高性能に動作する。以下、その内容を詳細に説
明する。・　　パイプライン機構マイクロプロセッサのパイプライン図を図２に示す。命
令のプリフェッチを行う命令フェッチステージ　（ＩＦ
ステージ）　２１、命令のデコードを行うデコードステ
ージ　（Ｄステージ）２２、オペランドのアドレス計算
を行うオペランドアドレス計算ステージ（Ａステージ）
２３、マイクロＲＯＭ　アクセス　（特にＲステージと
呼ぶ）２４とオペランドのプリフェッチ（特にＯＦステ
ージと呼ぶ）　２５を行うオペランドフェッチステージ
（Ｆステージ）２６、命令の実行を行う実行ステージ（
Ｅステージ）２７、メモリオペランドのストアを行うス
トアステージ（Ｓステージ）２８の６段構成でパイプラ
イン処理を行う。Ｓステージ２８には３段のストアバッ
ファがある。Next, the pipeline processing mechanism of the microprocessor is described. Using its various buffer memories and efficient memory access over the instruction bus and data bus, the microprocessor pipelines instructions and achieves high performance. The details follow. - Pipeline mechanism: FIG. 2 shows the pipeline of the microprocessor. Pipeline processing uses six stages: an instruction fetch stage (IF stage) 21, which prefetches instructions; a decode stage (D stage) 22, which decodes instructions; an operand address calculation stage (A stage) 23, which calculates operand addresses; an operand fetch stage (F stage) 26, which performs micro ROM access (specifically called the R stage) 24 and operand prefetch (specifically called the OF stage) 25; an execution stage (E stage) 27, which executes instructions; and a store stage (S stage) 28, which stores memory operands. The S stage 28 has a three-stage store buffer.

【００２８】各ステージは他のステージとは独立に動作
し、理論上は６つのステージが完全に独立動作する。Ｓ
ステージ２８以外の各ステージは１回の処理を最小１ク
ロックで行うことができる。Ｓステージ２８は１回のオ
ペランドストア処理を最小２クロックで行うことができ
る。従ってメモリオペランドのストア処理がない場合、理想
的には１クロックごとに次々とパイプライン処理が進行
する。このマイクロプロセッサにはメモリ−メモリ間演
算や、メモリ間接アドレッシングなど、基本パイプライ
ン処理１回だけでは処理が行えない命令があるが、マイ
クロプロセッサはこれらの処理に対してもなるべく均衡
したパイプライン処理が行えるように設計されている。複数のメモリオペランドをもつ命令に対してはメモリオ
ペランドの数をもとにデコード段階で複数のパイプライ
ン処理単位（ステップコード）に分解してパイプライン
処理を行うのである。Each stage operates independently of the others, and in theory all six stages operate fully independently. Every stage except the S stage 28 can complete one operation in a minimum of one clock. The S stage 28 can complete one operand store in a minimum of two clocks. Ideally, therefore, when there are no memory operand stores, the pipeline advances one step every clock. Some instructions, such as memory-to-memory operations and memory indirect addressing, cannot be handled in a single pass through the basic pipeline, but the microprocessor is designed to keep the pipeline as balanced as possible for these as well. An instruction with multiple memory operands is decomposed at the decode stage into multiple pipeline processing units (step codes), according to the number of memory operands, and then pipelined.
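The per-stage minimums above imply a simple lower bound on execution time. The Python sketch below is a rough model under stated assumptions, not a description of the actual timing: it ignores pipeline fill, cache and TLB misses, and any overlap the store buffer provides, and the function name is hypothetical. It only says that a sequence cannot finish faster than one clock per step code, nor faster than two clocks per operand store.

```python
def min_clocks(step_codes, stores):
    """Lower bound on clocks for a sequence of step codes.

    Every stage except S needs at least 1 clock per step code, and the
    S stage needs at least 2 clocks per operand store (from the text).
    """
    if stores < 0 or step_codes < stores:
        raise ValueError("stores must be between 0 and step_codes")
    return max(step_codes * 1, stores * 2)
```

With no stores the bound is one clock per step code, matching the ideal case in the text; the S stage becomes the limit only when more than half of the step codes store to memory.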

【００２９】ＩＦステージ２１からＤステージ２２に渡
される情報は命令コードそのものである。Ｄステージ２
２からＡステージ２３に渡される情報は命令で指定され
た演算に関するもの（以下コードという）と、オペラン
ドのアドレス計算に関係するもの（以下Ａコードという
）と処理中命令のプログラムカウンタ値の３つである。Ａステージ２３からＦステージ２６に渡される情報はマ
イクロプログラムルーチンのエントリ番地やマイクロプ
ログラムへのパラメータなどを含むＲコードと、オペラ
ンドのアドレスとアクセス方法指示情報などを含むＦコ
ードと、処理中命令のプログラムカウンタ値とスタック
ポインタ値との４つである。Ｆステージ２６からＥステ
ージ２７にわたされる情報は演算制御情報とリテラルな
どを含むＥコードと、オペランドやオペランドアドレス
などを含むＳコードと処理中命令のプログラムカウンタ
値とスタックポインタ値との４つである。Ｓコードはア
ドレスとデータとからなる。Ｅステージ２７からＳステ
ージ２８に渡される情報はストアすべき演算結果である
Ｗコードとその演算結果を出力した命令のプログラムカ
ウンタ値との２つである。Ｗコードはアドレスとデータ
とからなる。The information passed from the IF stage 21 to the D stage 22 is the instruction code itself. Three items are passed from the D stage 22 to the A stage 23: information about the operation specified by the instruction (hereinafter the D code), information about operand address calculation (hereinafter the A code), and the program counter value of the instruction being processed. Four items are passed from the A stage 23 to the F stage 26: the R code, which contains the entry address of the microprogram routine, parameters for the microprogram, and so on; the F code, which contains the operand address, access method information, and so on; the program counter value of the instruction being processed; and the stack pointer value. Four items are passed from the F stage 26 to the E stage 27: the E code, which contains operation control information, literals, and so on; the S code, which contains the operand, the operand address, and so on; the program counter value of the instruction being processed; and the stack pointer value. The S code consists of an address and data. Two items are passed from the E stage 27 to the S stage 28: the W code, which is the operation result to be stored, and the program counter value of the instruction that produced that result. The W code consists of an address and data.

【００３０】Ｅステージ２７以前のステージで検出され
たＥＩＴ　はそのコードがＥステージ２７に到達するま
でＥＩＴ　処理を起動しない。Ｅステージ２７で処理さ
れている命令のみが実行段階の命令であり、ＩＦステー
ジ２１〜Ｆステージ２６で処理されている命令はまだ実
行段階に至っていないのである。従ってＥステージ２７
以前で検出されたＥＩＴは検出したことをステップコー
ド中に記録して次のステージに伝えられるのみである。Ｓステージ２８で検出されたＥＩＴ　はＥステージ２７
で処理中命令が完了した時点またはその命令の処理をキ
ャンセルして受け付けられ、Ｅステージ２７に戻って処
理される。An EIT detected at a stage before the E stage 27 does not start EIT processing until its code reaches the E stage 27. Only the instruction being processed in the E stage 27 is at the execution stage; instructions being processed in the IF stage 21 through the F stage 26 have not yet reached it. An EIT detected before the E stage 27 is therefore only recorded in the step code and passed on to the next stage. An EIT detected at the S stage 28 is accepted either when the instruction being processed in the E stage 27 completes or by cancelling that instruction, and is then handled back in the E stage 27.

【００３１】・　　各パイプラインステージの処理各パ
イプラインステージの入出力ステップコードには図２に
示したように便宜上名前が付けられている。またステッ
プコードはオペコードに関する処理を行い、マイクロＲ
ＯＭ　のエントリ番地やＥステージ２７に対するパラメ
ータなどになる系列とＥステージ２７の処理対象のオペ
ランドになる系列との２系列がある。また、Ｄステージ
２２からＳステージ２８の間では処理中命令のプログラ
ムカウンタ値が受け渡され、Ａステージ２３からＥステ
ージ２７の間ではスタックポインタ値が　（さらにはス
コアボードレジスタ値も）　受け渡される。- Processing in each pipeline stage: For convenience, the input and output step codes of each pipeline stage are named as shown in FIG. 2. Step codes come in two series: one handles processing related to the operation code and becomes the micro ROM entry address, the parameters for the E stage 27, and so on; the other becomes the operands processed by the E stage 27. The program counter value of the instruction being processed is passed along from the D stage 22 to the S stage 28, and the stack pointer value (and also the scoreboard register value) is passed along from the A stage 23 to the E stage 27.

【００３２】・　　命令フェッチステージ命令フェッチ
ステージ　（ＩＦステージ）　２１では命令フェッチ部
４７が動作する。図３はＦステージ２１とＤステージ２
２との関係を示すブロック図である。内蔵の命令キャッ
シュ７１や外部から命令をフェッチし、２つの命令キュ
ーＡ７２，同Ｂ７３　の一方に入力して、Ｄステージ２
２に対して２〜６バイト単位に命令コードを出力する。命令キュー７２，７３　の入力は、命令キャッシュ７１
がヒットしたときは整置された１６バイト単位、ミスし
た時は整置された４バイト単位で行う。命令キューＡ７
２　及び同Ｂ７３　は条件分岐命令に引き続く命令及ぶ
分岐先命令の両方をフェッチするため２つ存在する。標
準アクセスモードで外部から命令をフェッチするときは
整置された４バイトにつき最小２クロックを要する。バ
ーストモードでは１６バイトにつき最小５クロックを要
する。命令キャッシュ７１がヒットしたときは整置され
た１６バイトにつき１クロックで命令がフェッチされる
。命令キューＡ７２，Ｂ７３　の出力単位は２バイトご
とに可変であり、１クロックの間に最大６バイトまで出
力できる。命令の論理アドレスの物理アドレスへの変換
、命令キャッシュ７１や命令用ＴＬＢの制御、プリフェ
ッチ先命令アドレスの管理や命令キューＡ７２，Ｂ７３
　の制御もＩＦステージ２１で行う。- Instruction fetch stage: The instruction fetch unit 47 operates in the instruction fetch stage (IF stage) 21. FIG. 3 is a block diagram showing the relationship between the IF stage 21 and the D stage 22. Instructions are fetched from the built-in instruction cache 71 or from outside, input to one of the two instruction queues A72 and B73, and output to the D stage 22 as instruction codes in units of 2 to 6 bytes. Input to the instruction queues 72 and 73 is in aligned 16-byte units when the instruction cache 71 hits and in aligned 4-byte units when it misses. There are two instruction queues, A72 and B73, so that both the instructions following a conditional branch instruction and the branch destination instructions can be fetched. Fetching instructions from outside in standard access mode takes a minimum of two clocks per aligned 4 bytes. Burst mode takes a minimum of five clocks per 16 bytes. When the instruction cache 71 hits, instructions are fetched in one clock per aligned 16 bytes. The output unit of the instruction queues A72 and B73 is variable in 2-byte steps, and up to 6 bytes can be output in one clock. The IF stage 21 also translates logical instruction addresses into physical addresses, controls the instruction cache 71 and the instruction TLB, manages prefetch destination instruction addresses, and controls the instruction queues A72 and B73.
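The fetch figures quoted for the IF stage can be compared with a short calculation. The Python sketch below tabulates the minimum clocks for each fetch path and the minimum clocks the D stage needs to drain the same bytes at 6 bytes per clock. The mode names and function names are hypothetical labels, and the results are best-case minimums only, since the text gives no figures for stalls or bus contention.

```python
import math

# (aligned bytes per unit, minimum clocks per unit), all from the text.
FETCH_MODES = {
    "standard": (4, 2),     # external fetch, standard access mode
    "burst": (16, 5),       # external fetch, burst mode
    "cache_hit": (16, 1),   # built-in instruction cache hit
}

MAX_QUEUE_OUTPUT_BYTES = 6  # maximum queue output per clock


def min_fetch_clocks(num_bytes, mode):
    """Minimum clocks to fetch num_bytes of aligned instruction code."""
    unit_bytes, unit_clocks = FETCH_MODES[mode]
    return math.ceil(num_bytes / unit_bytes) * unit_clocks


def min_decode_clocks(num_bytes):
    """Minimum clocks for the queue to output num_bytes to the D stage."""
    return math.ceil(num_bytes / MAX_QUEUE_OUTPUT_BYTES)
```

For 16 bytes this gives 8 clocks in standard mode, 5 in burst mode, and 1 on a cache hit, against at least 3 clocks to output the same bytes to the decoder, so only a cache hit supplies instructions faster than the D stage can consume them.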

【００３３】・　　命令デコードステージ命令デコード
ステージ（Ｄステージ）２２はＩＦステージ２１から入
力された命令コードをデコードする。デコードは命令デ
コード部４０のＦＨＷ　デコーダ、ＮＦＨＷデコーダ、
アドレッシングモードデコーダ等の命令デコーダ７５を
使用して、１クロックに１度行い、１回のデコード処理
で、０〜６バイトの命令コードを消費する（リターンサ
ブルーチン命令の復帰先アドレスを含むステップコード
の出力処理などでは命令コードを消費しない）。１回の
デコードでＡステージ２３に対してアドレス計算情報で
あるＡコードとオペコードの中間デコード結果であるＤ
コードとを出力する。Ｄステージ２２では各命令のＰＣ
計算部４２の制御、命令キュー７２，７３　からの命令
コード出力処理も行う。Ｄステージ２２ではジャンプ命
令に対して先行ジャンプ処理　（Ｄステージ先行ジャン
プ）を行う。条件分岐命令を除き、先行ジャンプを行っ
たジャンプ命令に対してはＤコードやＡコードは出力せ
ず、Ｄステージ２２で命令の処理を終了する。- Instruction decode stage: The instruction decode stage (D stage) 22 decodes the instruction code input from the IF stage 21. Decoding is performed once per clock using the instruction decoder 75 of the instruction decode unit 40, which comprises the FHW decoder, the NFHW decoder, the addressing mode decoder, and so on. One decode operation consumes 0 to 6 bytes of instruction code (no instruction code is consumed, for example, when outputting the step code that contains the return address of a subroutine return instruction). Each decode operation outputs to the A stage 23 the A code, which is address calculation information, and the D code, which is the intermediate decode result of the operation code. The D stage 22 also controls the PC calculation unit 42 for each instruction and outputs instruction codes from the instruction queues 72 and 73. The D stage 22 performs advance jump processing (D stage advance jump) for jump instructions. Except for conditional branch instructions, a jump instruction for which an advance jump has been performed produces no D code or A code, and its processing ends at the D stage 22.

【００３４】条件分岐命令をデコードしたとき、Ｄステ
ージ２２ではＩＦステージ２１に対して分岐先と非分岐
先との両方から命令をフェッチすることを指示する。条
件分岐命令に引き続いてデコードする命令は分岐予測の
結果に従って決定する。つまり、分岐すると予測される
条件分岐命令の次は分岐先の命令をフェッチする命令キ
ューＡ７２　から出力される命令をデコードし、分岐し
ないと予測される条件命令に対しては非分岐先命令をフ
ェッチする命令キューＢ７３　から出力される命令コー
ドをデコードする。When a conditional branch instruction is decoded, the D stage 22 instructs the IF stage 21 to fetch instructions from both the branch destination and the non-branch path. Which instruction is decoded after the conditional branch instruction is determined by the branch prediction result. That is, after a conditional branch instruction predicted to be taken, the D stage 22 decodes the instruction output from instruction queue A72, which fetches the branch destination instructions; after a conditional branch instruction predicted not to be taken, it decodes the instruction code output from instruction queue B73, which fetches the non-branch path instructions.
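The queue selection rule in this paragraph amounts to a two-way choice driven by the prediction. The Python sketch below models only that choice, with the queue roles exactly as this paragraph assigns them (A72 holding branch destination code, B73 holding the non-branch path). The function name and the use of lists for queue contents are illustrative assumptions.

```python
def next_decode_source(predicted_taken, queue_a72, queue_b73):
    """Pick the queue the D stage decodes from after a conditional branch.

    queue_a72: instruction codes fetched from the branch destination
    queue_b73: instruction codes fetched from the non-branch path
    Both are filled regardless of the prediction, per the text.
    """
    return queue_a72 if predicted_taken else queue_b73


# Hypothetical contents, for illustration only.
branch_target_codes = ["target_0", "target_1"]
sequential_codes = ["next_0", "next_1"]
```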

【００３５】・　　オペランドアドレス計算ステージオ
ペランドアドレス計算ステージ（Ａステージ）２３は処
理が大きく２つに分かれる。１つは命令デコード部４０
のデコーダ７５を使用して、オペコードの後段デコード
を行う処理であり、他方はオペランドアドレス計算部４
１でオペランドアドレスの計算を行う処理である。オペ
コードの後段デコード処理はＤコードを入力とし、レジ
スタやメモリの書き込み予約及びマイクロプログラムル
ーチンのエントリ番地とマイクロプログラムに対するパ
ラメータなどとを含むＲコードの出力を行う。なお、レ
ジスタやメモリの書き込み予約は、アドレス計算で参照
したレジスタやメモリの内容が、パイプライン上を先行
する命令で書き換えられ、誤ったアドレス計算が行われ
るのを防ぐためのものである。- Operand address calculation stage: Processing in the operand address calculation stage (A stage) 23 is divided into two main parts. One is the second-stage decoding of the operation code using the decoder 75 of the instruction decode unit 40; the other is the calculation of the operand address in the operand address calculation unit 41. The second-stage decoding of the operation code takes the D code as input, makes write reservations for registers and memory, and outputs the R code, which contains the entry address of the microprogram routine, parameters for the microprogram, and so on. The write reservations for registers and memory prevent the incorrect address calculation that would result if the contents of a register or memory location referenced in the address calculation were rewritten by an instruction ahead in the pipeline.

【００３６】オペランドアドレス計算処理はＡコードを
入力し、Ａコードに従いオペランドアドレス計算部４１
でオペランドのアドレス計算を行い、その計算結果をＦ
コードとして出力する。また、ジャンプ命令に対しては
ジャンプ先アドレスの計算を行う。アドレス計算に伴う
レジスタの読み出し時に書き込み予約のチェックを行い
、先行命令がレジスタやメモリに書き込み処理を終了し
ていないため予約があることが指示されれば、先行命令
がＥステージ２７で書き込み処理を終了するまで待つ。Ａステージ２３ではＤステージ２２で先行ジャンプを行
わなかったジャンプ命令に対して先行ジャンプ処理　（
Ａステージ先行ジャンプ）を行う。レジスタ間接ジャン
プやメモリ間接ジャンプに対してはＡステージ先行ジャ
ンプが行われる。Ａステージ先行ジャンプを行った命令
に対してはＲコードやＦコードは出力せず、Ａステージ
２３で命令の処理を終了する。The operand address calculation takes the A code as input, calculates the operand address in the operand address calculation unit 41 according to the A code, and outputs the result as the F code. For a jump instruction, the jump destination address is calculated. When a register is read for address calculation, the write reservation is checked; if a reservation is indicated because a preceding instruction has not finished writing to the register or memory, the stage waits until the preceding instruction completes the write in the E stage 27. The A stage 23 performs advance jump processing (A stage advance jump) for jump instructions that were not given an advance jump at the D stage 22. Register indirect jumps and memory indirect jumps are handled by the A stage advance jump. An instruction for which an A stage advance jump has been performed produces no R code or F code, and its processing ends at the A stage 23.
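The write reservation check works like a small scoreboard: the A stage reserves the destinations an instruction will write, later address calculations wait while a register they read is reserved, and the reservation is released once the write has been done in the E stage. The Python sketch below illustrates that idea for registers only. The class, its methods, and the use of a per-register counter are assumptions for illustration; the text does not describe how the hardware records reservations.

```python
class WriteReservations:
    """Tracks registers that an instruction ahead in the pipeline will write."""

    def __init__(self):
        self._pending = {}   # register name -> number of outstanding writes

    def reserve(self, register):
        """Called when the A stage handles an instruction that writes register."""
        self._pending[register] = self._pending.get(register, 0) + 1

    def release(self, register):
        """Called after the E stage has written register."""
        count = self._pending.get(register, 0)
        if count <= 1:
            self._pending.pop(register, None)
        else:
            self._pending[register] = count - 1

    def must_wait(self, registers_read):
        """True if address calculation reads a register with a pending write."""
        return any(reg in self._pending for reg in registers_read)
```

A counter rather than a single flag is used here so that two in-flight writers of the same register both have to finish before a reader proceeds; that choice is part of the sketch, not something the source specifies.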

【００３７】・　　マイクロＲＯＭ　アクセスステージ
オペランドフェッチステージ（Ｆステージ）２６も処理
が大きく２つに分かれる。１つはマイクロＲＯＭ　のア
クセス処理であり、特にＲステージ２４と呼ぶ。他方は
オペランドプリフェッチ処理であり、特にＯＦステージ
２５と呼ぶ。Ｒステージ２４とＯＦステージ２５とは必ずしも同時に
動作するわけではなく、データキャッシュのミスやヒッ
ト、データＴＬＢ　のミスやヒットなどに依存して、動
作タイミングが異なる。Ｒステージ２４の処理であるマ
イクロＲＯＭ　アクセス処理はＲコードに対して次のＥ
ステージ２７での実行に使用する実行制御コードである
Ｅコードを作り出すためのマイクロＲＯＭ　アクセスと
マイクロ命令デコード処理とである。- Micro ROM access stage: Processing in the operand fetch stage (F stage) 26 is also divided into two main parts. One is micro ROM access, specifically called the R stage 24. The other is operand prefetch, specifically called the OF stage 25. The R stage 24 and the OF stage 25 do not necessarily operate at the same time; their timing differs depending on data cache hits and misses, data TLB hits and misses, and so on. The micro ROM access performed by the R stage 24 consists of accessing the micro ROM and decoding the microinstruction in order to generate, from the R code, the E code, which is the execution control code used in the following E stage 27.

【００３８】１つのＲコードに対する処理が２つ以上の
マイクロプログラムステップに分解される場合、ＩＲＯ
Ｍ部４３やＦＲＯＭ部４４がＥステージ２７で使用され
、次のＲコードがマイクロＲＯＭ　アクセス待ちになる
ことがある。Ｒコードに対するマイクロＲＯＭ　アクセ
スが行われるのはＥステージ２７でのマイクロＲＯＭ　
アクセスが行われないときである。このマイクロプロセ
ッサでは多くの整数演算命令が１マイクロプログラムス
テップで行われ、多くの浮動小数点演算命令が２マイク
ロプログラムステップで行われるため実際にはＲコード
に対するマイクロＲＯＭ　アクセスが次々と行われるこ
とが多い。Ｒステージ２４の処理では、浮動小数点演算
部４６を使用しない命令に対してＩＲＯＭ部４３のみが
アクセスされ、ＦＲＯＭ部４４はアクセスされない。浮
動小数点演算部４６を使用する命令　（浮動小数点演算
命令や整数乗除算命令など）　に対してはＩＲＯＭ部４
３とＦＲＯＭ部４４とが共にアクセスされる。When the processing for one R code is broken down into two or more microprogram steps, the IROM unit 43 and the FROM unit 44 are in use by the E stage 27, and the next R code may have to wait for micro ROM access. Micro ROM access for an R code takes place when the E stage 27 is not accessing the micro ROM. In this microprocessor many integer arithmetic instructions complete in one microprogram step and many floating point arithmetic instructions in two, so in practice micro ROM accesses for R codes often proceed one after another. In R stage 24 processing, only the IROM unit 43 is accessed for instructions that do not use the floating point arithmetic unit 46; the FROM unit 44 is not accessed. For instructions that use the floating point arithmetic unit 46 (floating point arithmetic instructions, integer multiply and divide instructions, and so on), both the IROM unit 43 and the FROM unit 44 are accessed.

【００３９】・　　オペランドフェッチステージオペラ
ンドフェッチステージ（ＯＦステージ）　２５はＦステ
ージ２６で行う上記の２つの処理のうちオペランドプリ
フェッチ処理を行う。オペランドフェッチステージ２５
ではＦコードの論理アドレスをデータＴＬＢ　で物理ア
ドレスに変換してその物理アドレスで内蔵データキャッ
シュをアクセスしてオペランドをフェッチし、そのオペ
ランドとＦコードとして転送されてきたその論理アドレ
スとを組み合わせて、Ｓコードとして出力する。１つの
Ｆコードでは８バイト境界をクロスしてもよいが、８バ
イト以下のオペランドフェッチを指定する。Ｆコードに
はオペランドのアクセスを行うかどうかの指定も含まれ
ており、Ａステージ２３で計算したオペランドアドレス
自体や即値をＥステージ２７に転送する場合にはオペラ
ンドプリフェッチは行わず、Ｆコードの内容をＳコード
として転送する。- Operand fetch stage: The operand fetch stage (OF stage) 25 performs the operand prefetch, one of the two kinds of processing carried out in the F stage 26. It translates the logical address in the F code into a physical address with the data TLB, accesses the built-in data cache with that physical address to fetch the operand, combines the operand with the logical address received in the F code, and outputs the result as the S code. One F code may cross an 8-byte boundary, but it specifies an operand fetch of 8 bytes or less. The F code also specifies whether the operand is to be accessed; when the operand address itself calculated in the A stage 23, or an immediate value, is to be passed to the E stage 27, no operand prefetch is performed and the contents of the F code are transferred as the S code.
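The constraint on a single F code (at most 8 bytes, possibly straddling an 8-byte boundary) can be expressed as a simple address check. The Python sketch below tests whether a given operand fetch crosses such a boundary. The function name is hypothetical, and the sketch says nothing about how the hardware handles the crossing case, which this paragraph does not describe.

```python
def crosses_8_byte_boundary(address, size):
    """True if an operand of size bytes at address spans two 8-byte blocks."""
    if not 1 <= size <= 8:
        raise ValueError("one F code specifies a fetch of 8 bytes or less")
    first_block = address // 8
    last_block = (address + size - 1) // 8
    return first_block != last_block
```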

【００４０】・　　実行ステージ実行ステージ（Ｅステージ）２７はＥコード、Ｓコード
を入力として動作する。このＥステージ２７が命令を実
行するステージであり、Ｆステージ２６以前のステージ
で行われた処理はすべてＥステージ２７のための前処理
である。Ｅステージ２７でジャンプが実行されたり、ＥＩＴ　処
理が起動されたりしたときは、ＩＦステージ２１〜Ｆス
テージ２７までの処理はすべて無効化される。Ｅステー
ジ２７はマイクロプログラムにより制御され、Ｒコード
に示されたマイクロプログラムルーチンのエントリ番地
からの一連のマイクロ命令を実行することにより命令を
実行する。- Execution stage: The execution stage (E stage) 27 operates with the E code and the S code as inputs. The E stage 27 is the stage that executes instructions; all processing in the F stage 26 and earlier stages is preprocessing for the E stage 27. When a jump is executed or EIT processing is started in the E stage 27, all processing in the IF stage 21 through the F stage 26 is invalidated. The E stage 27 is controlled by the microprogram and executes an instruction by running the series of microinstructions that starts at the microprogram routine entry address indicated in the R code.

【００４１】Ｅコードには整数演算部４５を制御するコ
ード　（特にＥＩコードと呼ぶ）　と、浮動小数点演算
部４６を制御するコード（特にＥＦコードと呼ぶ）　と
があり、ＥＩコードとＥＦコードとは独立に出力するこ
とが可能であり、このときＥステージ２７では整数演算
部４５と浮動小数点演算部４６とが並列に動作する。例
えば浮動小数点演算部４６でメモリオペランドを持たな
い浮動小数点演算命令を実行する場合、浮動小数点演算
部４６は第２マイクロステップの動作から整数演算部４
５と切り放され、整数演算部４５と独立して並行動作す
る。なお、浮動小数点演算命令を含む全命令で整数演算
部４５は最小１マイクロ命令を実行する（浮動小数点演
算命令でも最初の１クロックは必ず整数演算部４５が動
作する）　。整数演算でも浮動小数点演算でもマイクロ
ＲＯＭの読み出しとマイクロ命令の実行とはパイプライ
ン化されて行われる。したがってマイクロプログラムで
分岐が起きたときは１マイクロステップの空きができる
。Ｅステージ２７ではＡステージ２３で行ったレジスタ
やメモリに対する書き込み予約をオペランドの書き込み
の後、解除する。The E code comprises a code that controls the integer arithmetic unit 45 (specifically called the EI code) and a code that controls the floating point arithmetic unit 46 (specifically called the EF code). The EI code and the EF code can be output independently, in which case the integer arithmetic unit 45 and the floating point arithmetic unit 46 operate in parallel in the E stage 27. For example, when the floating point arithmetic unit 46 executes a floating point arithmetic instruction with no memory operand, it is decoupled from the integer arithmetic unit 45 from the second microstep onward and operates independently and in parallel with it. For every instruction, including floating point arithmetic instructions, the integer arithmetic unit 45 executes at least one microinstruction (even for a floating point arithmetic instruction, the integer arithmetic unit 45 always operates for the first clock). For both integer and floating point operations, micro ROM reads and microinstruction execution are pipelined. Consequently, a branch in the microprogram leaves one microstep idle. The E stage 27 releases the write reservations for registers and memory made in the A stage 23 after the operand has been written.

【００４２】各種の割込は命令の切れ目でＥステージ２
７で直接受け付けられ、マイクロプログラムにより必要
な処理が実行される。その他の各種ＥＩＴ　の処理もＥ
ステージ２７でマイクロプログラムにより行われる。演
算の結果をメモリにストアする必要があるときはＥステ
ージ２７はＳステージ２８へＷコードとストア処理を行
う命令のプログラムカウンタ値との２つを出力する。メ
モリへのオペランドストアは整数演算の結果と浮動小数
点演算の結果とにかかわらず、プログラムで論理的に指
定された順序で行われる。浮動小数点演算部４６からメ
モリへデータをストアするとき、整数演算部４５はその
命令が終了するまで　（ストアステージに移るまで）　
引き続くすべての命令を実行しない。Interrupts of all kinds are accepted directly in the E stage 27 at instruction boundaries, and the necessary processing is executed by the microprogram. Other EIT processing is likewise carried out by the microprogram in the E stage 27. When an operation result must be stored in memory, the E stage 27 outputs to the S stage 28 the W code and the program counter value of the instruction performing the store. Operand stores to memory are performed in the order logically specified by the program, regardless of whether the result comes from an integer operation or a floating point operation. When data is stored to memory from the floating point arithmetic unit 46, the integer arithmetic unit 45 executes none of the subsequent instructions until that instruction finishes (until it moves to the store stage).

【００４３】・　　オペランドストアステージオペラン
ドストアステージ（Ｓステージ）２８はＷコードの論理
アドレスをデータＴＬＢ　で物理アドレスに変換し、そ
のアドレスでＷコードのデータを内蔵データキャッシュ
にストアする。同時にＷコードとプログラムカウンタ値
とをストアバッファに入力し、データＴＬＢ　から出力
された物理アドレスを用いて外部のメモリへＷコードの
データをストアする処理を行う。オペランドストアステ
ージ２８の動作はオペランドアクセス部４８で行われ、
データＴＬＢ　や内蔵データキャッシュがミスしたとき
のアドレス変換処理や内蔵データキャッシュの入れ替え
処理も行う。オペランドのストア処理でＥＩＴ　を検出
した場合はストアバッファにＷコードとプログラムカウ
ンタ値とを保持したまま、Ｅステージ２７にＥＩＴ　を
通知する。- Operand store stage: The operand store stage (S stage) 28 translates the logical address in the W code into a physical address with the data TLB and stores the W code data in the built-in data cache at that address. At the same time, it enters the W code and the program counter value into the store buffer and stores the W code data to external memory using the physical address output from the data TLB. The operand store stage 28 is carried out by the operand access unit 48, which also performs address translation and replacement of built-in data cache entries when the data TLB or the built-in data cache misses. If an EIT is detected during an operand store, the E stage 27 is notified of the EIT while the W code and the program counter value are held in the store buffer.

【００４４】・　　パイプラインステージの状態制御パ
イプラインの各ステージは入力ラッチと出力ラッチとを
持ち、他のステージとは独立に動作することを基本とす
る。各ステージは１つ前に行った処理が終わり、その処
理結果を出力ラッチから次のステージの入力ラッチに転
送し、自分のステージの入力ラッチに次の処理に必要な
入力信号がすべてそろえば次の処理を開始する。つまり
、各ステージは、１つ前段のステージから出力されてく
る次の処理に対する入力信号がすべて有効となり、今の
処理結果を後段のステージの入力ラッチに転送して出力
ラッチが空になると次の処理を開始する。各ステージが
動作を開始する直前のタイミングで入力信号がすべてそ
ろっている必要がある。入力信号がそろっていないと、
そのステージは待ち状態（入力待ち）になる。出力ラッ
チから次のステージの入力ラッチへの転送を行うときは
、次のステージの入力ラッチが空き状態になっている必
要があり、次のステージの入力ラッチが空きでない場合
もパイプラインステージは待ち状態（出力待ち）になる
。また、キャッシュやＴＬＢ　がミスしたり、パイプラ
インで処理中の命令間にデータ干渉が生じると、１つの
ステージの処理に複数クロック必要となり、パイプライ
ン処理が遅延する。- State control of pipeline stages: Each pipeline stage has an input latch and an output latch and, as a rule, operates independently of the other stages. A stage starts its next operation once it has finished the previous one, has transferred the result from its output latch to the input latch of the next stage, and has all the input signals needed for the next operation in its own input latch. In other words, a stage starts its next operation when all the input signals for that operation, output from the preceding stage, are valid and its output latch has been emptied by transferring the current result to the input latch of the following stage. All input signals must be present at the timing immediately before a stage starts operating. If they are not, the stage enters a wait state (input wait). A transfer from the output latch to the input latch of the next stage requires that input latch to be empty; if it is not, the stage likewise enters a wait state (output wait). In addition, when a cache or TLB miss occurs, or when data interference arises between instructions in the pipeline, one stage needs multiple clocks and pipeline processing is delayed.
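The latch handshake can be illustrated with a one-clock update rule: a stage hands its result on only if the next stage's latch is free, and otherwise holds it (output wait). The Python sketch below is a deliberately simplified model with one slot per stage instead of separate input and output latches. It also assumes that a latch freed in a given clock can be refilled in the same clock, which the text does not state; the function name and the `stalled` parameter are hypothetical.

```python
def advance(stages, stalled=()):
    """Advance the pipeline by one clock.

    stages[i] is the step code held by stage i, or None if the stage is empty.
    stalled lists stage indices that need more clocks (e.g. a cache miss).
    Returns the new list of stage contents.
    """
    nxt = list(stages)
    last = len(nxt) - 1
    # Walk from the last stage backwards so that a slot freed this clock
    # can accept the result of the stage behind it (an assumption).
    for i in range(last, -1, -1):
        if nxt[i] is None or i in stalled:
            continue                       # nothing to pass on, or still busy
        if i == last:
            nxt[i] = None                  # result leaves the pipeline
        elif nxt[i + 1] is None:
            nxt[i + 1], nxt[i] = nxt[i], None
        # otherwise: output wait, the step code stays where it is
    return nxt
```

For example, `advance(["a", "b", "c"], stalled={2})` leaves every stage unchanged: the last stage is still busy, so the two stages behind it enter output wait.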

【００４５】[0045]

In addition to the basic pipeline processing described above, this microprocessor adopts a superscalar architecture that decodes and executes two instructions simultaneously in certain cases, such as when two register-to-register integer operation instructions occur in succession. To implement this function, the instruction decode unit 40 contains a main instruction decoder and a sub instruction decoder, and the instruction execution unit contains a main arithmetic unit and a sub arithmetic unit. The registers of the integer operation unit 45 are connected to each of the two arithmetic units by three buses. FIG. 4 is a block diagram showing the bus connections between the registers and the arithmetic units of the integer operation unit 45 of the microprocessor, as well as the connection between the integer operation unit 45 and the floating-point operation unit 46. The S1, S2, S3, S4, D1, D3 and IX buses are each 32 bits wide, and the DD bus is 64 bits wide. The main arithmetic unit 93 fetches data from the register file 92 over the S1 and S2 buses and writes the result back over the D1 bus. The sub arithmetic unit 91 fetches data from the register file 92 over the S3 and S4 buses and writes the result back over the D3 bus. When data is exchanged between the integer operation unit 45 and the floating-point operation unit (FPU) 46, the S1 and S2 buses, and likewise the D1 and D3 buses, are paired so that each pair operates as a 64-bit bus.

【００４６】[0046]

Next, the blocks related to the data cache of the microprocessor are described in detail. FIG. 5 is a block diagram showing the configuration of the operand access unit (hereinafter, OAU) 48 of the microprocessor. The OAU 48 accepts access requests from the operand address calculation stage 25 and from the E stage 27 and starts processing them. The operand address calculation unit 41 sends two kinds of access request to the OAU 48: an operand fetch request, and an indirect reference to external memory for address generation. The E stage 27 issues operand fetch requests and operand store requests. In that case, the address is stored in the AA register 59 and sent from there to the address translation unit 54.

【００４７】[0047]

The processing sequence of the OAU 48 on accepting an access request from the address calculation stage is as follows. When the OAU 48 detects an access request from the address calculation stage 23, it takes in the logical address, even while it is still busy with a previous operation, provided that the F stage 26, a buffer that holds addresses for operand fetches, can accept it. After the previous operation finishes, the OAU 48 accepts the request from the F stage 26. The logical address thus sent to the address translation unit 54 is first looked up in the TLB 55; if the TLB hits and a physical address is generated, that physical address is output to the data cache 56. If the TLB 55 misses, the address translation unit 54 outputs to the bus interface unit 50 an external bus access request for referring to the address translation table in external memory. When the reference to the address translation table is complete, the generated physical address is output to the data cache 56. If the data cache 56 hits, the data read from it is registered in the S code register 60, which is the operand prefetch queue. The E stage 27 reads the operand from the S code register 60 and processes it. On a cache miss, a request to register the missed block is sent to the bus interface unit 50, and the data fetched from outside is entered into the data cache 56 and also into the S code register 60.
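The operand fetch sequence above (TLB lookup, table walk on a TLB miss, cache lookup, block registration on a cache miss) can be sketched as follows. This is a simplified software model under stated assumptions: the page size `PAGE_BITS`, the dictionary-based `tlb`, `page_table`, `cache` and `memory`, and the function name `fetch_operand` are all hypothetical and do not appear in the source.

```python
from typing import Dict, List

PAGE_BITS = 12  # assumed page size of 4 KB; the source does not state one


def fetch_operand(laddr: int, tlb: Dict[int, int], page_table: Dict[int, int],
                  cache: Dict[int, int], memory: Dict[int, int],
                  log: List[str]) -> int:
    """Return the operand that would be registered in the S code register."""
    page, offset = laddr >> PAGE_BITS, laddr & ((1 << PAGE_BITS) - 1)
    if page in tlb:
        log.append("TLB hit")
    else:
        # TLB miss: the translation table in external memory is consulted.
        log.append("TLB miss")
        tlb[page] = page_table[page]
    paddr = (tlb[page] << PAGE_BITS) | offset
    if paddr in cache:
        log.append("cache hit")
    else:
        # Cache miss: the fetched data is registered in the cache as well.
        log.append("cache miss")
        cache[paddr] = memory[paddr]
    return cache[paddr]
```

Calling the function twice with the same logical address records a TLB miss and a cache miss on the first call and two hits on the second.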

【００４８】[0048]

The processing sequence for a store request from the E stage 27 is as follows. The E stage 27 writes the store data to the DD register 58 and the store address to the AA register 59, and outputs an access request. When the OAU 48 accepts the request, it operates the address translation unit 54, the TLB 55 and the data cache 56 in the same way as in the sequence that follows acceptance of an access from the address calculation stage 29. For a store, however, when the data cache 56 hits, the store data must then be written into the data cache 56. When the data cache 56 misses, no cache write is performed. Furthermore, the data cache 56 of the microprocessor uses a write-through scheme, so the data must also be written to external memory. Because an external bus cycle is generally slower than a machine cycle of the microprocessor, a store buffer 57 is provided so that internal processing is not rate-limited by external bus accesses. The store buffer 57 has three entries, so that store data sent from the E stage 27 can be registered in the store buffer 57 in a single operation even when it straddles a word boundary.
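The store path above can be sketched as a write-through cache in front of a three-entry store buffer. This is a minimal model under assumptions: the class name `WriteThroughStore`, its methods, and the rule that a store is refused while the buffer is full are illustrative choices, and each buffer entry is treated as one whole store rather than as a word-aligned fragment.

```python
from collections import deque
from typing import Deque, Dict, Tuple

STORE_BUFFER_ENTRIES = 3  # entry count given in the text


class WriteThroughStore:
    def __init__(self) -> None:
        self.cache: Dict[int, int] = {}   # address -> data for cached blocks
        self.memory: Dict[int, int] = {}  # external memory
        self.buffer: Deque[Tuple[int, int]] = deque()

    def store(self, addr: int, data: int) -> bool:
        """Accept a store; return False if the store buffer is full."""
        if len(self.buffer) == STORE_BUFFER_ENTRIES:
            return False
        if addr in self.cache:
            self.cache[addr] = data        # hit: update the cached copy
        # miss: the cache is left untouched
        self.buffer.append((addr, data))   # write-through: always queued
        return True

    def external_bus_cycle(self) -> None:
        """Drain one buffered store to external memory."""
        if self.buffer:
            addr, data = self.buffer.popleft()
            self.memory[addr] = data
```

The buffer decouples the fast internal store from the slower external bus cycle: up to three stores can be accepted before any external write completes.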

【００４９】[0049]

FIG. 6 is a block diagram showing the configuration of the data cache 56 of the present invention. The data cache 56 operates as a four-way set-associative cache with ways 0 to 3, each way having 64 entries. The data cache 56 comprises: a first address register 101, which temporarily holds the address supplied over the address bus; a second address register 102, which temporarily holds part of the address (the lower 6 bits); a transfer path 112 connecting the first address register 101 to the second address register 102; a tag entry decoder 103, which decodes the lower 6 bits of the address to select an entry of the tag memory 104; the tag memory 104; a data entry decoder 105, which selects an entry of the data memory 106; the data memory 106; a comparator 107; and a gate 108.

【００５０】[0050]

When processing an instruction results in a read from the external storage device 110, the address of the data to be read is sent over the address bus 100 to the data cache 56. The full-width (32-bit) address is stored in the first address register 101 for the tag memory 104, and the lower 6 bits of the address are stored in the second address register 102 for the data memory. The lower 6 bits held in the first and second address registers 101 and 102 are sent to the tag entry decoder 103 for the tag memory 104 and to the data entry decoder 105 for the data memory 106, respectively, and each decoder generates a signal that selects one of the 64 entries. The tag memory 104 then reads four items of tag information from the selected entry, and the comparator 107 compares them with the upper bits of the first address register 101, that is, the bits other than the lower 6. The gate 108 selects the data read from the data memory 106 for the way that matched and outputs it to the arithmetic unit. When data is read from the data cache 56 in this way, the tag entry decoder 103 and the data entry decoder 105 perform the same operation.
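The read lookup described above can be sketched in software. This is an illustrative model, not the hardware: the class `DataCache` and the helpers `split`, `fill` and `read` are names assumed here. As in the text, the lower 6 bits select one of the 64 entries and the remaining upper bits are compared as the tag.

```python
from typing import List, Optional, Tuple

WAYS = 4         # ways 0 to 3
ENTRIES = 64     # entries per way
INDEX_BITS = 6   # lower 6 address bits select the entry


def split(addr: int) -> Tuple[int, int]:
    """Split an address into (tag, entry index)."""
    return addr >> INDEX_BITS, addr & (ENTRIES - 1)


class DataCache:
    def __init__(self) -> None:
        self.tags: List[List[Optional[int]]] = [[None] * WAYS for _ in range(ENTRIES)]
        self.data: List[List[Optional[str]]] = [[None] * WAYS for _ in range(ENTRIES)]

    def fill(self, addr: int, way: int, value: str) -> None:
        """Register a value for addr in the given way."""
        tag, entry = split(addr)
        self.tags[entry][way] = tag
        self.data[entry][way] = value

    def read(self, addr: int) -> Optional[str]:
        """Compare all four tags of the entry; return the matching way's data."""
        tag, entry = split(addr)
        for way in range(WAYS):
            if self.tags[entry][way] == tag:   # role of comparator 107
                return self.data[entry][way]   # role of gate 108
        return None                            # cache miss
```

For example, `split(0x7F)` yields tag 1 and entry 63, and an address that differs only above bit 5 maps to the same entry with a different tag.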

【００５１】[0051]

Next, the case in which processing an instruction results in a write to the external storage device 110 is described. For a write, the address sent over the address bus 100 is taken into the first address register 101 only. The tag memory 104 is then searched in the same way as for a read. If the tag read from the tag memory 104 matches the upper bits of the first address register 101, the lower 6 bits of the first address register 101 are transferred to the second address register 102; at the same time, if the address of another write arrives next, that address is taken into the first address register 101. In the data memory 106, the data entry decoder 105 decodes the transferred lower address bits and generates a signal that selects one of the 64 entries. Data is then written from the internal data bus 111 into the data memory 106 of the way indicated by the matching way information from the tag memory 104. Meanwhile, the tag memory 104 is searched for the address of the next write, and on a match the lower 6 bits of the first address register 101 are likewise transferred to the second address register 102.

【００５２】[0052]

FIG. 7 is a timing chart of a microprocessor incorporating the data cache 56 of the present invention. It shows a case in which the first access is a read and the second to fourth accesses are writes. Because the first access is a read, its address is written simultaneously into the first address register 101 for the tag memory 104 and the second address register 102 for the data memory 106. The next access is a write, so its address is written only into the first address register 101 for the tag memory 104 and not into the second address register 102 for the data memory 106. If the write address matches the tag information (a cache hit), the address is transferred to the second address register 102. On a cache miss, there is no need to write to the data memory 106 of the data cache 56, so the address of a following read can be taken into both the first address register 101 and the second address register 102. In this way, cache accesses can be overlapped at one clock each, both for consecutive writes, whether they hit or miss, and for a read that follows a write miss.
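The effect of the overlap can be illustrated with a small clock-count model. It is a sketch under assumptions, not a reproduction of FIG. 7: the function `total_clocks` is hypothetical, a read is assumed to take one clock, a write hit is assumed to take one clock for the tag comparison and one for the data write, and the non-overlapped case is an assumed baseline in which the next access waits for the data write to finish.

```python
from typing import List, Tuple


def total_clocks(accesses: List[Tuple[str, bool]], overlapped: bool = True) -> int:
    """Clocks needed for a list of ("R" or "W", hit) cache accesses."""
    next_tag_clock = 0   # earliest clock for the next tag lookup
    data_free_clock = 0  # earliest clock in which the data memory is free
    finish = 0
    for kind, hit in accesses:
        start = next_tag_clock
        if kind == "R":
            # A read uses the tag memory and the data memory in the same clock.
            start = max(start, data_free_clock)
            finish = start + 1
            data_free_clock = finish
        elif hit:
            # Write hit: tag compare, then the data write one clock later.
            finish = start + 2
            data_free_clock = finish
        else:
            # Write miss: nothing is written to the data memory.
            finish = start + 1
        # With two address registers the next tag lookup starts one clock on.
        next_tag_clock = start + 1 if overlapped else finish
    return finish
```

Under these assumptions, one read followed by three write hits completes in 5 clocks with overlap and in 7 without. A read after a write miss starts in the next clock, whereas a read after a write hit has to wait one clock for the data memory.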

【００５３】[0053]

Because separate entry decoders 103 and 105 are provided for the tag memory 104 and the data memory 106, a bus snoop operation can also proceed in parallel with a cache write. When the microprocessor is connected to other microprocessors on an external bus and they access a shared external storage device (main memory), the data caches 56 must be kept consistent with one another: when another microprocessor updates main memory, any block of the data cache 56 that holds that region must be invalidated. The first address register 101 for the tag memory 104 must therefore be able not only to receive addresses from inside the processor but also to monitor and take in external addresses. In this microprocessor, even while the first address register 101 and the tag entry decoder 103 are in use for a bus snoop, the data memory 106 has its own second address register 102 and data entry decoder 105, so the write into the data memory 106 for a cache write can be executed at the same time.

【００５４】[0054]

[Effects of the Invention] As described above, in the present invention separate address registers and entry decoders are provided for the tags and for the data, and part of the address held in the first address register can be transferred to the second address register during a write operation. Tag comparison and data writing can therefore be processed in parallel (overlapped), and write operations can be processed continuously. Consequently, consecutive write operations can each be processed in one clock, in the same time as the arithmetic processing of the microprocessor and without delay, so that the cache memory and the microprocessor can operate at high speed.

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing the configuration of a microprocessor incorporating a cache memory according to the present invention.

FIG. 2 is a block diagram of the pipeline mechanism of the microprocessor.

FIG. 3 is a block diagram showing the relationship between the F stage and the D stage.

FIG. 4 is a block diagram showing the connection between the integer operation unit and the floating-point operation unit.

FIG. 5 is a block diagram showing the configuration of the operand access unit.

FIG. 6 is a block diagram showing the configuration of the data cache, which is the cache memory of the present invention.

FIG. 7 is a timing chart showing the data cache access operation of a microprocessor incorporating the data cache of the present invention.

FIG. 8 is a block diagram showing the configuration of a conventional cache memory.

FIG. 9 is a timing chart of the access operation of a conventional cache memory.

[Explanation of symbols]

101 First address register
102 Second address register
103 Tag entry decoder
104 Tag memory
105 Data entry decoder
106 Data memory
107 Comparator
110 External storage device

Claims

[Claims]

1. A cache memory device comprising: a data memory that stores information held in a main memory; a tag memory that stores tag information relating to the storage addresses of that information; an entry decoder that decodes part of a storage address to generate an entry; and a comparator that compares the tag information stored in the generated entry of the tag memory with part or all of the remainder of the storage address, the cache memory device reading or writing the data memory in accordance with the comparison result of the comparator when an instruction that accesses the main memory is given, characterized in that the cache memory device comprises: a first address register that takes in the storage address; a second address register that takes in part or all of the storage address; a path that transfers part of the storage address taken in by the first address register to the second address register; a first entry decoder that decodes part of the storage address taken in by the first address register to select an entry of the tag memory; and a second entry decoder that decodes part of the storage address taken in by the second address register to select an entry of the data memory, wherein, when the instruction is an instruction to write information to the main memory, only the first address register takes in the storage address, and part or all of the storage address taken in by the first address register is transferred to the second address register via the path.