JP2023133850A

JP2023133850A - Arithmetic processing device and processing method

Info

Publication number: JP2023133850A
Application number: JP2022039071A
Authority: JP
Inventors: 正裕五島; Masahiro Goshima; 毅葛; Ge Yi
Original assignee: Fujitsu Ltd; Research Organization of Information and Systems
Current assignee: Fujitsu Ltd; Research Organization of Information and Systems
Priority date: 2022-03-14
Filing date: 2022-03-14
Publication date: 2023-09-27
Also published as: US12061540B2; US20230289284A1

Abstract

PROBLEM TO BE SOLVED: To reduce the circuit scale of a collision determination unit that determines a collision between an address included in a memory access instruction and an address held in a queue that holds memory access instructions. SOLUTION: An arithmetic processing device includes: a queue configured to hold a memory access instruction including one or more addresses; a contracted address generation unit configured to generate a contracted address by contracting bits of multiple addresses in a case where the memory access instruction includes the multiple addresses; a collision determination unit configured to determine a collision between the contracted address and an address held in the queue; and an access control unit configured to control processing of the memory access instruction held in the queue, based on a determination result of the collision determination unit. SELECTED DRAWING: Figure 1

Description

The present invention relates to an arithmetic processing device and an arithmetic processing method.

In an arithmetic processing device having a SIMD (Single Instruction Multiple Data) arithmetic function, processing performance is improved by executing operations on a plurality of data items in parallel. For example, the data items used in these operations are read from memory in parallel by a vector load instruction. That is, an arithmetic processing device having a SIMD arithmetic function has an architecture that makes data transfer more efficient.

For example, in this type of arithmetic processing device, a method is known that manages address collisions by executing a check instruction that determines whether a memory address in an address hazard state exists when a vector operation is executed (see, for example, Patent Document 1). A method is also known in which, during execution of a vector gather instruction, the number of duplicate addresses within one line is obtained, the requests are merged, and the accumulated number of duplicate addresses across a plurality of lines is reported to a scalar operation unit (see, for example, Patent Document 2). Furthermore, a method is known in which a subsequent memory access instruction is held when an overlap is detected between the address range of a vector scatter instruction with area specification and the address of the subsequent memory access instruction (see, for example, Patent Document 3).

Patent Document 1: Japanese National Publication of International Patent Application No. 2019-517060
Patent Document 2: Japanese Laid-open Patent Publication No. 2020-52862
Patent Document 3: Japanese Laid-open Patent Publication No. 2002-24205

An arithmetic processing device that executes instructions out of order has a mechanism for committing the instructions in order. To commit memory access instructions in order, a load/store queue that holds the addresses included in the memory access instructions may be provided. A collision between an address held in the load/store queue and an address included in a subsequent or preceding memory access instruction is then determined, and whether to commit the memory access instruction held in the load/store queue is decided based on the determination result.

For example, when a memory access instruction is executed, its address is stored in the load/store queue and is compared with the addresses of other memory access instructions that have already been stored. In the case of a vector memory access instruction that includes a plurality of addresses, such as a gather instruction or a scatter instruction, each of the plurality of addresses is stored in the load/store queue and compared.

For this reason, an arithmetic processing device capable of executing vector memory access instructions may be provided with a plurality of comparators that compare a plurality of addresses in parallel. Providing a plurality of comparators increases the circuit scale of the arithmetic processing device.

In one aspect, an object of the present invention is to reduce the circuit scale of a collision determination unit that determines a collision between an address included in a memory access instruction and an address held in a queue that holds memory access instructions.

According to one aspect, an arithmetic processing device includes: a queue that holds a memory access instruction including at least one address; a contracted address generation unit that, when the memory access instruction includes a plurality of addresses, contracts bits of the plurality of addresses to generate a contracted address; a collision determination unit that determines a collision between the contracted address and an address held in the queue; and an access control unit that controls processing of the memory access instruction held in the queue based on a determination result of the collision determination unit.

It is possible to reduce the circuit scale of a collision determination unit that determines a collision between an address included in a memory access instruction and an address held in a queue that holds memory access instructions.

FIG. 1 is a block diagram showing an example of a main part of an arithmetic processing device in one embodiment.
FIG. 2 is an explanatory diagram showing an example of a change in the state of the payload of FIG. 1.
FIG. 3 is an explanatory diagram showing examples of methods by which the contracted address generation unit of FIG. 1 generates a contracted address.
FIG. 4 is an explanatory diagram showing an example of the address range indicated by the contracted address of FIG. 3.
FIG. 5 is an explanatory diagram showing an example of the address determination operation of each match determination circuit of the match determination unit of FIG. 1.
FIG. 6 is a block diagram showing an example of another arithmetic processing device.
FIG. 7 is a block diagram showing an example of an arithmetic processing device in another embodiment.
FIG. 8 is an explanatory diagram showing an example of the payload of FIG. 7 and an example of a method by which the contracted address generation unit generates a contracted address.
FIG. 9 is a circuit diagram showing an example of the match determination circuit of FIG. 7.
FIG. 10 is a block diagram showing an example of a main part of an arithmetic processing device in another embodiment.
FIG. 11 is a block diagram showing an example of a main part of an arithmetic processing device in another embodiment.
FIG. 12 is a block diagram showing an example of a main part of an arithmetic processing device in another embodiment.
FIG. 13 is an explanatory diagram showing an example of the operation of the match determination circuit of FIG. 12.

Embodiments will be described below with reference to the drawings.

FIG. 1 shows an example of an arithmetic processing device in one embodiment. The arithmetic processing device 1 shown in FIG. 1 is, for example, a processor such as a CPU (Central Processing Unit) capable of executing SIMD arithmetic instructions.

The arithmetic processing device 1 includes a load/store queue 2, an access control unit 8, and a data cache 9. The load/store queue 2 includes a contracted address generation unit 3, a payload 4, and a match determination unit 5. Note that FIG. 1 shows only some of the elements used for memory access. In practice, the arithmetic processing device 1 may also include elements that are not shown, such as an instruction cache, an instruction decoder, a scheduler such as a reservation station, a register file, and an arithmetic unit including arithmetic circuits capable of executing SIMD arithmetic instructions.

An arithmetic processing device 1 having a scheduler such as a reservation station may execute instructions in an order different from the order in which the instruction decoder decoded them (that is, the order in which the instructions are written in the program). Therefore, to guarantee that load instructions and store instructions are committed in order, the load/store queue 2, which detects address collisions, is provided. Address collisions are described with reference to FIG. 2. A load instruction or a store instruction includes a single address or a plurality of addresses.

When a memory access instruction MA such as a load instruction or a store instruction includes a plurality of addresses AD (AD0 to AD7), the contracted address generation unit 3 contracts the plurality of addresses AD to generate a contracted address CAD. For example, based on a vector load instruction or a vector store instruction issued from the scheduler, the contracted address generation unit 3 contracts the plurality of addresses included in that instruction.

For example, vector load instructions and vector store instructions include contiguous-address vector load and store instructions, in which the addresses are consecutive in ascending or descending order, and stride vector load and store instructions, in which the addresses are equally spaced. In addition, vector load instructions include a gather instruction, in which a plurality of arbitrary addresses are specified, and vector store instructions include a scatter instruction, in which a plurality of arbitrary addresses are specified.

Although FIG. 1 shows an example in which eight addresses AD0 to AD7 are contracted, the contracted address generation unit 3 can contract any number of addresses AD of two or more, and can also output a single address AD as the contracted address CAD. For example, when the contracted address generation unit 3 receives a single address AD for a memory access instruction MA, it may output that single address AD as the contracted address CAD. In this case, the path that transfers the address AD to the payload 4 may be omitted.

The payload 4 includes a plurality of entries ENT that hold memory access instructions MA. The payload 4 is an example of a queue. For example, each entry ENT holds an execution flag, an instruction code indicating a load instruction or a store instruction, an address, and data as a memory access instruction MA. The data held in an entry ENT is store data included in the memory access instruction MA or load data read from the data cache 9. For simplicity, FIG. 1 shows only the address area of each entry ENT. An example of the payload 4 is shown in FIG. 2.

The payload 4 outputs the address (AD or CAD) held in each entry ENT to the match determination unit 5. The payload 4 also outputs, to the data cache 9, the memory access instruction MA held in the entry ENT designated by the access control unit 8. The payload 4 holds the memory access instructions MA transferred from the scheduler and the register file (not shown) and the contracted addresses CAD output from the contracted address generation unit 3.

Storing the contracted address CAD generated by the contracted address generation unit 3 in the payload 4 increases the number of memory access instructions MA that the payload 4 can hold, compared with storing the plurality of addresses AD before contraction. This increases the number of memory access instructions MA whose processing the access control unit 8 can control, and improves the processing performance of the arithmetic processing device 1. Alternatively, when the processing performance of the arithmetic processing device 1 is left unchanged, the number of entries ENT in the payload 4 can be reduced, which reduces the circuit scale of the arithmetic processing device 1.

The match determination unit 5 includes a plurality of match determination circuits 6, one for each entry ENT of the payload 4. Each match determination circuit 6 compares the address (AD or CAD) from the payload 4 with the contracted address CAD generated by the contracted address generation unit 3, and outputs a collision signal COL when it detects an address collision. The match determination unit 5 is an example of a collision determination unit that determines a collision between the contracted address CAD and an address held in the payload 4.

The access control unit 8 controls the processing of the memory access instructions MA held in the payload 4. For example, the access control unit 8 controls access to the data cache 9 based on a memory access instruction MA held in the payload 4. The access control unit 8 also controls the commit processing of the memory access instructions MA held in the payload 4 based on the collision signals COL output from the match determination unit 5.

On receiving a read request corresponding to a load instruction, the data cache 9 reads the target data DT from the data array in the data cache 9 and outputs it to the register file. On receiving a write request corresponding to a store instruction, the data cache 9 updates the data held in the data array with the data to be written. When the data to be accessed is not held in the data array (a cache miss), the data cache 9 reads the data from a lower-level cache or from a memory such as the main memory.

Note that FIG. 1 shows an example in which the load/store queue 2 operates in common for load instructions and store instructions. However, the load/store queue 2 may instead operate separately for load instructions and for store instructions. In this case, the payload 4 includes a plurality of entries ENT in which load instructions are stored and a plurality of entries ENT in which store instructions are stored.

In that configuration, when the contracted address generation unit 3 receives a plurality of addresses AD included in a load instruction, the generated contracted address CAD is compared with the addresses held in the respective entries ENT for store instructions to determine a collision. When the contracted address generation unit 3 receives a plurality of addresses AD included in a store instruction, the generated contracted address CAD is compared with the addresses held in the respective entries ENT for load instructions to determine a collision.

FIG. 2 shows an example of a change in the state of the payload 4 of FIG. 1. FIG. 2 shows an example in which two store instructions ST (ST1, ST2) and two load instructions LD (LD3, LD4) are stored in the entries ENT1, ENT2, ENT3, and ENT4 of the payload 4, respectively. For example, the payload 4 functions as a ring buffer, and the number of each entry ENT indicates the order in which the instructions are written in the program.

An execution flag of "0" indicates that the instruction has not been executed. An execution flag of "1" indicates that the instruction has been executed. The prefix "0x" on a value in the address and data columns indicates that the value is hexadecimal. The symbol n/a in the address and data columns indicates that the address or data has not been determined. A shaded entry ENT indicates that its state has changed. For example, the payload 4 is controlled by the access control unit 8 of FIG. 1.

Note that, for simplicity, FIG. 2 shows an example in which scalar store instructions ST and scalar load instructions LD, each including a single address, are stored in the payload 4. However, vector store instructions ST and vector load instructions LD including a plurality of addresses may also be stored in the payload 4. When a vector store instruction ST or a vector load instruction LD is stored in the payload 4, the contracted address CAD is stored in the entry ENT. In addition, when a vector store instruction ST is stored in the payload 4, a plurality of data items corresponding to the respective addresses are stored.

In state 1, the store instruction ST1 and the load instruction LD4 have been executed, and the store instruction ST2 and the load instruction LD3 have not. Because the address of the store instruction ST2 has not been determined, the data "0x456" obtained by the load instruction LD4, which follows the store instruction ST2, may be incorrect.

Next, in state 2, the load instruction LD3 is executed. The address included in the load instruction LD3 is output from the contracted address generation unit 62 as the contracted address CAD. The match determination unit 5 compares the address of the load instruction LD3 with the addresses of all the instructions stored in the payload 4, regardless of instruction type.

From the comparison results of the match determination unit 5, the access control unit 8 refers to the results of comparing the address of the load instruction LD3 with the addresses of the store instructions ST1 and ST2, which are stored in the payload 4 and precede the load instruction LD3. The access control unit 8 then detects a collision between the address of the load instruction LD3 and the address of the store instruction ST1. The access control unit 8 therefore decides to forward the data to be read by the load instruction LD3 from the entry ENT1 instead of reading it from the data cache 9, and stores the data "0x123" held in the entry ENT1 in the entry ENT3.

Next, in state 3, the store instruction ST2 is executed after the contracted address generation unit 3 has stored the address of the store instruction ST2 in the entry ENT2. The address "0x100" and the data "0x789" of the store instruction ST2 are then stored in the entry ENT2. The address included in the store instruction ST2 is output from the contracted address generation unit 62 as the contracted address CAD.

The match determination unit 5 compares the address of the store instruction ST2 with the addresses of all the instructions stored in the payload 4, regardless of instruction type. From the comparison results of the match determination unit 5, the access control unit 8 refers to the results of comparing the address of the store instruction ST2 with the addresses of the load instructions LD3 and LD4, which are stored in the payload 4 and follow the store instruction ST2. The access control unit 8 then detects a collision between the address of the store instruction ST2 and the address of the load instruction LD3.

In state 4, the access control unit 8 cancels the execution of the load instructions LD3 and LD4 that follow the store instruction ST2 and evicts them from the entries ENT3 and ENT4. This cancels the data "0x123" of the load instruction LD3 that was erroneously forwarded from the entry ENT1 in state 2. The canceled load instructions LD3 and LD4 are subsequently reissued.
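The state transitions of FIG. 2 can be sketched in software as follows. This is a minimal illustration rather than the circuit of the embodiment: the names `Entry`, `execute_load`, and `execute_store` are hypothetical, addresses are compared by simple equality because FIG. 2 uses scalar instructions, the address 0x100 for ST1 and LD3 is inferred from the collisions described above, and the address 0x200 for LD4 and the cache contents are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    op: str                        # "ST" (store) or "LD" (load)
    executed: bool = False         # execution flag: False = "0", True = "1"
    address: Optional[int] = None  # None stands for n/a
    data: Optional[int] = None


def execute_load(queue, index, address, cache):
    """Execute a load; forward from the youngest matching older store."""
    entry = queue[index]
    entry.address, entry.executed = address, True
    for older in range(index - 1, -1, -1):
        store = queue[older]
        if store.op == "ST" and store.executed and store.address == address:
            entry.data = store.data        # store-to-load forwarding
            return older
    entry.data = cache.get(address)        # no collision: read the cache
    return None


def execute_store(queue, index, address, data):
    """Execute a store; cancel younger loads if one already used the address."""
    entry = queue[index]
    entry.address, entry.data, entry.executed = address, data, True
    younger = range(index + 1, len(queue))
    collided = any(
        queue[i].op == "LD" and queue[i].executed and queue[i].address == address
        for i in younger
    )
    cancelled = [i for i in younger if queue[i].op == "LD"] if collided else []
    for i in cancelled:
        queue[i] = Entry("LD")             # evicted; to be reissued later
    return cancelled


# State 1: ST1 and LD4 executed, ST2 and LD3 pending (index 0 is ENT1).
queue = [
    Entry("ST", True, 0x100, 0x123),
    Entry("ST"),
    Entry("LD"),
    Entry("LD", True, 0x200, 0x456),       # address 0x200 is an assumption
]
cache = {0x100: 0x000, 0x200: 0x456}       # hypothetical cache contents

forwarded_from = execute_load(queue, 2, 0x100, cache)  # state 2
forwarded_data = queue[2].data
cancelled = execute_store(queue, 1, 0x100, 0x789)      # states 3 and 4
```

In this sketch LD3 first receives 0x123 from ENT1, and the later execution of ST2 to the same address cancels both younger loads, mirroring states 2 to 4.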

In this embodiment, the match determination unit 5 compares the contracted address CAD generated by the contracted address generation unit 3 with the addresses held in the payload 4. The number of match determination circuits 6 can therefore be reduced compared with the case where the plurality of addresses before contraction are used for the comparison, and the circuit scale of the match determination unit 5 can be reduced accordingly.

In addition, because the contracted address CAD is stored in the payload 4, the entries ENT are used more efficiently than when the plurality of addresses before contraction are stored in the payload 4, and the number of memory access instructions MA that the payload 4 can hold increases. This increases the number of memory access instructions MA whose processing the access control unit 8 can control, and improves the processing performance of the arithmetic processing device 1.

Furthermore, when the usage efficiency of the entries ENT does not need to be improved, the number of entries ENT can be reduced. This further reduces the number of match determination circuits 6 and the circuit scale of the match determination unit 5.

FIG. 3 shows examples of methods by which the contracted address generation unit 3 of FIG. 1 generates a contracted address. FIG. 3 shows an example in which the memory access instruction MA1 is a gather instruction or a scatter instruction including eight addresses AD0 to AD7. For simplicity, FIG. 3 shows each address AD0 to AD7 as 8 bits wide, but the number of bits of each address AD0 to AD7 is not limited to 8. The addresses AD0 to AD7 described with reference to FIG. 3 and the subsequent figures are likewise not limited to 8 bits.

In generation method 1, for each bit position of the addresses AD0 to AD7, the contracted address generation unit 3 sets the corresponding bit of the contracted address CAD to "0" when all the bit values are "0" and to "1" when all the bit values are "1". When "0" and "1" bit values are mixed at a bit position, the contracted address generation unit 3 sets the corresponding bit of the contracted address CAD to the indefinite value "X". In generation method 2, in addition to the rules of generation method 1, the contracted address generation unit 3 sets every bit below a bit position holding the indefinite value "X" in the contracted address CAD to the indefinite value "X".

In this way, the contracted address generation unit 3 can use generation method 1 or generation method 2 to generate a contracted address CAD expressed in the three-valued logic of "0", "1", and "X". Note that when the memory access instruction MA1 includes a single address AD, the contracted address generation unit 3 uses that single address AD as the contracted address CAD.
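The two generation methods can be illustrated with a short software sketch. The function names and the four sample addresses below are hypothetical (they are not the eight addresses drawn in FIG. 3); the samples are merely chosen so that generation method 2 yields the pattern "101XXXXX". The contracted address is represented as a string of "0", "1", and "X" symbols, most significant bit first.

```python
def contract_method1(addresses, width=8):
    """Per-bit contraction: '0' or '1' if all addresses agree, else 'X'."""
    cad = []
    for bit in range(width - 1, -1, -1):   # most significant bit first
        values = {(address >> bit) & 1 for address in addresses}
        cad.append(str(values.pop()) if len(values) == 1 else "X")
    return "".join(cad)


def contract_method2(addresses, width=8):
    """Method 1, then force every bit below the highest 'X' to 'X'."""
    cad = contract_method1(addresses, width)
    first_x = cad.find("X")
    if first_x == -1:                      # no mixed bit: nothing to widen
        return cad
    return cad[:first_x] + "X" * (width - first_x)


# Hypothetical sample addresses (assumption, not taken from FIG. 3).
ADDRESSES = [0b10110011, 0b10100111, 0b10111011, 0b10101111]

cad1 = contract_method1(ADDRESSES)         # "101XXX11"
cad2 = contract_method2(ADDRESSES)         # "101XXXXX"
single = contract_method1([0b10110011])    # "10110011": a single address is unchanged
```

Under these assumptions, method 1 keeps the two low-order bits that all sample addresses share, while method 2 turns the result into a prefix followed only by "X" symbols.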

As a result, the contracted address CAD generated by the contracted address generation unit 3 can be stored in the payload 4 regardless of whether the memory access instruction MA includes a single address AD or a plurality of addresses AD. Control for storing the addresses AD and CAD in the payload 4 is therefore simpler than when the storage method differs depending on whether the address is a single address AD or a contracted address CAD.

However, because the arithmetic processing device 1 handles binary values, it cannot use the indefinite value "X" directly. In practice, therefore, as described with reference to FIG. 8, the contracted address generation unit 3 converts the contracted address CAD expressed in three-valued logic into a format that can be expressed in binary. For example, in the contracted address CAD, the contracted address generation unit 3 may encode the indefinite value "X" as "00", the value "0" as "01", and the value "1" as "10".
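As a rough illustration of the example encoding mentioned above ("X" as "00", "0" as "01", "1" as "10"), the following sketch packs a three-valued contracted address into an ordinary integer using two bits per symbol. The function names `encode` and `decode` and the packing order are assumptions made for this sketch; the format actually described with reference to FIG. 8 may differ.

```python
# Two-bit codes for the three symbols, as given in the text above.
ENCODING = {"X": 0b00, "0": 0b01, "1": 0b10}
DECODING = {code: symbol for symbol, code in ENCODING.items()}


def encode(cad):
    """Pack a ternary string (MSB first) into an integer, 2 bits per symbol."""
    word = 0
    for symbol in cad:
        word = (word << 2) | ENCODING[symbol]
    return word


def decode(word, width):
    """Unpack `width` symbols from an integer produced by encode()."""
    symbols = []
    for position in range(width - 1, -1, -1):
        symbols.append(DECODING[(word >> (2 * position)) & 0b11])
    return "".join(symbols)


packed = encode("101XXXXX")
restored = decode(packed, 8)
```

An 8-symbol contracted address therefore occupies 16 bits in this representation; the code "11" is unused here.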

FIG. 4 shows an example of the address range indicated by the contracted address CAD of FIG. 3. The eight addresses AD0 to AD7 included in the memory access instruction MA1 illustrated in FIG. 4 are the same as the addresses AD0 to AD7 shown in FIG. 3. The contracted address MA1.CAD illustrated in FIG. 4 is the same as the contracted address MA1.CAD shown for generation method 2 in FIG. 3. When the contracted address CAD is generated by generation method 2 of FIG. 3, the match determination unit 5 determines that any address AD in the range from "10100000" to "10111111" shown in FIG. 4 collides with the contracted address MA1.CAD.

When the contracted address CAD is generated by generation method 1 of FIG. 3, for example, the address AD "10110001", which is the address MA1.AD0 with its second bit from the least significant bit changed to "0", is determined not to collide with the contracted address MA1.CAD. Generation method 1 therefore reduces the number of addresses AD covered by the contracted address CAD compared with generation method 2, and improves the accuracy of collision determination.
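The difference in coverage between the two methods can be checked by enumerating the concrete addresses that a contracted address stands for. In the sketch below, the pattern "101XXXXX" corresponds to the range stated for generation method 2, whereas "101XXX11" is only a hypothetical method 1 result chosen for illustration (the actual method 1 pattern of FIG. 3 is not reproduced here); `covered_addresses` is an assumed helper name.

```python
from itertools import product


def covered_addresses(cad):
    """List every concrete address matched by a ternary contracted address."""
    choices = [("0", "1") if symbol == "X" else (symbol,) for symbol in cad]
    return [int("".join(bits), 2) for bits in product(*choices)]


method2 = covered_addresses("101XXXXX")   # range stated in the text
method1 = covered_addresses("101XXX11")   # hypothetical method 1 pattern

count2, count1 = len(method2), len(method1)
low, high = min(method2), max(method2)
```

With these patterns, method 2 covers 32 addresses and the hypothetical method 1 pattern covers 8, so "10110001" is reported as a collision only under method 2.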

図５は、図１の一致判定部５の各一致判定回路６によるアドレスの判定動作の一例を示す。一致判定回路６は、図１に示したように、縮約アドレス生成部３が生成した縮約アドレスＣＡＤの各ビットと、ペイロード４のエントリＥＮＴのいずれかに保持されたアドレスの各ビットとを比較する。ここで、ペイロード４のエントリＥＮＴのいずれかに保持されたアドレスは、単一のアドレスＡＤまたは縮約アドレスＣＡＤである。 FIG. 5 shows an example of the address determination operation performed by each match determination circuit 6 of the match determination unit 5 of FIG. 1. As shown in FIG. 1, the match determination circuit 6 compares each bit of the contracted address CAD generated by the contracted address generation unit 3 with each bit of the address held in one of the entries ENT of the payload 4. Here, the address held in an entry ENT of the payload 4 is either a single address AD or a contracted address CAD.

例えば、一致判定回路６は、比較するビット値が"０"と"１"または"１"と"０"の場合、不一致を示す"０"をアンド回路ＡＮＤに出力する。一致判定回路６は、比較するビット値が"０"同士、"１"同士の場合、または、比較するビット値の少なくとも一方が不定値"Ｘ"の場合、一致を示す"１"をアンド回路ＡＮＤに出力する。 For example, when the bit values to be compared are "0" and "1", or "1" and "0", the match determination circuit 6 outputs "0", indicating a mismatch, to the AND circuit AND. When the bit values to be compared are both "0" or both "1", or when at least one of them is the undefined value "X", the match determination circuit 6 outputs "1", indicating a match, to the AND circuit AND.

アンド回路ＡＮＤは、比較結果のビット値が全て"１"（全て一致）の場合、衝突信号ＣＯＬをアドレスの衝突を示す"１"に設定する。アンド回路ＡＮＤは、比較結果のビット値のいずれかが"０"（不一致）の場合、衝突信号ＣＯＬをアドレスが衝突していないことを示す"０"に設定する。図１のアクセス制御部８は、各一致判定回路６から出力される衝突信号ＣＯＬの論理値に基づいて、縮約アドレス生成部３が生成した縮約アドレスＣＡＤとペイロード４に保持されたアドレスとの衝突を判定する。そして、アクセス制御部８は、判定結果に基づいて、メモリアクセス命令ＭＡをコミットするか否かを決定し、キューに保持されたメモリアクセス命令ＭＡの処理を制御する。 If the bit values of the comparison result are all "1" (all match), the AND circuit AND sets the collision signal COL to "1", indicating an address collision. If any bit value of the comparison result is "0" (mismatch), the AND circuit AND sets the collision signal COL to "0", indicating that the addresses do not collide. Based on the logical value of the collision signal COL output from each match determination circuit 6, the access control unit 8 of FIG. 1 determines whether the contracted address CAD generated by the contracted address generation unit 3 collides with an address held in the payload 4. Then, based on the determination result, the access control unit 8 decides whether to commit the memory access instruction MA and controls the processing of the memory access instructions MA held in the queue.
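
The per-bit comparison and the AND reduction described for FIG. 5 can be modeled in software roughly as follows. This is a behavioral sketch, not the circuit itself; ternary digits are represented as the characters "0", "1" and "X", and the function names are assumptions.

```python
def bit_matches(a: str, b: str) -> int:
    """Per-bit comparison: 1 if the digits are equal or either one is "X", else 0."""
    return 1 if a == "X" or b == "X" or a == b else 0


def collision_signal(cad: str, held: str) -> int:
    """Model of the collision signal COL: the AND of all per-bit comparison results."""
    if len(cad) != len(held):
        raise ValueError("addresses must have the same width")
    return int(all(bit_matches(a, b) for a, b in zip(cad, held)))
```

For instance, `collision_signal("101XXXXX", "10110001")` yields 1 because every fixed digit agrees, whereas changing any of the three leading digits of the held address yields 0.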

図６は、他の演算処理装置の一例を示す。図６に示す演算処理装置１Ａは、図１の縮約アドレス生成部３を持たず、図１の一致判定部５およびアクセス制御部８の代わりに一致判定部５Ａおよびアクセス制御部８Ａを有する。 FIG. 6 shows an example of another arithmetic processing device. The arithmetic processing device 1A shown in FIG. 6 does not have the contracted address generation unit 3 of FIG. 1, and has a match determination unit 5A and an access control unit 8A instead of the match determination unit 5 and the access control unit 8 of FIG. 1.

演算処理装置１Ａが縮約アドレス生成部３を持たない場合、一致判定部５Ａは、ベクトルロード命令ＬＤまたはベクトルストア命令ＳＴに含まれる複数のアドレスＡＤ０－ＡＤ７を直接受信する。そして、一致判定部５Ａは、受信したアドレスＡＤ０－ＡＤ７をペイロード４の各エントリＥＮＴに保持されたアドレスＡＤと比較する。このため、一致判定部５Ａは、アドレスＡＤ０－ＡＤ７の数とエントリＥＮＴの数との積に対応する数の一致判定回路６を有する。 When the arithmetic processing device 1A does not have the contracted address generation section 3, the coincidence determination section 5A directly receives the plurality of addresses AD0 to AD7 included in the vector load instruction LD or vector store instruction ST. Then, the match determining unit 5A compares the received addresses AD0 to AD7 with the addresses AD held in each entry ENT of the payload 4. For this reason, the match determining section 5A has a number of match determining circuits 6 corresponding to the product of the number of addresses AD0 to AD7 and the number of entries ENT.

アクセス制御部８Ａは、全ての一致判定回路６から出力される衝突信号ＣＯＬを受信し、受信した衝突信号ＣＯＬに基づいて、ペイロード４に保持されたメモリアクセス命令ＭＡのコミット処理を制御する。図６に示すように、縮約アドレス生成部３を持たない演算処理装置１Ａの一致判定部５Ａおよびアクセス制御部８Ａの回路規模は、図１の一致判定部５およびアクセス制御部８の回路規模より大きい。 The access control unit 8A receives the collision signals COL output from all the match determination circuits 6 and, based on the received collision signals COL, controls the commit processing of the memory access instructions MA held in the payload 4. As shown in FIG. 6, the circuit scale of the match determination unit 5A and the access control unit 8A of the arithmetic processing device 1A, which has no contracted address generation unit 3, is larger than that of the match determination unit 5 and the access control unit 8 of FIG. 1.

なお、例えば、図１の縮約アドレス生成部３の回路規模は、アドレスＡＤ毎に一致判定回路６の２個分程度である。このため、図１の一致判定部５およびアクセス制御部８の回路規模の減少分は、縮約アドレス生成部３の回路規模の増加分に比べて十分に大きい。 Note that, for example, the circuit scale of the contracted address generation unit 3 in FIG. 1 is about two match determination circuits 6 for each address AD. Therefore, the reduction in the circuit scale of the match determination section 5 and the access control section 8 in FIG. 1 is sufficiently larger than the increase in the circuit scale of the contracted address generation section 3.
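
The circuit-scale comparison can be made concrete with a back-of-the-envelope count in units of one match determination circuit. The sketch below assumes one match determination circuit per entry when contraction is used, uses the figure of about two circuits per address for the generator from the text, and takes a hypothetical payload of 32 entries; none of these numbers is a measured value.

```python
def cost_without_contraction(num_addresses: int, num_entries: int) -> int:
    """Match circuits needed when every address is compared with every entry."""
    return num_addresses * num_entries


def cost_with_contraction(num_addresses: int, num_entries: int,
                          generator_cost_per_address: int = 2) -> int:
    """One match circuit per entry plus the generator, counted in match-circuit units."""
    return num_entries + generator_cost_per_address * num_addresses


# Hypothetical configuration: 8 addresses per vector instruction, 32 entries.
baseline = cost_without_contraction(8, 32)   # 8 * 32 circuits
contracted = cost_with_contraction(8, 32)    # 32 circuits + 2 * 8 for the generator
```

With these assumed numbers the baseline needs 256 circuit equivalents and the contracted design 48, which illustrates why the generator overhead is small relative to the saving.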

以上、この実施形態では、一致判定部５は、縮約アドレス生成部３が生成した縮約アドレスＣＡＤをペイロード４に保持されたアドレスと比較する。このため、縮約前の複数のアドレスＡＤ０－ＡＤ７を比較に使用する場合に比べて、一致判定回路６の数を削減することができ、一致判定部５の回路規模を低減することができる。 As described above, in this embodiment, the match determination unit 5 compares the contracted address CAD generated by the contracted address generation unit 3 with the addresses held in the payload 4. Therefore, compared with the case where the plurality of addresses AD0-AD7 before contraction are used for the comparison, the number of match determination circuits 6 can be reduced, and the circuit scale of the match determination unit 5 can be reduced.

縮約アドレスＣＡＤがペイロード４に格納されるため、縮約前の複数のアドレスをペイロード４に格納する場合に比べて、エントリＥＮＴの使用効率を向上することができ、ペイロード４に格納可能なメモリアクセス命令ＭＡの数を増やすことができる。これによりアクセス制御部８により処理を制御可能なメモリアクセス命令ＭＡの数を増やすことができ、演算処理装置１の処理性能を向上することができる。 Since the contracted address CAD is stored in the payload 4, the entries ENT are used more efficiently than when the plurality of addresses before contraction are stored in the payload 4, and the number of memory access instructions MA that can be stored in the payload 4 can be increased. This increases the number of memory access instructions MA whose processing can be controlled by the access control unit 8, and improves the processing performance of the arithmetic processing device 1.

縮約アドレス生成部３により単一のアドレスＡＤも縮約アドレスＣＡＤとすることで、メモリアクセス命令ＭＡに含まれるアドレスＡＤが単一か複数かによらず、縮約アドレス生成部３が生成した縮約アドレスＣＡＤをペイロード４に格納することができる。したがって、ペイロード４にアドレスＡＤ、ＣＡＤを格納する制御を容易にすることができる。 Because the contracted address generation unit 3 also treats a single address AD as a contracted address CAD, the contracted address CAD generated by the contracted address generation unit 3 can be stored in the payload 4 regardless of whether the memory access instruction MA includes a single address AD or a plurality of addresses AD. This simplifies the control for storing the addresses AD and CAD in the payload 4.

図７は、別の実施形態における演算処理装置の一例を示す。図１から図６と同様の要素については、詳細な説明は省略する。図７に示す演算処理装置１００は、図１の演算処理装置１と同様に、例えば、ＳＩＭＤ演算命令を実行可能なＣＰＵ等のプロセッサである。 FIG. 7 shows an example of an arithmetic processing device in another embodiment. Detailed description of elements similar to those in FIGS. 1 to 6 will be omitted. The arithmetic processing device 100 shown in FIG. 7 is, like the arithmetic processing device 1 in FIG. 1, a processor such as a CPU that can execute SIMD arithmetic instructions, for example.

演算処理装置１００は、命令キャッシュ１０、デコーダ２０、リザベーションステーション等のスケジューラ３０、レジスタファイル４０、複数のロードストア（ＬＤＳＴ）ユニット５０および複数の演算ユニット９０を有する。 The arithmetic processing device 100 includes an instruction cache 10, a decoder 20, a scheduler 30 such as a reservation station, a register file 40, a plurality of load/store (LDST) units 50, and a plurality of arithmetic units 90.

命令キャッシュ１０は、メインメモリ等のメモリから転送される命令を保持し、保持している命令をデコーダ２０に出力する。例えば、命令キャッシュ１０は、１次命令キャッシュでもよい。命令キャッシュ１０に保持される命令は、演算命令およびメモリアクセス命令である。 The instruction cache 10 holds instructions transferred from a memory such as the main memory, and outputs the held instructions to the decoder 20. For example, instruction cache 10 may be a primary instruction cache. The instructions held in the instruction cache 10 are arithmetic instructions and memory access instructions.

例えば、演算命令は、整数演算命令、固定小数点演算命令および浮動小数点演算命令等を含む。例えば、メモリアクセス命令は、ロード命令およびストア命令等を含む。また、整数演算命令、固定小数点演算命令および浮動小数点演算命令の少なくともいずれかは、ＳＩＭＤ演算命令を含んでもよい。さらに、図１の演算処理装置１と同様に、ロード命令は、単一のアドレスを含むスカラロード命令に加えて、ベクトルロード命令ＬＤとして連続アドレスロード命令ＬＤおよびギャザー命令を含む。ストア命令は、単一のアドレスを含むスカラストア命令に加えて、ベクトルストア命令ＳＴとして連続アドレスストア命令ＳＴおよびスキャッター命令を含む。 For example, the arithmetic instructions include integer arithmetic instructions, fixed-point arithmetic instructions, floating-point arithmetic instructions, and the like. For example, the memory access instructions include load instructions, store instructions, and the like. At least one of the integer, fixed-point, and floating-point arithmetic instructions may include a SIMD arithmetic instruction. Furthermore, as in the arithmetic processing device 1 of FIG. 1, the load instructions include, in addition to a scalar load instruction containing a single address, a contiguous-address load instruction LD and a gather instruction as vector load instructions LD. The store instructions include, in addition to a scalar store instruction containing a single address, a contiguous-address store instruction ST and a scatter instruction as vector store instructions ST.

デコーダ２０は、命令キャッシュ１０からインオーダで受信する命令をデコードし、デコードした命令をスケジューラ３０に出力する。なお、演算処理装置１００は、命令キャッシュ１０とデコーダ２０との間に命令キャッシュ１０から転送される複数の命令を蓄積する命令バッファを有してもよい。 The decoder 20 decodes instructions received in-order from the instruction cache 10 and outputs the decoded instructions to the scheduler 30. Note that the arithmetic processing device 100 may have an instruction buffer between the instruction cache 10 and the decoder 20 that stores a plurality of instructions transferred from the instruction cache 10.

デコーダ２０によりデコードされた命令に含まれる論理レジスタ番号は、例えば、リネームユニットによりレジスタファイル４０内の物理レジスタを識別する物理レジスタ番号に変換されてもよい。論理レジスタ番号は、プログラムに記述されるレジスタ番号である。リネームユニットの搭載により、演算処理装置１００は、プログラムで記述可能なレジスタの数より多い物理レジスタをレジスタファイル４０に搭載することができる。この結果、リネームユニットを設けない場合に比べて、レジスタが競合する頻度を低減することができ、命令の実行効率を向上することができる。 The logical register number included in the instruction decoded by the decoder 20 may be converted, for example, by a rename unit into a physical register number that identifies a physical register within the register file 40. The logical register number is a register number written in the program. By installing the rename unit, the arithmetic processing device 100 can install more physical registers in the register file 40 than the number of registers that can be written in a program. As a result, compared to the case where no rename unit is provided, the frequency of register conflicts can be reduced, and the efficiency of instruction execution can be improved.

スケジューラ３０は、デコーダ２０から出力される演算命令を保持する複数のエントリを含む演算キューと、デコーダ２０から出力されるメモリアクセス命令を保持する複数のエントリを含むメモリアクセスキューとを有する。スケジューラ３０は、演算キューに保持した演算命令を実行可能な順にアウトオブオーダで演算ユニット９０のいずれかに発行する。また、スケジューラ３０は、メモリアクセスキューに保持した命令を実行可能な順にアウトオブオーダでロードストアユニット５０のいずれかに出力する。 The scheduler 30 has an arithmetic queue including a plurality of entries that hold arithmetic instructions output from the decoder 20, and a memory access queue including a plurality of entries that hold memory access instructions output from the decoder 20. The scheduler 30 issues the arithmetic instructions held in the arithmetic queue to one of the arithmetic units 90 out of order, in the order in which they become executable. Likewise, the scheduler 30 outputs the instructions held in the memory access queue to one of the load/store units 50 out of order, in the order in which they become executable.

複数のロードストアユニット５０の各々は、ロード命令およびストア命令を実行する。複数のロードストアユニット５０の各々は、複数のアドレス計算器５２を有する。また、複数のロードストアユニット５０は、複数のロードストアユニット５０に共通のロードストアキュー６０、アクセス制御部７０およびＬ１（Level 1）データキャッシュ８０を有する。ロードストアキュー６０は、複数のロードストアユニット５０の各々に対応する縮約アドレス生成部６２と、複数のロードストアユニット５０に共通のペイロード６４および一致判定部６６を有する。一致判定部６６は、複数の一致判定回路６７を有する。 Each of the plurality of load/store units 50 executes a load instruction and a store instruction. Each of the plurality of load store units 50 has a plurality of address calculators 52. Further, the plurality of load/store units 50 have a load/store queue 60, an access control section 70, and an L1 (Level 1) data cache 80 that are common to the plurality of load/store units 50. The load/store queue 60 has a contracted address generation section 62 corresponding to each of the plurality of load/store units 50, and a payload 64 and a match determination section 66 common to the plurality of load/store units 50. The match determination section 66 includes a plurality of match determination circuits 67.

複数のアドレス計算器５２の各々は、レジスタファイル４０から転送されるデータの加算処理等を実行することにより、メモリアクセス命令によるアクセス対象のアドレスを計算する。複数のアドレス計算器５２の各々は、計算により得たアドレスを、対応する縮約アドレス生成部６２およびペイロード６４に出力する。また、ロード命令であればＬ１データキャッシュ８０にアドレスＡＤを出力する。各ロードストアユニット５０に複数のアドレス計算器５２を設けることで、ベクトルロード命令またはベクトルストア命令に含まれる複数のアドレスを並列に計算することができる。 Each of the plurality of address calculators 52 calculates the address to be accessed by the memory access command by performing addition processing of data transferred from the register file 40 and the like. Each of the plurality of address calculators 52 outputs the address obtained by calculation to the corresponding contracted address generation section 62 and payload 64. Further, if it is a load instruction, the address AD is output to the L1 data cache 80. By providing a plurality of address calculators 52 in each load/store unit 50, a plurality of addresses included in a vector load instruction or a vector store instruction can be calculated in parallel.

縮約アドレス生成部６２は、図１の縮約アドレス生成部３と同様に、ロード命令またはストア命令に含まれる複数のアドレスＡＤを縮約して縮約アドレスＣＡＤを生成する。縮約アドレスＣＡＤの生成方法の例は、図８に示される。ペイロード６４は、図１のペイロード４と同様に、メモリアクセス命令を保持する図示しない複数のエントリＥＮＴを含む。ペイロード６４は、キューの一例である。ペイロード６４の例は、図８に示される。 Like the contracted address generation unit 3 of FIG. 1, the contracted address generation unit 62 contracts a plurality of addresses AD included in a load instruction or a store instruction to generate a contracted address CAD. An example of a method for generating the contracted address CAD is shown in FIG. 8. Like the payload 4 of FIG. 1, the payload 64 includes a plurality of entries ENT (not shown) that hold memory access instructions. The payload 64 is an example of a queue. An example of the payload 64 is shown in FIG. 8.

一致判定部６６は、図１の一致判定部５と同様に、ペイロード６４のエントリに複数の一致判定回路６７を有する。各一致判定回路６７は、ペイロード６４からのアドレスを縮約アドレス生成部６２が生成した縮約アドレスＣＡＤと比較し、比較結果に応じた衝突信号ＣＯＬを出力する。一致判定部６６は、縮約アドレスＣＡＤとペイロード４に保持されたアドレスとの衝突を判定する衝突判定部の一例である。一致判定回路６７は、衝突判定回路の一例である。 Like the match determination unit 5 of FIG. 1, the match determination unit 66 has a plurality of match determination circuits 67 for the entries of the payload 64. Each match determination circuit 67 compares an address from the payload 64 with the contracted address CAD generated by the contracted address generation unit 62, and outputs a collision signal COL according to the comparison result. The match determination unit 66 is an example of a collision determination unit that determines a collision between the contracted address CAD and the address held in the payload 4. The match determination circuit 67 is an example of a collision determination circuit.

アクセス制御部７０は、図１のアクセス制御部８と同様に、衝突信号ＣＯＬに基づいてペイロード６４に保持されたメモリアクセス命令の処理を制御し、Ｌ１データキャッシュ８０のアクセスを制御する。Ｌ１データキャッシュ８０は、図１のデータキャッシュ９と同様の構成および機能を有する。 Like the access control unit 8 of FIG. 1, the access control unit 70 controls the processing of the memory access instructions held in the payload 64 based on the collision signal COL, and controls access to the L1 data cache 80. The L1 data cache 80 has the same configuration and functions as the data cache 9 of FIG. 1.

各演算ユニット９０は、演算命令を実行する。例えば、各演算ユニット９０は、固定小数点演算器、浮動小数点演算器および論理演算器を有する。 Each arithmetic unit 90 executes arithmetic instructions. For example, each arithmetic unit 90 has a fixed-point arithmetic unit, a floating-point arithmetic unit, and a logic arithmetic unit.

図８は、図７のペイロード６４の一例と縮約アドレス生成部６２による縮約アドレスの生成方法の一例とを示す。ペイロード６４は、メモリアクセス命令ＭＡを保持する複数のエントリＥＮＴ（ＥＮＴ１－ＥＮＴ６等）を含む。例えば、各エントリＥＮＴには、実行フラグ、命令種別（ロード命令ＬＤまたはストア命令ＳＴ）、キーアドレスＫＥＹ、データ、マスクベクトルＭＳＫおよび元のアドレスが保持される。元のアドレスは、各アドレス計算器５２が計算した、縮約前のアドレスである。この実施形態では、縮約アドレスＣＡＤは、キーアドレスＫＥＹおよびマスクベクトルＭＳＫとして表現される。 FIG. 8 shows an example of the payload 64 of FIG. 7 and an example of a method for generating a contracted address by the contracted address generation unit 62. The payload 64 includes a plurality of entries ENT (ENT1 to ENT6, etc.) holding memory access instructions MA. For example, each entry ENT holds an execution flag, instruction type (load instruction LD or store instruction ST), key address KEY, data, mask vector MSK, and original address. The original address is the address calculated by each address calculator 52 before reduction. In this embodiment, the contracted address CAD is expressed as a key address KEY and a mask vector MSK.

縮約アドレス生成部６２は、図１の縮約アドレス生成部３と同様に、"０"、"１"、不定値"Ｘ"で表現される３値論理を使用して縮約アドレスＣＡＤを生成する。但し、縮約アドレス生成部６２は、演算処理装置１００で扱う２進数によりアドレスの衝突を判定するために、縮約アドレスＣＡＤをキーアドレスＫＥＹおよびマスクベクトルＭＳＫとして表現する。 Like the contracted address generation unit 3 of FIG. 1, the contracted address generation unit 62 generates the contracted address CAD using ternary logic expressed by "0", "1", and the undefined value "X". However, so that address collisions can be determined with the binary numbers handled by the arithmetic processing device 100, the contracted address generation unit 62 expresses the contracted address CAD as a key address KEY and a mask vector MSK.

３値論理で表される縮約アドレスＣＡＤを２進数で表現可能な形式に変換することで、２進数を扱う演算処理装置１００において、一致判定部６６は、不定値"Ｘ"を含む縮約アドレスＣＡＤの衝突を判定することができる。換言すれば、演算処理装置１００のアーキテクチャを変更することなく、不定値"Ｘ"を含む縮約アドレスＣＡＤの衝突を判定することができる。 By converting the contracted address CAD expressed in ternary logic into a format that can be expressed in binary, the match determination unit 66 can determine collisions of a contracted address CAD that includes the undefined value "X" in the arithmetic processing device 100, which handles binary numbers. In other words, collisions of a contracted address CAD including the undefined value "X" can be determined without changing the architecture of the arithmetic processing device 100.

縮約アドレス生成部６２は、メモリアクセス命令ＭＡに含まれる複数のアドレスＡＤのいずれか１つをキーアドレスＫＥＹとして選択する。また、縮約アドレス生成部６２は、メモリアクセス命令ＭＡに含まれる複数のアドレスＡＤの各ビット位置でのビット値の排他的論理和ＸＯＲを算出し、マスクベクトルＭＳＫとする。図８では、メモリアクセス命令ＭＡ１に含まれる８個のアドレスＡＤ０－ＡＤ７からマスクベクトルＭＳＫを算出する例が示される。縮約アドレス生成部６２が生成した縮約アドレスＣＡＤ（ＫＥＹ、ＭＳＫ）は、一致判定部６６による判定に使用され、メモリアクセス命令ＭＡの情報とともにペイロード６４に格納される。 The contracted address generation unit 62 selects any one of the plurality of addresses AD included in the memory access instruction MA as the key address KEY. Further, the contracted address generation unit 62 calculates the exclusive OR (XOR) of the bit values at each bit position of the plurality of addresses AD included in the memory access instruction MA, and sets it as a mask vector MSK. FIG. 8 shows an example of calculating the mask vector MSK from eight addresses AD0 to AD7 included in the memory access instruction MA1. The contracted address CAD (KEY, MSK) generated by the contracted address generation section 62 is used for determination by the match determination section 66, and is stored in the payload 64 together with information on the memory access instruction MA.
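
A minimal sketch of the KEY/MSK generation follows. The text only states that an exclusive OR is taken at each bit position; this sketch makes the interpretive assumption that a mask bit should be 1 wherever any address differs from the key, and so ORs together the XOR of the key with each address. The choice of the first address as the key, the function name, and the sample values are likewise assumptions.

```python
from typing import List, Tuple


def make_key_and_mask(addresses: List[int]) -> Tuple[int, int]:
    """Return (KEY, MSK) for a list of addresses.

    KEY is one of the addresses (here, arbitrarily, the first one).
    MSK has a 1 at every bit position where some address differs from KEY,
    i.e. at the positions that would be "X" in the ternary representation.
    """
    if not addresses:
        raise ValueError("at least one address is required")
    key = addresses[0]
    mask = 0
    for address in addresses:
        mask |= key ^ address  # XOR marks the bit positions that differ from KEY
    return key, mask
```

For a single address the mask is all zeros, so a scalar access is represented exactly and passes through the same path as a vector access.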

図９は、図７の一致判定回路６７の一例を示す。一致判定回路６７は、否定排他的論理和回路ＸＮＯＲ、オア回路ＯＲ１、ＯＲ２およびアンド回路ＡＮＤを有する。オア回路ＯＲ１は、第１論理和回路の一例であり、オア回路ＯＲ２は、第２論理和回路の一例である。アンド回路ＡＮＤは、論理積回路の一例である。 FIG. 9 shows an example of the match determination circuit 67 of FIG. 7. The match determination circuit 67 includes an exclusive-NOR circuit XNOR, OR circuits OR1 and OR2, and an AND circuit AND. The OR circuit OR1 is an example of a first logical OR circuit, and the OR circuit OR2 is an example of a second logical OR circuit. The AND circuit AND is an example of a logical AND circuit.

否定排他的論理和回路ＸＮＯＲは、ペイロード６４のエントリＥＮＴのいずれかに保持されたキーアドレスＫＥＹと、縮約アドレス生成部６２が生成するキーアドレスＫＥＹとのビット同士の否定排他的論理和を算出する。オア回路ＯＲ１は、ペイロード６４のエントリＥＮＴのいずれかに保持されたマスクベクトルＭＳＫと、縮約アドレス生成部６２が生成するマスクベクトルＭＳＫとのビット同士の論理和を算出する。 The exclusive-NOR circuit XNOR calculates the bitwise exclusive NOR of the key address KEY held in one of the entries ENT of the payload 64 and the key address KEY generated by the contracted address generation unit 62. The OR circuit OR1 calculates the bitwise logical OR of the mask vector MSK held in one of the entries ENT of the payload 64 and the mask vector MSK generated by the contracted address generation unit 62.

オア回路ＯＲ２は、否定排他的論理和回路ＸＮＯＲの出力とオア回路ＯＲ１の出力とのビット同士の論理和を算出する。アンド回路ＡＮＤは、オア回路ＯＲ２の出力の全ビットの論理積を算出し、算出結果を衝突信号ＣＯＬとして出力する。 The OR circuit OR2 calculates the bitwise logical OR of the output of the exclusive-NOR circuit XNOR and the output of the OR circuit OR1. The AND circuit AND calculates the logical AND of all bits of the output of the OR circuit OR2, and outputs the result as the collision signal COL.

一致判定回路６７は、図９の括弧内に例示するキーアドレスＫＥＹおよびマスクベクトルＭＳＫをペイロード６４および縮約アドレス生成部６２からそれぞれ受信した場合、アドレスの衝突を示す衝突信号ＣＯＬ（＝"１"）を出力する。このように、一致判定回路６７は、縮約アドレスＣＡＤがキーアドレスＫＥＹおよびマスクベクトルＭＳＫで表現される場合にも、アドレスの衝突を判定することができる。すなわち、一致判定回路６７は、３値論理で表現される縮約アドレスの衝突を判定することができる。 When the match determination circuit 67 receives the key addresses KEY and mask vectors MSK illustrated in parentheses in FIG. 9 from the payload 64 and the contracted address generation unit 62, respectively, it outputs the collision signal COL (= "1"), indicating an address collision. In this way, the match determination circuit 67 can determine an address collision even when the contracted address CAD is expressed by the key address KEY and the mask vector MSK. That is, the match determination circuit 67 can determine collisions of contracted addresses expressed in ternary logic.
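
The gate structure described for FIG. 9 can be modeled bit-parallel on integers as in the sketch below. The 8-bit default width, the function name, and the sample values are assumptions for illustration; they are not the values shown in FIG. 9.

```python
def match_circuit(key_a: int, msk_a: int, key_b: int, msk_b: int, width: int = 8) -> int:
    """Behavioral model of the XNOR / OR1 / OR2 / AND structure; returns COL (0 or 1)."""
    all_ones = (1 << width) - 1
    xnor = ~(key_a ^ key_b) & all_ones   # 1 where the two key addresses agree
    or1 = (msk_a | msk_b) & all_ones     # 1 where either side is masked ("X")
    or2 = xnor | or1                     # 1 where the bits agree or are masked
    return int(or2 == all_ones)          # AND over all bits of OR2
```

For example, a held entry with KEY `0b10110011` and MSK `0b00010010` collides with an incoming scalar address `0b10100001` (MSK 0), because the only differing bit positions are masked.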

以上、この実施形態においても、上述した実施形態と同様の効果を得ることができる。例えば、一致判定部６６が縮約アドレスＣＡＤをペイロード６４に保持されたアドレスと比較するため、縮約前の複数のアドレスＡＤを比較に使用する場合に比べて、一致判定回路６７の数を削減することができ、一致判定部６６の回路規模を低減することができる。また、縮約アドレスＣＡＤをペイロード６４に格納することで、縮約前の複数のアドレスＡＤをペイロード６４に格納する場合に比べて多くのメモリアクセス命令ＭＡをペイロード６４に格納することができ、演算処理装置１００の処理性能を向上することができる。 As described above, in this embodiment as well, the same effects as in the above-described embodiment can be obtained. For example, since the match determination unit 66 compares the contracted address CAD with the address held in the payload 64, the number of match determination circuits 67 is reduced compared to the case where multiple addresses AD before contraction are used for comparison. Therefore, the circuit scale of the match determination section 66 can be reduced. Furthermore, by storing the contracted address CAD in the payload 64, more memory access instructions MA can be stored in the payload 64 than in the case of storing a plurality of addresses AD before contracting in the payload 64. The processing performance of the processing device 100 can be improved.

さらに、この実施形態では、３値論理で表現される縮約アドレスＣＡＤを２進数で表現可能な形式に変換することで、２進数を扱う演算処理装置１００において、一致判定部６６は、不定値"Ｘ"を含む縮約アドレスＣＡＤの衝突を判定することができる。換言すれば、演算処理装置１００のアーキテクチャを変更することなく、不定値"Ｘ"を含む縮約アドレスＣＡＤの衝突を判定することができる。一致判定回路６７は、縮約アドレスＣＡＤがキーアドレスＫＥＹおよびマスクベクトルＭＳＫで表現される場合にも、アドレスの衝突を判定することができる。 Furthermore, in this embodiment, by converting the contracted address CAD expressed in ternary logic into a format that can be expressed in binary, the match determination unit 66 can determine collisions of a contracted address CAD that includes the undefined value "X" in the arithmetic processing device 100, which handles binary numbers. In other words, collisions of a contracted address CAD including the undefined value "X" can be determined without changing the architecture of the arithmetic processing device 100. The match determination circuit 67 can also determine an address collision when the contracted address CAD is expressed by a key address KEY and a mask vector MSK.

図１０は、別の実施形態における演算処理装置の要部の一例を示す。上述した実施形態と同様の要素については、詳細な説明は省略する。図１０に示す演算処理装置１００Ａは、ロードストアキュー６０Ａ、アクセス制御部７０ＡおよびＬ１データキャッシュ８０を有する。ロードストアキュー６０Ａは、縮約アドレス生成部６２Ａ、ペイロード６４および一致判定部６６Ａを有する。一致判定部６６Ａは、衝突判定部の一例である。 FIG. 10 shows an example of a main part of an arithmetic processing device in another embodiment. Detailed description of elements similar to those in the embodiment described above will be omitted. The arithmetic processing device 100A shown in FIG. 10 includes a load/store queue 60A, an access control section 70A, and an L1 data cache 80. The load store queue 60A includes a contracted address generation section 62A, a payload 64, and a match determination section 66A. The coincidence determination section 66A is an example of a collision determination section.

縮約アドレス生成部６２Ａは、メモリアクセス命令ＭＡ（ロード命令またはストア命令）に含まれる複数のアドレスＡＤ（ＡＤ０－ＡＤ７）をグループ分けし、グループ分けした複数のアドレスグループ毎に縮約アドレスＣＡＤ０、ＣＡＤ１を生成する。縮約アドレスＣＡＤ０、ＣＡＤ１は、一致判定部６６Ａに出力され、ペイロード６４に格納される。 The contracted address generation unit 62A divides the plurality of addresses AD (AD0-AD7) included in a memory access instruction MA (load instruction or store instruction) into groups, and generates a contracted address CAD0 or CAD1 for each of the resulting address groups. The contracted addresses CAD0 and CAD1 are output to the match determination unit 66A and stored in the payload 64.

アドレスグループ毎に縮約アドレスＣＡＤ０、ＣＡＤ１を生成することで、アドレスＡＤをグループ分けせずに１つの縮約アドレスＣＡＤを生成する場合に比べて、縮約アドレスＣＡＤ０、ＣＡＤ１のそれぞれで示されるアドレスＡＤの範囲を狭くすることができる。これにより、各縮約アドレスＣＡＤ０、ＣＡＤ１に含まれるアドレスＡＤの数を減らすことができ、衝突の判定精度を向上することができる。 By generating the contracted addresses CAD0 and CAD1 for each address group, the range of addresses AD indicated by each of the contracted addresses CAD0 and CAD1 can be made narrower than when a single contracted address CAD is generated without grouping the addresses AD. This reduces the number of addresses AD covered by each of the contracted addresses CAD0 and CAD1 and improves the accuracy of collision determination.
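
The effect of grouping can be illustrated numerically. The sketch below reuses a KEY/MSK style contraction (mask bit 1 wherever any address in the group differs from the key, which is an interpretive assumption) and counts how many addresses each contracted address covers. Splitting the eight addresses into a first half and a second half is also an assumption, as are the sample addresses.

```python
from typing import List, Tuple


def contract(addresses: List[int]) -> Tuple[int, int]:
    """Return (KEY, MSK): KEY is the first address, MSK marks bits that differ from it."""
    key = addresses[0]
    mask = 0
    for address in addresses:
        mask |= key ^ address
    return key, mask


def covered_count(mask: int) -> int:
    """Number of addresses matched by a contracted address: 2 ** (number of masked bits)."""
    return 2 ** bin(mask).count("1")


# Hypothetical gather touching two distant clusters of four addresses each.
addrs = [0x10, 0x11, 0x12, 0x13, 0x80, 0x81, 0x82, 0x83]

single = covered_count(contract(addrs)[1])                   # one CAD for all eight
grouped = sum(covered_count(contract(group)[1])              # CAD0 + CAD1
              for group in (addrs[:4], addrs[4:]))
```

With these sample values a single contracted address covers 16 addresses, while the two grouped contracted addresses together cover only the 8 addresses actually accessed, so fewer false collisions would be reported.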

一致判定部６６Ａは、ペイロード６４のエントリＥＮＴにそれぞれ対応する数の複数の一致判定回路６７を、縮約アドレスＣＡＤ０、ＣＡＤ１毎に有する。各一致判定回路６７は、図９の一致判定回路６７と同じ構成および機能を有する。図１０の一致判定回路６７は、図１の一致判定回路６と同様に、ペイロード６４の対応するエントリＥＮＴが保持するアドレスを、対応する縮約アドレスＣＡＤ０またはＣＡＤ１と比較する。そして、一致判定回路６７は、比較結果に応じた衝突信号ＣＯＬをアクセス制御部７０Ａに出力する。 For each of the contracted addresses CAD0 and CAD1, the match determination unit 66A has a plurality of match determination circuits 67, one for each entry ENT of the payload 64. Each match determination circuit 67 has the same configuration and functions as the match determination circuit 67 of FIG. 9. Like the match determination circuit 6 of FIG. 1, each match determination circuit 67 of FIG. 10 compares the address held in the corresponding entry ENT of the payload 64 with the corresponding contracted address CAD0 or CAD1. The match determination circuit 67 then outputs a collision signal COL according to the comparison result to the access control unit 70A.

アクセス制御部７０Ａは、縮約アドレスＣＡＤ０、ＣＡＤ１毎の複数の衝突信号ＣＯＬに基づいてペイロード６４に保持されたメモリアクセス命令の処理を制御し、Ｌ１データキャッシュ８０のアクセスを制御する。 The access control unit 70A controls the processing of the memory access command held in the payload 64 based on the plurality of collision signals COL for each contracted address CAD0 and CAD1, and controls access to the L1 data cache 80.

以上、この実施形態においても、上述した実施形態と同様の効果を得ることができる。例えば、一致判定部６６Ａは、縮約アドレスＣＡＤをペイロード６４に保持されたアドレスと比較する。このため、縮約前の複数のアドレスＡＤを比較に使用する場合に比べて、一致判定回路６７の数を削減することができ、一致判定部６６Ａの回路規模を低減することができる。また、縮約アドレス生成部６２Ａは、生成した複数の縮約アドレスＣＡＤ０、ＣＡＤ１をペイロード６４に格納する。このため、縮約前の複数のアドレスＡＤをペイロード６４に格納する場合に比べて、ペイロード６４に格納可能なメモリアクセス命令の数を増やすことができる。これにより、演算処理装置１００Ａの処理性能を向上することができる。 As described above, in this embodiment as well, the same effects as in the above-described embodiment can be obtained. For example, the match determination unit 66A compares the contracted address CAD with the address held in the payload 64. Therefore, compared to the case where a plurality of addresses AD before contraction are used for comparison, the number of match determination circuits 67 can be reduced, and the circuit scale of the match determination section 66A can be reduced. Further, the contracted address generation unit 62A stores the plurality of generated contracted addresses CAD0 and CAD1 in the payload 64. Therefore, the number of memory access instructions that can be stored in the payload 64 can be increased compared to the case where a plurality of addresses AD before contraction are stored in the payload 64. Thereby, the processing performance of the arithmetic processing device 100A can be improved.

さらに、この実施形態では、縮約アドレス生成部６２Ａは、複数の縮約アドレスＣＡＤ０、ＣＡＤ１を生成する。これにより、１つの縮約アドレスＣＡＤを生成する場合に比べて、縮約アドレスＣＡＤ０、ＣＡＤ１のそれぞれで示されるアドレスＡＤの範囲を狭くすることができる。したがって、各縮約アドレスＣＡＤ０、ＣＡＤ１に含まれるアドレスＡＤの数を減らすことができ、衝突の判定精度を向上することができる。この結果、例えば、実際にはアドレスＡＤが衝突していないロード命令ＬＤが、先行するストア命令ＳＴと衝突すると判定されてキャンセルされる頻度を低減することができ、演算処理装置１００Ａの処理性能の低下を抑制することができる。 Furthermore, in this embodiment, the contracted address generation unit 62A generates a plurality of contracted addresses CAD0 and CAD1. As a result, the range of addresses AD indicated by each of the contracted addresses CAD0 and CAD1 can be made narrower than when a single contracted address CAD is generated. This reduces the number of addresses AD covered by each of the contracted addresses CAD0 and CAD1 and improves the accuracy of collision determination. As a result, for example, it is possible to reduce how often a load instruction LD whose addresses AD do not actually collide is determined to collide with a preceding store instruction ST and is canceled, which suppresses degradation of the processing performance of the arithmetic processing device 100A.

図１１は、別の実施形態における演算処理装置の要部の一例を示す。上述した実施形態と同様の要素については、詳細な説明は省略する。図１１に示す演算処理装置１００Ｂは、図１０のロードストアキュー６０Ａの代わりにロードストアキュー６０Ｂを有することを除き、図１０の演算処理装置１００Ａの構成と同様である。 FIG. 11 shows an example of a main part of an arithmetic processing device in another embodiment. Detailed description of elements similar to those in the embodiments described above is omitted. The arithmetic processing device 100B shown in FIG. 11 has the same configuration as the arithmetic processing device 100A of FIG. 10, except that it has a load/store queue 60B instead of the load/store queue 60A of FIG. 10.

ロードストアキュー６０Ｂは、図１０のロードストアキュー６０Ａに縮約アドレス生成部６２を追加している。縮約アドレス生成部６２は、図８の縮約アドレス生成部６２と同じ構成および機能を有している。すなわち、縮約アドレス生成部６２は、メモリアクセス命令ＭＡ（ロード命令またはストア命令）に含まれる複数のアドレスＡＤ（ＡＤ０－ＡＤ７）を縮約して縮約アドレスＣＡＤを生成する。図１１の縮約アドレス生成部６２は、第１縮約アドレス生成部の一例であり、縮約アドレス生成部６２が生成する縮約アドレスＣＡＤは、第１縮約アドレスの一例である。縮約アドレス生成部６２Ａは、第２縮約アドレス生成部の一例であり、縮約アドレス生成部６２Ａが生成する縮約アドレスＣＡＤ０、ＣＡＤ１は、第２縮約アドレスの一例である。 The load/store queue 60B is the load/store queue 60A of FIG. 10 with a contracted address generation unit 62 added. The contracted address generation unit 62 has the same configuration and functions as the contracted address generation unit 62 of FIG. 8. That is, the contracted address generation unit 62 contracts the plurality of addresses AD (AD0-AD7) included in a memory access instruction MA (load instruction or store instruction) to generate a contracted address CAD. The contracted address generation unit 62 of FIG. 11 is an example of a first contracted address generation unit, and the contracted address CAD that it generates is an example of a first contracted address. The contracted address generation unit 62A is an example of a second contracted address generation unit, and the contracted addresses CAD0 and CAD1 that it generates are examples of second contracted addresses.

縮約アドレス生成部６２が生成した縮約アドレスＣＡＤは、ペイロード６４に格納される。したがって、この実施形態では、ペイロード６４に格納される縮約アドレスＣＡＤの数を、図１０のペイロード６４に格納される縮約アドレスＣＡＤ０、ＣＡＤ１の数より減らすことができる。この結果、ペイロード６４に格納されるアドレスＡＤの数が相対的に増えるため、ペイロード６４に格納されるメモリアクセス命令の数を増やすことができ、演算処理装置１００Ｂの処理性能を向上することができる。 The contracted address CAD generated by the contracted address generation unit 62 is stored in the payload 64. Therefore, in this embodiment, the number of contracted addresses CAD stored in the payload 64 can be made smaller than the number of contracted addresses CAD0 and CAD1 stored in the payload 64 of FIG. 10. As a result, the number of addresses AD that can be stored in the payload 64 increases relatively, so the number of memory access instructions stored in the payload 64 can be increased, and the processing performance of the arithmetic processing device 100B can be improved.

以上、この実施形態においても、上述した実施形態と同様の効果を得ることができる。さらに、この実施形態では、ペイロード６４に縮約アドレスＣＡＤが格納され、一致判定部６６Ａに縮約アドレスＣＡＤ０、ＣＡＤ１が出力される。これにより、一致判定部６６Ａによる衝突の判定精度を向上しつつ、縮約アドレスＣＡＤ０、ＣＡＤ１がペイロード６４に格納される場合に比べて、ペイロード６４に格納されるアドレスＡＤの数を増やすことができる。この結果、演算処理装置１００Ｂの処理性能を向上することができる。 As described above, this embodiment also provides the same effects as the embodiments described above. Furthermore, in this embodiment, the contracted address CAD is stored in the payload 64, while the contracted addresses CAD0 and CAD1 are output to the match determination unit 66A. This improves the accuracy of collision determination by the match determination unit 66A while allowing more addresses AD to be stored in the payload 64 than when the contracted addresses CAD0 and CAD1 are stored in the payload 64. As a result, the processing performance of the arithmetic processing device 100B can be improved.

In the above example, the contracted address generation unit 62 generates a single contracted address CAD. However, the contracted address generation unit 62 may instead generate a plurality of contracted addresses, one for each of a plurality of groups that differ in number, or in the way the groups are divided, from the groups used by the contracted address generation unit 62A. Likewise, although the contracted address generation unit 62A is described as generating two contracted addresses, it may generate three or more.

FIG. 12 shows an example of a main part of an arithmetic processing device according to another embodiment. Detailed description of elements similar to those of the embodiments described above is omitted. The arithmetic processing device 100C shown in FIG. 12 adds a contracted address generation unit 62C and a match determination circuit 67C to the arithmetic processing device 100 of FIG. 7, the arithmetic processing device 100A of FIG. 10, or the arithmetic processing device 100B of FIG. 11. Although FIG. 12 shows a single match determination circuit 67C, a match determination circuit 67C is in practice provided for each entry ENT (not shown) included in the payload 64. A match determination unit that includes a plurality of match determination circuits 67 and a plurality of match determination circuits 67C is thus provided. The match determination circuit 67C is an example of a collision determination circuit.

The contracted address generation unit 62C generates a contracted address CAD2 indicating the range of the plurality of addresses AD included in a memory access instruction. The contracted address CAD2 generated by the contracted address generation unit 62C is an example of a fourth contracted address. For example, the contracted address generation unit 62C generates, as the contracted address CAD2, a start address AH (=A0) and an offset OFSA, which is the distance from the start address AH to the end address AE. The contracted address generation unit 62C stores the generated contracted address CAD2 in the payload 64.

For example, an entry ENT (not shown) of the payload 64 stores information indicating a memory access instruction MA together with a contracted address CAD2 that includes a start address BH and an offset OFSB generated earlier by the contracted address generation unit 62C. That is, each entry ENT of the payload 64 stores either a contracted address CAD2 or a contracted address CAD generated by the contracted address generation unit 62 of FIG. 7. When the memory access instruction MA includes a single address AD, the single address AD is set as the start address AH (or BH) and the offset OFSA (or OFSB) is set to "0", in accordance with the contraction rule.
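The range form of the contracted address can be illustrated with a short software sketch. This is only a model under stated assumptions, not the circuit of the embodiment: the function name `contract_range` is hypothetical, addresses are plain integers, and the start address is taken to be the lowest address in the list (the text above gives AH = A0 as its example).

```python
from typing import Sequence, Tuple

def contract_range(addresses: Sequence[int]) -> Tuple[int, int]:
    """Return (start, offset) describing the range covered by the addresses.

    Hypothetical helper: 'start' models the start address AH and 'offset'
    models OFSA, the distance from the start address to the end address AE.
    Assumption: the lowest address is used as the start address.
    """
    start = min(addresses)
    offset = max(addresses) - start   # a single address yields offset 0
    return start, offset
```

With a single address the offset comes out as 0, matching the contraction rule described above.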

The match determination circuit 67C includes adders ADDa and ADDb, comparators CMPa and CMPb, an OR circuit OR, and an inverter circuit NOT. The adder ADDa calculates the end address AE by adding the start address AH and the offset OFSA output from the contracted address generation unit 62C. The adder ADDb calculates the end address BE by adding the start address BH and the offset OFSB output from the corresponding entry ENT of the payload 64.

The comparator CMPa compares the end address BE with the start address AH to determine which is larger. For example, the comparator CMPa outputs "1" when the end address BE is smaller than the start address AH, and outputs "0" when the end address BE is greater than or equal to the start address AH. The comparator CMPb compares the start address BH with the end address AE to determine which is larger. For example, the comparator CMPb outputs "1" when the end address AE is smaller than the start address BH, and outputs "0" when the end address AE is greater than or equal to the start address BH.

The OR circuit OR outputs the logical OR of the outputs of the comparators CMPa and CMPb to the inverter circuit NOT. The inverter circuit NOT inverts the logical value output from the OR circuit OR and outputs the result as a collision signal COL. The logic of the collision signal COL is therefore given by equation (1).
COL = not((AE < BH) or (BE < AH)) ... (1)
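As an illustration only, the logic of equation (1) can be modeled in software as follows. This is a minimal sketch rather than the circuit itself; the function name `range_collision` and the use of plain Python integers for addresses are assumptions made for the example.

```python
def range_collision(ah: int, ofsa: int, bh: int, ofsb: int) -> int:
    """Software model of the range comparison (hypothetical helper name).

    ah, ofsa: start address and offset of the newly generated CAD2.
    bh, ofsb: start address and offset held in a payload entry.
    Returns the collision signal COL as 0 or 1.
    """
    ae = ah + ofsa           # adder ADDa: end address AE
    be = bh + ofsb           # adder ADDb: end address BE
    cmp_a = int(be < ah)     # comparator CMPa: range B ends before range A starts
    cmp_b = int(ae < bh)     # comparator CMPb: range A ends before range B starts
    return int(not (cmp_a or cmp_b))   # OR circuit followed by NOT circuit
```

Two ranges that merely touch at a shared address count as a collision, because both comparisons use a strict "less than".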

Note that the contracted address generation unit 62C may instead generate the start address AH and the end address AE as the contracted address CAD2. In this case, the number of bits of the contracted address CAD2 increases, but the match determination circuit 67C does not need the adders ADDa and ADDb. Generating a contracted address CAD2 that indicates the range of the plurality of addresses AD included in a memory access instruction improves the accuracy of collision determination compared with a contracted address CAD generated using ternary logic.

FIG. 13 shows an example of the operation of the match determination circuit 67C of FIG. 12. Note that the match determination unit of this embodiment includes the match determination circuit 67 of FIG. 9 and the match determination circuit 67C of FIG. 12. As shown in FIG. 13, the collision signal COL is set to "0" when the range of the contracted address CAD2 indicated by the start address AH and the offset OFSA does not overlap the range of the contracted address CAD2 indicated by the start address BH and the offset OFSB. The collision signal COL is set to "1" when the two ranges overlap.

In the arithmetic processing device 100C shown in FIG. 12, a contracted address CAD expressed in ternary logic and a contracted address CAD2 indicating the range of the addresses AD are generated from the plurality of addresses AD included in a memory access instruction MA. For example, the contracted address generation unit 62 of FIG. 7 contracts the bits of the plurality of addresses AD to generate the contracted address CAD. The contracted address CAD may instead be generated by the contracted address generation unit 62A of FIG. 10 or FIG. 11. The contracted address CAD generated by the contracted address generation units 62 and 62A is an example of a third contracted address. The contracted address generation unit 62C of FIG. 12 generates the contracted address CAD2 indicating the range of the plurality of addresses AD.

A plurality of addresses AD that do not change in ascending or descending order are difficult to convert into a contracted address CAD2 as they are. For such addresses, the contracted address generation unit 62C first generates, in the same way as the contracted address generation unit 62, a contracted address CAD expressed in the ternary logic of "0", "1", and "X". Next, the contracted address generation unit 62C generates the minimum value of the addresses AD by treating each indeterminate value "X" of the generated contracted address CAD as "0", and generates the maximum value of the addresses AD by treating each indeterminate value "X" as "1". The contracted address generation unit 62C then generates a contracted address CAD2 including the start address AH and the offset OFSA. Alternatively, the contracted address generation unit 62C may generate the minimum and maximum values of the addresses AD by replacing the indeterminate values "X" of the contracted address CAD generated by the contracted address generation unit 62 with "0" and "1".
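The conversion from a ternary contracted address to a range can be sketched as follows. The sketch assumes that the ternary address is held as a key address plus a mask vector, as recited in the claims, and that a mask bit of 1 marks an indeterminate "X" position; that bit-level encoding and the function names `contract_ternary` and `ternary_to_range` are assumptions made for illustration.

```python
from functools import reduce
from typing import Sequence, Tuple

def contract_ternary(addresses: Sequence[int]) -> Tuple[int, int]:
    """Return (key, mask) for a list of addresses (hypothetical helper).

    key  : one of the addresses (here the first).
    mask : bit i is 1 when the addresses disagree at bit i, i.e. "X".
    """
    key = addresses[0]
    mask = reduce(lambda m, a: m | (a ^ key), addresses, 0)
    return key, mask

def ternary_to_range(key: int, mask: int) -> Tuple[int, int]:
    """Return (start, offset) covering every address the ternary form matches."""
    lowest = key & ~mask     # every "X" read as 0: minimum address
    highest = key | mask     # every "X" read as 1: maximum address
    return lowest, highest - lowest
```

For example, the unordered addresses 0x1008, 0x1000, 0x100C, and 0x1004 differ only in bits 2 and 3, so the mask is 0xC and the resulting range starts at 0x1000 with an offset of 0xC.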

A load/store queue that includes the contracted address generation unit 62 and the contracted address generation unit 62C stores the contracted address CAD generated by the contracted address generation unit 62 in the payload 64 when the plurality of addresses AD do not change in ascending or descending order. When the plurality of addresses AD change in ascending or descending order, the load/store queue stores the contracted address CAD2 generated by the contracted address generation unit 62C in the payload 64.

When a contracted address CAD is held in an entry of the payload 64, the match determination unit including the match determination circuits 67 and 67C determines a collision between the held contracted address CAD and the contracted address CAD generated by the contracted address generation unit 62. When a contracted address CAD2 is held in an entry of the payload 64, the match determination unit determines a collision between the held contracted address CAD2 and the contracted address CAD2 generated by the contracted address generation unit 62C.
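For comparison with the range check, the collision test between two ternary contracted addresses can be modeled along the lines of the circuit recited in the claims (exclusive NOR of the key addresses, OR of the mask vectors, bitwise OR of the two results, then AND over all bits). The following is a sketch under assumptions: a mask bit of 1 is taken to mean "X", the 16-bit `width` is arbitrary, and the function name `ternary_collision` is hypothetical.

```python
def ternary_collision(key_a: int, mask_a: int,
                      key_b: int, mask_b: int, width: int = 16) -> int:
    """Return 1 when two ternary contracted addresses may refer to the same address."""
    full = (1 << width) - 1
    xnor = ~(key_a ^ key_b) & full        # 1 where the key bits agree
    either_x = (mask_a | mask_b) & full   # 1 where either side is "X"
    per_bit = xnor | either_x             # a bit position is compatible
    return int(per_bit == full)           # AND over all bit positions
```

A result of 1 means only that a collision cannot be ruled out; the contracted form may cover addresses that the original instruction never touches.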

In this way, when the addresses AD included in a memory access instruction MA are not in ascending or descending order, address collisions are determined using the contracted address CAD generated by the contracted address generation unit 62. When the addresses AD included in a memory access instruction MA are in ascending or descending order, address collisions are determined using the contracted address CAD2 generated by the contracted address generation unit 62C.

Examples of memory access instructions MA whose addresses AD are not in ascending or descending order include gather instructions and scatter instructions. Examples of memory access instructions MA whose addresses AD are in ascending or descending order include contiguous-address load instructions LD and contiguous-address vector store instructions ST, such as stride access instructions.
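The choice between the two contracted forms can be sketched as a simple ordering test. This is an illustrative model only: the function names are hypothetical, and treating repeated addresses as still "in order" (a non-strict comparison) is an assumption that the text does not settle.

```python
from typing import Sequence

def is_monotonic(addresses: Sequence[int]) -> bool:
    """True when the addresses are in ascending or in descending order."""
    pairs = list(zip(addresses, addresses[1:]))
    ascending = all(a <= b for a, b in pairs)
    descending = all(a >= b for a, b in pairs)
    return ascending or descending

def select_contraction(addresses: Sequence[int]) -> str:
    """Return "CAD2" (range form) for ordered addresses, else "CAD" (ternary form)."""
    return "CAD2" if is_monotonic(addresses) else "CAD"
```

A strided access such as 0x100, 0x108, 0x110, 0x118 selects the range form, while a gather with addresses in arbitrary order selects the ternary form.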

As described above, this embodiment also provides the same effects as the embodiments described earlier. Furthermore, in this embodiment, when the addresses AD included in a memory access instruction MA are in ascending or descending order, determining address collisions using the contracted address CAD2 improves the accuracy of collision determination.

The features and advantages of the embodiments will be apparent from the above detailed description. The claims are intended to cover such features and advantages of the embodiments to the extent that they do not depart from the spirit and scope of the claims. Furthermore, a person having ordinary skill in the art should readily be able to conceive of various improvements and modifications. Accordingly, there is no intention to limit the scope of the inventive embodiments to those described above, and suitable modifications and equivalents falling within the scope disclosed in the embodiments may be relied upon.

1, 1A Arithmetic processing device
2, 2A Load/store queue
3 Contracted address generation unit
4 Payload
5, 5A Match determination unit
8, 8A Access control unit
9 Data cache
10 Instruction cache
20 Decoder
30 Scheduler
40 Register file
50 Load/store unit
52 Address calculator
60, 60A, 60B Load/store queue
62, 62A, 62C Contracted address generation unit
64 Payload
66, 66A Match determination unit
67, 67C Match determination circuit
70, 70A Access control unit
80 L1 data cache
90 Arithmetic unit
100, 100A, 100B, 100C Arithmetic processing device
AD (AD0-AD7) Address
AE End address
AH Start address
BE End address
BH Start address
CAD, CAD0, CAD1, CAD2 Contracted address
COL Collision signal
DT Data
ENT Entry
ICD Instruction code
KEY Key address
LD Load instruction
MA Memory access instruction
MSK Mask vector
OFSA, OFSB Offset
ST Store instruction
STD Store data

Claims

An arithmetic processing device comprising:
a queue that holds memory access instructions each including at least one address;
a contracted address generation unit that, when a memory access instruction includes a plurality of addresses, contracts bits of the plurality of addresses to generate a contracted address;
a collision determination unit that determines a collision between the contracted address and an address held in the queue; and
an access control unit that controls processing of the memory access instructions held in the queue based on a determination result of the collision determination unit.

The arithmetic processing device according to claim 1, wherein the contracted address generation unit stores the generated contracted address in the queue.

The arithmetic processing device according to claim 2, wherein the contracted address generation unit generates the contracted address by contracting, in accordance with a contraction rule, a plurality of addresses included in a memory access instruction and a single address included in a memory access instruction.

The contracted address generation unit generates the contracted address for each of a plurality of address groups obtained by grouping a plurality of addresses included in the memory access instruction,
The arithmetic processing device according to any one of claims 1 to 3, wherein the collision determination unit determines collisions between contracted addresses of the plurality of address groups and addresses held in the queue.

The contracted address generation unit includes:
a first contracted address generation unit that contracts bits of the plurality of addresses for each of a plurality of groups that differ in number, or in the way the groups are divided, from the plurality of address groups; and
a second contracted address generation unit that contracts bits of the plurality of addresses for each of the plurality of address groups,
a first contracted address generated by the first contracted address generation unit is stored in the queue, and
The arithmetic processing device according to claim 4, wherein a second contracted address generated by the second contracted address generation unit is output to the collision determination unit.

The contracted address generation unit generates a contracted address indicating a range of the plurality of addresses included in the memory access instruction,
The arithmetic processing device according to any one of claims 1 to 3, wherein the collision determination unit determines a collision between an address included in the range indicated by the contracted address and an address held in the queue.

The contracted address generation unit:
generates a third contracted address by contracting the bits of the plurality of addresses and a fourth contracted address indicating a range of the plurality of addresses; and
holds one or both of the generated third contracted address and fourth contracted address in the queue,
The collision determination unit:
when the third contracted address is held in the queue, determines a collision between the third contracted address held in the queue and the third contracted address generated by the contracted address generation unit; and
when the fourth contracted address is held in the queue, determines a collision between the fourth contracted address held in the queue and the fourth contracted address generated by the contracted address generation unit. The arithmetic processing device according to claim 6.

The contracted address generation unit generates a contracted address expressed in ternary logic indicating, at each bit position of the plurality of addresses, whether the bit values are all "0", all "1", or indeterminate. The arithmetic processing device according to any one of claims 1 to 7.

The arithmetic processing device according to claim 8, wherein the contracted address generation unit generates the contracted address by treating the bits below a bit position that indicates indeterminate as indeterminate.

The contracted address generation unit generates, as a contracted address, a key address indicating one of the plurality of addresses and a mask vector expressed by an exclusive OR of bit values at each bit position of the plurality of addresses. The arithmetic processing device according to claim 8 or 9.

The collision determination unit includes a plurality of collision determination circuits that respectively determine collisions between the contracted address and the plurality of addresses held in the queue,
Each of the plurality of collision determination circuits includes:
a negative exclusive OR circuit that calculates a negative exclusive OR between bits of a key address included in the contracted address and the contracted address held in the queue;
a first OR circuit that calculates an OR between bits of a mask vector included in the contracted address and the contracted address held in the queue;
a second OR circuit that calculates the bitwise OR of the output of the negative exclusive OR circuit and the output of the first OR circuit;
an AND circuit that calculates an AND of all bits of the output of the second OR circuit;
The arithmetic processing device according to claim 10, wherein an address collision is detected when the output of the AND circuit is "1".

An arithmetic processing method for an arithmetic processing device having a queue holding memory access instructions including at least one address, the method comprising:
When a memory access instruction includes a plurality of addresses, a contracted address generation unit included in the arithmetic processing device contracts bits of the plurality of addresses to generate a contracted address;
a collision determination unit included in the arithmetic processing device determines a collision between the contracted address and the address held in the queue;
An arithmetic processing method, wherein an access control unit included in the arithmetic processing device controls processing of memory access instructions held in the queue based on a determination result by the collision determination unit.