CN1908927A - Reconfigurable integrated circuit device - Google Patents

Info

Publication number
CN1908927A
Authority
CN
China
Prior art keywords
memory
processor element
data
access
integrated circuit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CNA2006100083495A
Other languages
Chinese (zh)
Other versions
CN100414535C (en
Inventor
笠间一郎
鹤田徹
西田克
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Cypress Semiconductor Corp
Original Assignee
Fujitsu Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fujitsu Ltd
Publication of CN1908927A
Application granted
Publication of CN100414535C
Anticipated expiration
Expired - Fee Related (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/76 Architectures of general purpose stored program computers
    • G06F15/80 Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors
    • G06F15/8007 Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors; single instruction multiple data [SIMD] multiprocessors

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Advance Control (AREA)
  • Multi Processors (AREA)
  • Microcomputers (AREA)

Abstract

A reconfigurable integrated circuit device, which is dynamically configured into an arbitrary operation state based on configuration data, has: a plurality of clusters, each including operation processor elements, a memory processor element, and an inter-processor element switch group for connecting the elements in an arbitrary state; an inter-cluster switch group for constructing data paths between the clusters in an arbitrary state; and an external memory bus. A direct memory access control section is further provided, which executes data transfer between the memory processor elements and the external memory by direct memory access in response to access requests from the memory processor elements of the plurality of clusters.

Description

Reconfigurable integrated circuit device

Technical field

The present invention relates to a reconfigurable integrated circuit device and, more particularly, to a novel configuration of the internal memory mounted in a reconfigurable integrated circuit device for transferring data to and from an external memory.

Background art

A reconfigurable integrated circuit device includes a plurality of processor elements and a network that interconnects them. A sequencer supplies configuration data to the processor elements and the network in response to external or internal events, and the processor elements and the network are configured into an arbitrary operation state, or arithmetic circuit, according to that configuration data. A conventional programmable microprocessor, by contrast, reads the instructions stored in memory sequentially and processes them sequentially. Because the number of instructions that one processor can execute at the same time is limited, the processing capability of the microprocessor is limited as well.

In recently proposed reconfigurable integrated circuit devices, on the other hand, ALUs having functions such as an adder, a multiplier, and a comparator, together with various other processor elements such as delay circuits and counters, are installed in advance, along with a network for connecting these processor elements. The processor elements and the network are then reconfigured into the desired configuration according to configuration data from a state transition control section having a sequencer, and a predetermined operation is executed in that operation state. When data processing in one operation state completes, another operation state is constructed from other configuration data, and different data processing is performed in that state.

Dynamically constructing different operation states in this way improves the capability to process large amounts of data and raises the overall processing efficiency. Such a reconfigurable integrated circuit device is disclosed, for example, in Japanese Patent Application Laid-Open No. 2001-312481.

Summary of the invention

In a conventional reconfigurable integrated circuit device, an array of processor elements is surrounded by a group of switches that connect the processor elements, and a state transition control section supplies configuration data to the processor elements and the switch group to set an arbitrary operation state. Data is input to the processor element group from an external memory, the processor element group set to an operation state performs predetermined data processing on the input data, and the resulting data is output.

In such an integrated circuit device, the data required for data processing is read from the external memory in a batch and stored in the internal memory, and the processor element group and switch group, set to a given operation state, then process all of the data that was read.

A reconfigurable integrated circuit device, however, executes different applications using a predetermined number of dynamically configured processor elements. Each processor element therefore needs to write the required amount of data to, or read it from, the external memory at the required timing. In the prior art, data is transferred over a data path that uses the switch group connecting the processor elements, so data can be transferred to and from the external memory only at predetermined timings.

In addition, a predetermined number of internal memories, which store data read from the external memory or data to be written to it, are installed for the plurality of processor elements. Because the operation state configured by the user is variable, it is difficult to estimate how many internal memories are needed and what input/output characteristics they require. The configuration and operation of the internal memory in a reconfigurable integrated circuit device therefore call for a high degree of flexibility.

In view of the foregoing, an object of the present invention is to provide a reconfigurable integrated circuit device that allows highly flexible configuration and operation of the internal memory.

To achieve this object, a first aspect of the present invention is a reconfigurable integrated circuit device that is dynamically constructed into an arbitrary operation state based on configuration data. The device comprises: a plurality of clusters, each including a plurality of arithmetic processor elements each having a computing unit, a memory processor element having a memory that transfers data to and from an external memory, and an inter-processor element switch group for connecting the arithmetic processor elements and the memory processor element in an arbitrary state; an inter-cluster switch group for constructing data paths between the clusters in an arbitrary state; and an external memory bus for transferring data between the memory processor elements and the external memory. The arithmetic processor elements, the memory processor elements, the inter-processor element switch group, and the inter-cluster switch group are dynamically changed based on the configuration data. The device further comprises a direct memory access control section that, in response to access requests from the memory processor elements of the plurality of clusters, transfers data between the memory processor elements and the external memory by direct memory access.

According to the first aspect, the memory processor element installed in a cluster can transfer data to and from the external memory by direct memory access over the external memory bus, which is separate from the inter-cluster switch group, and the reconfigured operation can be executed on data in the external memory at a timing suited to the reconfigured operation state.

In the first aspect of the present invention, the cluster preferably further includes a configuration data memory for storing the configuration data, and a sequencer that, in response to end signals from the arithmetic processor elements and the memory processor element, causes the configuration data memory to output the configuration data for constructing the next operation state.

In the first aspect, the reconfigurable integrated circuit device preferably further includes a data flow control section that is installed as a common section for the plurality of memory processor elements, accepts direct memory access requests from the plurality of memory processor elements, and instructs the direct memory access control section to execute the direct memory access requests for the plurality of memory processor elements synchronously. With this data flow control section, access requests from the plurality of memory processor elements can be executed synchronously.

In the first aspect, the memory processor element further includes an inner interface to the internal bus connected to the inter-processor element switch group, and an outer interface to the external memory bus. While the memory processor element accesses the external memory by direct memory access via the outer interface, the arithmetic processor elements access the memory processor element via the inner interface. According to this aspect, data can be transferred seamlessly between the external memory and the arithmetic processor elements.

Also preferably in the first aspect, the memory processor element accepts data transfer with the arithmetic processor elements while transferring data to and from the external memory by direct memory access. When the direct memory access transfer cannot keep up with the data transfer with the arithmetic processor elements, the memory processor element asserts a stall signal to stop the operation of the plurality of arithmetic processor elements, and it deasserts the stall signal once the transfer has caught up. According to this aspect, when seamless data transfer between the external memory and the arithmetic processor elements is not possible, the operation of the arithmetic processor elements can be stopped to avoid malfunction.
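The stall behaviour described above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the circuit of the patent: the class name `MemoryPE`, the methods `dma_deliver` and `read_for_alu`, and the use of a queue to stand in for the internal memory are all hypothetical.

```python
from collections import deque


class MemoryPE:
    """Hypothetical model: DMA fills a buffer, arithmetic PEs drain it."""

    def __init__(self):
        self.buffer = deque()  # stands in for the internal RAM
        self.stall = False     # stall signal seen by the arithmetic PEs

    def dma_deliver(self, words):
        """Data arriving from external memory by DMA."""
        self.buffer.extend(words)
        if self.buffer:
            self.stall = False  # DMA has caught up: deassert the stall signal

    def read_for_alu(self):
        """Read requested by an arithmetic PE via the inner interface."""
        if not self.buffer:
            self.stall = True   # DMA cannot keep up: assert the stall signal
            return None
        return self.buffer.popleft()
```

In this model a read that finds no data asserts the stall signal instead of returning stale data, which mirrors the stated purpose of stopping the arithmetic processor elements to avoid malfunction.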

To achieve this object, a second aspect of the present invention is a reconfigurable integrated circuit device that is dynamically configured into a predetermined operation state based on configuration data. The device comprises: a plurality of clusters, each including arithmetic processor elements each having a computing unit, a memory processor element having a memory that transfers data to and from an external memory, and an inter-processor element switch group for connecting the arithmetic processor elements and the memory processor element in an arbitrary state; an inter-cluster switch group for constructing data paths between the clusters in an arbitrary state; and an external memory bus for transferring data between the memory processor elements and the external memory. The arithmetic processor elements, the memory processor elements, the inter-processor element switch group, and the inter-cluster switch group are dynamically changed based on the configuration data. The device further comprises a direct memory access control section that, in response to access requests from the memory processor elements of the plurality of clusters, transfers data between the memory processor elements and the external memory by direct memory access. The memory processor element includes first and second memory banks, and while one of the first and second memory banks transfers data to and from the external memory by direct memory access, the other transfers data to and from the arithmetic processor elements.

According to the second aspect, seamless data transfer between the external memory and the arithmetic processor elements can be performed at an arbitrary timing over the external memory bus, which is separate from the inter-cluster switch group.

According to the present invention, the memory processor element installed in each cluster makes it possible to transfer data by direct memory access to the external memory independently of the data paths between the clusters. This increases the flexibility of data transfer to the memory processor elements in the reconfigurable integrated circuit device and allows the data transfer to be performed efficiently.

Brief description of the drawings

FIG. 1 is a block diagram of one cluster that constitutes part of the reconfigurable integrated circuit device according to the present embodiment;

FIG. 2 is a schematic diagram showing a configuration example of the PE network section according to the present embodiment;

FIG. 3 is a schematic diagram showing an example of a circuit configured according to configuration data for the PE network section of the present embodiment;

FIG. 4 is a schematic diagram showing an example of a circuit configured according to configuration data for the PE network section of the present embodiment;

FIG. 5 is a block diagram of the reconfigurable integrated circuit device according to the present embodiment;

FIG. 6 is a block diagram showing an example of the memory processor element according to the present embodiment;

FIGS. 7A-7C are schematic diagrams showing the switching operation of the two memory banks in the memory processor element according to the present embodiment;

FIGS. 8A-8C are schematic diagrams showing the switching operation of the two memory banks in the memory processor element according to the present embodiment;

FIGS. 9A-9C are schematic diagrams showing the switching operation of the two memory banks in the memory processor element according to the present embodiment;

FIGS. 10A-10C are schematic diagrams showing the switching operation of the two memory banks in the memory processor element according to the present embodiment;

FIGS. 11A-11C are schematic diagrams showing the switching operation of the two memory banks in the memory processor element according to the present embodiment;

FIG. 12 is a block diagram of the control section of the memory processor element according to the present embodiment;

FIG. 13 is a state transition diagram of the control section of the memory processor element according to the present embodiment;

FIGS. 14A-14B are schematic diagrams showing the flag change control of the access end register;

FIGS. 15A-15B are schematic diagrams showing the outer interface in the memory PE; and

FIG. 16 is a schematic diagram showing the outer interface in the memory PE.

Detailed description of the embodiments

Embodiments of the present invention will now be described with reference to the accompanying drawings. The technical scope of the present invention is not limited to these embodiments, however, but extends to the matters stated in the claims and their equivalents.

FIG. 1 is a block diagram of one cluster that constitutes part of the reconfigurable integrated circuit device according to the present embodiment. The cluster 10 includes a sequencer SEQ that performs state management, a configuration data memory 14 that stores configuration data CD, and a processor element network section 16 that is configured into an arbitrary circuit configuration according to the configuration data CD. The configuration data CD is loaded into the configuration data memory 14 from a configuration data loading section (not shown).

The processor element network section 16 includes a plurality of processor elements (hereafter PEs) PE0-PE5; an inter-PE switch group 20, which is a group of selectors for connecting the PEs; and an input port section 22 and an output port section 24, which are the interfaces for data transfer with other clusters. The input port section 22 and the output port section 24 are connected to the inter-cluster switch group 30. In the example in FIG. 1, the processor elements PE0-PE3 are arithmetic PEs, each containing an ALU, an adder, and a comparator. The processor element PE4 is another kind of PE, such as a delay circuit or a counter, and the processor element PE5 is a memory PE containing a RAM.

Configuration data CD0-CD5 is supplied from the configuration data memory 14 to the processor elements PE0-PE5 and stored in registers (not shown) in these PEs. The circuit in each PE is dynamically configured based on the configuration data CD0-CD5 set in these registers. Likewise, configuration data CD is supplied from the configuration data memory 14 to the inter-PE switch group 20; based on this data, the required switch group structure is configured and the data paths between the PEs are dynamically configured. The inter-cluster switch group 30 is also dynamically configured based on configuration data CD, which configures the data paths between the clusters.

The memory processor element PE5 in the cluster can transfer data to and from each of PE0-PE4 via the inter-PE switch group 20; for this purpose the memory processor element PE5 is connected to the internal bus I-BUS. The memory processor element PE5 can also transfer data directly to and from the external memory E-MEM via the external buses E-BUS1 and E-BUS2. This memory access is performed directly, under the control of the direct memory access control section DMAC, over buses that are separate from the inter-cluster switch group 30. The memory processor element PE5 can therefore transfer data directly to and from the external memory E-MEM at a timing that is independent of the operation of the data paths between the clusters.

End signals CS0-CS5 are output from the processor elements PE0-PE5, respectively, and the switching signal generation section 12 outputs a switching signal SW1 based on these end signals. In response to the switching signal SW1, the sequencer SEQ outputs a new address Add and a switching signal SW2 to the configuration data memory 14. In response, new configuration data is output and the circuit configuration in the PE network section 16 is reconfigured.
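The reconfiguration sequence above can be sketched as a small software model. The text does not state how the end signals are combined into SW1 or how the new address is chosen, so two assumptions are made here and marked in the code: SW1 fires once every PE has raised its end signal, and the sequencer simply advances to the next address. The class and method names are hypothetical.

```python
class Sequencer:
    """Hypothetical model of SEQ plus the switching signal generation section."""

    def __init__(self, config_memory, num_pes):
        self.config_memory = config_memory      # configuration data indexed by address
        self.address = 0                        # current address Add
        self.end_signals = [False] * num_pes    # CS0..CSn, one per PE

    def current_config(self):
        return self.config_memory[self.address]

    def signal_end(self, pe_index):
        """A PE raises its end signal; returns new configuration data on a switch."""
        self.end_signals[pe_index] = True
        # Assumption: SW1 is generated when all end signals are asserted.
        if all(self.end_signals):
            # Assumption: the next state is the next address, wrapping around.
            self.address = (self.address + 1) % len(self.config_memory)
            self.end_signals = [False] * len(self.end_signals)
            return self.current_config()
        return None
```

A real sequencer could select the next address from a state transition table rather than by incrementing; the increment is used only to keep the sketch short.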

FIG. 2 is a schematic diagram showing a configuration example of the PE network section according to the present embodiment. The arithmetic processor elements PE0-PE3, the memory processor element PE5, and the other processor element PE4 can be connected via selectors 41, each of which is one switch in the inter-PE switch group 20. In this arrangement, each of the processor elements PE0-PE5 can be set to an arbitrary configuration based on the configuration data CD0-CD5, and the selectors 41 of the inter-PE switch group 20 can likewise be set to an arbitrary configuration based on the configuration data CD.

As illustrated in the lower right corner of FIG. 2, the selector 41 includes a register 42 that stores configuration data CD, a selector circuit 43 that selects an input according to the data in the register 42, and a flip-flop 44 that latches the output of the selector circuit 43 in synchronization with a clock CK.
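The three parts of the selector 41 map naturally onto a small behavioural model. The sketch below assumes that the configuration data held in register 42 is simply the index of the input to select; the class name `Selector` and its methods are hypothetical and only illustrate the register, multiplexer, and flip-flop roles.

```python
class Selector:
    """Hypothetical behavioural model of selector 41."""

    def __init__(self, num_inputs):
        self.num_inputs = num_inputs
        self.config_register = 0  # register 42: holds configuration data CD
        self.output = None        # flip-flop 44: value latched on the last clock

    def load_config(self, cd):
        """Write configuration data; assumed here to be an input index."""
        if not 0 <= cd < self.num_inputs:
            raise ValueError("configuration data selects a non-existent input")
        self.config_register = cd

    def clock(self, inputs):
        """One rising edge of CK: select an input and latch it."""
        selected = inputs[self.config_register]  # selector circuit 43
        self.output = selected                   # latched by flip-flop 44
        return self.output
```

Because the output is latched, loading new configuration data does not change the output until the next clock, which is the behaviour one would expect from a registered switch.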

FIG. 3 and FIG. 4 are schematic diagrams showing examples of circuits configured according to configuration data for the PE network section of the present embodiment. In FIG. 3 and FIG. 4, the arithmetic processor elements PE0-PE3 and PE6, whose arithmetic circuits can be dynamically configured, are connected by the inter-PE switch group 20 and configured into a dedicated arithmetic circuit that executes a predetermined operation at high speed. The processor element PE6 is not shown in FIG. 1 and FIG. 2.

The example in FIG. 3 shows a dedicated arithmetic circuit configured to evaluate the following arithmetic expression on input data a, b, c, d, e, and f.

(a+b)+(c-d)+(e+f)

In this configuration example, the processor element PE0 is configured as the A=a+b arithmetic circuit, PE1 as the B=c-d arithmetic circuit, PE2 as the C=e+f arithmetic circuit, PE3 as the D=A+B arithmetic circuit, and PE6 as the E=D+C arithmetic circuit. Each of the data a to f is supplied from the memory processor element and an external cluster (not shown), and the output of the processor element PE6 is output as the operation result E to the memory processor element and the external cluster.

The processor elements PE0, PE1, and PE2 execute their operations in parallel, the processor element PE3 executes D=A+B on the results of PE0 and PE1, and finally the processor element PE6 executes E=D+C. Configuring a dedicated arithmetic circuit in this way enables parallel operation and thereby improves processing efficiency.
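The FIG. 3 configuration can be written down as a dataflow table and evaluated in software. The sketch below is illustrative only: the table format and the `evaluate` helper are assumptions, and the evaluation is sequential, whereas in the device PE0, PE1, and PE2 operate in parallel.

```python
import operator

# Each PE is described by (operation, left source, right source).
# Sources are either input names or the names of upstream PEs.
FIG3_CONFIG = {
    "PE0": (operator.add, "a", "b"),      # A = a + b
    "PE1": (operator.sub, "c", "d"),      # B = c - d
    "PE2": (operator.add, "e", "f"),      # C = e + f
    "PE3": (operator.add, "PE0", "PE1"),  # D = A + B
    "PE6": (operator.add, "PE3", "PE2"),  # E = D + C
}


def evaluate(config, inputs, output_pe):
    """Resolve the output of one PE by following its data paths."""
    values = dict(inputs)

    def resolve(name):
        if name not in values:
            op, lhs, rhs = config[name]
            values[name] = op(resolve(lhs), resolve(rhs))
        return values[name]

    return resolve(output_pe)


result = evaluate(FIG3_CONFIG, {"a": 1, "b": 2, "c": 7, "d": 3, "e": 4, "f": 5}, "PE6")
print(result)  # (1+2) + (7-3) + (4+5) = 16
```

Swapping in a different table, for example one with PE3 as a multiplier fed by PE0 and PE1, models a different operation state without changing the evaluator, which is the software analogue of loading new configuration data.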

Each arithmetic processor element has a built-in ALU, adder, multiplier, and comparator, and can be reconfigured into an arbitrary arithmetic circuit based on the configuration data CD. Configuring the elements as shown in FIG. 3 yields a dedicated arithmetic circuit for the expression above, in which multiple operations are executed in parallel, improving operation efficiency.

The example in FIG. 4 shows a dedicated arithmetic circuit configured to execute the operation (a+b)*(c-d) on input data a to d. The processor element PE0 is configured as the A=a+b arithmetic circuit, PE1 as the B=c-d arithmetic circuit, and PE3 as the C=A*B arithmetic circuit, and the operation result C is output to a memory processor element or an external cluster. Here too, the processor elements PE0 and PE1 execute their operations in parallel, and the processor element PE3 executes C=A*B on their results A and B. Configuring a dedicated arithmetic circuit thus improves the efficiency of the operation, including for large amounts of data.

FIG. 5 is a block diagram of the reconfigurable integrated circuit device according to the present embodiment. In FIG. 5, a plurality of clusters CLS0-CLS3 are installed, and the inter-cluster switch group 30 that connects them is arranged between the clusters. Configuring the inter-cluster switch group 30 according to the configuration data CD makes it possible to dynamically configure an arbitrary arithmetic circuit that combines multiple clusters.

In the example in FIG. 5, a memory processor element PE-RAM is installed in each of the clusters CLS0-CLS3. Depending on the case, a cluster may contain several memory processor elements or none. These memory processor elements are connected to the direct memory access control section DMAC via the external bus E-BUS1, and transfer data to and from the external memory E-MEM by direct memory access through the DMAC. A high-speed memory such as a DDR-SDRAM (double data rate synchronous DRAM) is used as the external memory E-MEM. In addition, a common data flow control section 40 is installed for the plurality of memory processor elements PE-RAM. Each memory processor element issues an access request DR0-DR3; in response, the data flow control section 40 sends an access command to the DMAC, and data is transferred by DMA to or from the memory processor element that issued the request.

The data flow control section 40 accepts access requests from the plurality of memory processor elements and executes the DMA data transfers between those memory processor elements and the external memory synchronously. In other words, based on the access command ACMD from the data flow control section 40, the DMAC executes the DMA data transfers with the plurality of memory processor elements synchronously in a round-robin manner.
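Round-robin servicing of several pending requests can be sketched as follows. The text only names the round-robin policy, so the details here are assumptions: requests are expressed as word counts, each grant moves at most `burst` words, and the function name and return format are hypothetical.

```python
from collections import deque


def round_robin_dma(requests, burst=1):
    """Return the order in which DMA grants are issued.

    requests: dict mapping a memory PE name to the number of words it
              wants transferred (an assumed request format).
    burst:    maximum number of words moved per grant (an assumption).
    """
    queue = deque((pe, words) for pe, words in requests.items() if words > 0)
    schedule = []
    while queue:
        pe, remaining = queue.popleft()
        granted = min(burst, remaining)
        schedule.append((pe, granted))
        if remaining > granted:
            # Unfinished requests rejoin the back of the queue.
            queue.append((pe, remaining - granted))
    return schedule


grants = round_robin_dma({"PE-RAM0": 2, "PE-RAM1": 1, "PE-RAM2": 2})
```

With these inputs the grants interleave across the three memory PEs rather than finishing one request before starting the next, so no single requester monopolises the external bus.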

In this way, the memory processor element in a cluster transfers by DMA, from the external memory E-MEM, the data to be processed by the arithmetic circuit configured from the arithmetic processor elements in the cluster, and transfers the processed data by DMA to the external memory E-MEM. These DMA transfers are performed directly over the external buses E-BUS1 and E-BUS2, which are independent of the inter-cluster switch group 30 that connects the clusters. Consequently, even though the connection structure of the inter-cluster switch group 30 changes dynamically in the reconfigurable integrated circuit device, data can be transferred between each memory processor element and the external memory at the timing that memory processor element requires, over a path independent of the inter-cluster switch group 30, and optimal data transfer can be achieved for a dynamically configured cluster or group of clusters.

FIG. 6 is a block diagram showing an example of the memory processor element according to the present embodiment. To achieve seamless data transfer between the external memory and the arithmetic processor elements in the cluster, the memory processor element includes a first memory bank BNK0 and a second memory bank BNK1, an inside interface 50 between these memory banks and the inter-PE switch group 20, and an outside interface 52 between these memory banks and the external bus E-BUS1. The memory banks BNK0 and BNK1 each include four 16-bit-wide RAMs. The inside interface 50 is connected to the internal bus I-BUS, which leads to the inter-PE switch group 20, and is dynamically configured into different input/output bus interface structures based on the configuration data CD. The outside interface 52 is connected to the external bus E-BUS1 and is likewise dynamically configured into different input/output bus interface structures based on the configuration data CD. The input/output bus interface structures that can be configured are described in detail later.

While one of the first memory bank BNK0 and the second memory bank BNK1 transfers data to or from the internal arithmetic processor elements PE/ALU, the other transfers data to or from the external memory E-MEM, and the two memory banks exchange these roles alternately. For this purpose, selectors SEL are installed between the memory banks BNK0, BNK1 and the inside interface 50 and outside interface 52, and these selectors are set according to the configuration data CD. The first and second memory banks can thus be connected alternately to the inside and outside interfaces. The signal lines between the interfaces 50 and 52 and each of the memory banks BNK0 and BNK1 include 16-bit data lines, address lines, and all other necessary control lines.

The memory processor element internally includes a memory control section 54, which controls memory bank switching and DMA requests, and an operation control section 56, which controls execution of operations by the internal arithmetic processor elements PE/ALU. The memory control section 54 monitors the state of the memory banks and performs memory bank switching control, issues DMA requests, and asserts and negates the stall signal STR that halts the operation of the arithmetic processor elements, thereby achieving seamless data transfer between the external memory and the internal arithmetic processor elements. In response to the stall signal STR, the operation control section 56 controls the start and stop of the operation of the arithmetic processor elements.

FIGS. 7A-7C and FIGS. 8A-8C are diagrams showing the switching operation of the two memory banks in the memory processor element of the present embodiment. They show, inside the memory processor element PE-RAM, the two memory banks BNK0 and BNK1 and the access end registers END-REG, which the memory control section 54 (see FIG. 6) uses to control memory bank switching. There are two access end registers END-REG, each storing a flag that indicates the access state of the first or second memory bank. For example, when a memory access ends and an end signal is received, the flag is set to the end state "1", and when the memory bank becomes accessible (ready), the flag is set to the ready state "0". By monitoring these two register values, the memory control section 54 (see FIG. 6) controls the switching of the two memory banks BNK0 and BNK1.

The operation after initial startup will now be described with reference to FIG. 6, FIGS. 7A-7C, and FIGS. 8A-8C. At startup, after the reset is released, the sequencer SEQ outputs the address corresponding to initial startup, the configuration data for initial startup is output from the configuration data memory 14 (FIG. 6), and the processor elements PE and the inter-PE switch group 20 in the cluster are set to the initial circuit configuration. This initial startup also sets initial values in the access end registers END-REG, as shown in FIG. 7A. In this example, the register for the first memory bank BNK0 is in the ready state (flag "0") and the register for the second memory bank BNK1 is in the access end state (flag "1"). The initial startup also configures the selectors SEL so that the first memory bank BNK0 is connected to the outside interface 52 and the second memory bank BNK1 is connected to the inside interface 50.

After initial startup, the memory control section 54 refers to the access end registers and outputs an access request DMAR for the external memory. As described above, the access request DMAR is sent to the direct memory access control section DMAC via the data flow control section 40 (FIG. 5), and direct data transfer between the external memory E-MEM and the first memory bank BNK0 begins. Specifically, data read from the external memory E-MEM is transferred directly over the external bus and written into the first memory bank BNK0. As described above, the access requests DMAR at initial startup are output from a plurality of memory processor elements, so the data transfers by the plurality of direct memory accesses are performed in synchronization.

Then, as shown in FIG. 7B, when the data transfer from the external memory E-MEM to the first memory bank BNK0 ends, the DMA control section DMAC sends the access end signal END1, and in response the bit of the access end register END-REG corresponding to the first memory bank changes to the access end state (flag "1"). When both registers are thus in the access end state (flag "1"), the memory control section 54 issues the state end signal CS, which causes the sequencer SEQ to output the next address Add and the configuration data memory 14 to output new configuration data CD, so that the first memory bank BNK0 and the second memory bank BNK1 are switched. In other words, the second memory bank BNK1 is now connected to the outside interface 52 and the first memory bank BNK0 to the inside interface 50.

Then, as shown in FIG. 7C, once the two memory banks have been switched, the memory control section 54 clears the access end registers END-REG, setting both memory banks to the ready state (flag "0"). In response to this state, the memory control section 54 outputs an access request DMAR for the external memory, and based on this request the DMA control section DMAC controls the data transfer between the external memory E-MEM and the second memory bank BNK1. Unlike at initial startup, the access request DMAR in this case is issued at the timing at which the memory processor element needs the access, so the data transfer is performed on demand. At the same time, the memory control section 54 outputs the signal ALU-EN, which indicates that the internal arithmetic processor elements may execute. In response, the operation control section 56 outputs the operation start signal ALU-ST to the internal arithmetic processor elements PE/ALU, which begin their operation processing. The internal arithmetic processor elements PE/ALU then access the first memory bank BNK0, read the data, and perform operation processing on the data read.

Then, as shown in FIG. 8A, when the data transfer between the second memory bank BNK1 and the external memory E-MEM ends, the corresponding flag of the access end register END-REG is set to the access end state (flag "1") in response to the access end signal END1. Direct memory access to the external memory generally uses a wide data bus and is therefore a high-speed transfer, so it normally ends before the data transfer with the internal arithmetic processor elements does.

As shown in FIG. 8B, when the access from the internal arithmetic processor elements PE/ALU also ends, the other flag of the access end register END-REG is set to the access end state (flag "1") by the access end signal END2. In response, the memory control section 54 outputs the state end signal CS, and the connections of the first memory bank BNK0 and the second memory bank BNK1 to the inside and outside interfaces are exchanged according to the configuration data CD output from the configuration data memory 14.

As shown in FIG. 8C, the memory control section 54 again outputs a direct memory access request DMAR to start the data transfer between the first memory bank BNK0 and the external memory E-MEM, and the operation control section 56 outputs the operation start signal ALU-ST to start the access from the internal arithmetic processor elements PE/ALU to the second memory bank BNK1.

As described above, by alternately switching the first and second memory banks, the memory control section 54 achieves seamless data transfer from the external memory E-MEM to the internal arithmetic processor elements. Specifically, because direct memory access to the external memory is faster than access by the internal arithmetic processor elements, the next block of data is already in place when the banks are switched, and the arithmetic processor elements can read and process data without interruption.
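
The bank switching sequence of FIGS. 7A-8C can be modeled in a few lines of software. The sketch below is an illustration under assumptions, not the hardware of the embodiment: the class `MemoryPE`, its attribute names, and the decision to clear both flags in the same step as the swap are choices made for the example. The flag encoding (end state "1", ready state "0") and the initial state follow the description above.

```python
class MemoryPE:
    """Toy model of the two-bank (ping-pong) switching driven by END-REG."""

    def __init__(self):
        # Initial state as in FIG. 7A: BNK0 faces the outside interface and is
        # ready (0); BNK1 faces the inside interface and is marked ended (1).
        self.outside, self.inside = 0, 1
        self.end_reg = [0, 1]

    def end_signal(self, bank):
        """Model END1/END2: mark the access to `bank` as ended, then swap if possible."""
        self.end_reg[bank] = 1
        return self._maybe_swap()

    def _maybe_swap(self):
        # Banks are exchanged only when both accesses have ended.
        if self.end_reg == [1, 1]:
            self.outside, self.inside = self.inside, self.outside
            self.end_reg = [0, 0]  # both banks return to the ready state
            return True
        return False


pe = MemoryPE()
first_swap = pe.end_signal(0)   # DMA into BNK0 ends (FIG. 7B): swap occurs
dma_done = pe.end_signal(1)     # DMA into BNK1 ends first (FIG. 8A): no swap yet
second_swap = pe.end_signal(0)  # arithmetic PE finishes BNK0 (FIG. 8B): swap
```

After the first swap BNK1 faces the external memory and BNK0 faces the arithmetic processor elements; after the second swap the roles are reversed again, as in FIG. 8C.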

FIGS. 9A-9C are diagrams showing the switching operation of the two memory banks in the memory processor element according to the present embodiment, in this case the control performed when seamless data transfer breaks down. Because direct data transfer with the external memory is fast, one memory bank normally finishes its transfer with the external memory before the other memory bank finishes its transfer with the internal arithmetic PEs. Memory bank switching is then performed when the transfer with the internal arithmetic PEs completes, which makes the data transfer between the external memory and the internal arithmetic PEs seamless. For various reasons, however, there are cases in which the data transfer with the internal arithmetic PEs completes first.

As shown in FIG. 9A, if the data transfer from the first memory bank BNK0 to the internal arithmetic PEs ends first, the access end register END-REG is set to the access end state (flag "1") by the end signal END2. In response, the memory control section 54 asserts the stall signal STR to the operation control section 56, and the arithmetic PE array temporarily stops its pipeline processing. The reason is that when data cannot be read from the memory PE, the pipeline of the arithmetic PE array cannot proceed and the operation processing would fail.

As shown in FIG. 9B, when the data transfer of the second memory bank BNK1 completes, the access end register END-REG is set to the access end state by the end signal END1. The memory control section 54 then outputs the state end signal CS, and the memory banks are switched according to the configuration data CD. Then, as shown in FIG. 9C, the memory control section 54 outputs an access request DMAR so that the first memory bank BNK0 starts transferring data with the external memory, negates the stall signal STR, and restarts the operation of the internal arithmetic PE array, whereupon the second memory bank BNK1 starts transferring data with the internal arithmetic PEs.

In this way, when a dedicated arithmetic circuit has been configured and data is being processed in a pipeline, the memory control section 54 monitors the access states of the two memory banks, and when seamless data transfer is not possible it asserts the stall signal STR to stop the pipeline processing of the internal arithmetic PEs. This prevents the pipeline processing from failing. When seamless transfer becomes possible again, the memory control section 54 negates the stall signal STR and pipeline processing resumes.
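
The stall condition of FIGS. 9A-9C reduces to a simple predicate on the two access end flags. The following is a minimal sketch under the assumption that each flag is available as a boolean (True for the end state "1"); the function name `stall_asserted` and its parameter names are hypothetical.

```python
def stall_asserted(end_outside: bool, end_inside: bool) -> bool:
    """Return True while the arithmetic PE array must be stalled (STR asserted).

    end_outside: the bank on the external-memory side has finished its DMA (END1).
    end_inside:  the bank on the arithmetic-PE side has been fully accessed (END2).
    """
    # The PEs have consumed their bank but the other bank is not yet ready,
    # so the banks cannot be switched and the pipeline has no data to read.
    return end_inside and not end_outside


# All four flag combinations, keyed by (end_outside, end_inside).
truth_table = {(o, i): stall_asserted(o, i)
               for o in (False, True) for i in (False, True)}
```

Only the combination "inside ended, outside not ended" stalls the pipeline. When both flags are set the banks are switched instead, and in the normal case the outside transfer ends first and no stall occurs.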

FIGS. 10A-10C and FIGS. 11A-11C are diagrams showing the switching operation of the two memory banks in the memory processor element when data is transferred from the internal arithmetic PEs to the external memory E-MEM via the memory PE.

In FIG. 10A, the arithmetic PEs write data to the first memory bank BNK0. In FIG. 10B, when the data write completes, both access end registers END-REG are in the access end state (flag "1"). In response, the memory control section 54 outputs the state end signal CS, and the two memory banks are switched based on the configuration data CD. As shown in FIG. 10C, the first memory bank BNK0 then starts direct data transfer with the external memory in response to the access request DMAR, and data writing from the arithmetic PEs to the second memory bank BNK1 starts in response to the operation start signal ALU-ST sent to the arithmetic PEs.

Then, as shown in FIG. 11A, the data transfer of the first memory bank BNK0 completes first, and the data write from the arithmetic PEs ends as shown in FIG. 11B. The memory control section 54 then switches the two memory banks, and the data transfers of the exchanged memory banks each start as shown in FIG. 11C.

As described above, data transfer from the arithmetic PEs to the external memory is also performed seamlessly via the memory PE. If seamless data transfer becomes impossible partway through, the stall signal STR is asserted, the arithmetic PE array stops its pipeline processing, and pipeline processing restarts when data transfer becomes possible again.

FIG. 12 is a block diagram showing the control sections of the memory processor element according to the present embodiment, and FIG. 13 is a state transition diagram of those control sections. In the example of FIG. 12, the memory unit 60 in one cluster has a plurality of memory processor elements RAM-PE0 to RAM-PEn, and an array of arithmetic processor elements (PE/ALU array) is configured for each of the memory processor elements RAM-PE0 to RAM-PEn. Each memory PE includes, as parts of the memory control section 54, a bank switching control section 541 and a DMA transfer execution judgment section 542, and, as part of the operation control section 56, an ALU operation execution judgment section 561. The plurality of memory PEs share an ALU operation control section 562, which belongs to the operation control section 56, and a DMA transfer control section 543 is provided as part of the memory control section 54. The first memory bank BNK0 and the second memory bank BNK1 in each memory PE are configured to transfer data alternately with the access control section DMAC via the external bus and with the arithmetic processor element array (PE/ALU array) via the inter-PE switch group PE-SW in the cluster.

The control flow will now be described with reference to the state transition diagram in FIG. 13. First, as described above, the memory processor element RAM-PE starts up and is set to the required circuit configuration based on the configuration data CD (C10). At startup, the flags of the access end register END-REG are set to their initial values, and these flag states put the memory banks into their initial state (C12).

During operation after the memory processor element RAM-PE has started, the bank switching control section 541 controls memory bank switching according to the state of the access end register END-REG (both flags "1") (C12), and the memory banks are switched (C14). When the memory banks are switched, the circuit configuration of the arithmetic PEs may be changed accordingly (C12, C14).

When the memory banks have been switched, the DMA transfer execution judgment section 542 judges whether data transfer with the external memory is possible, and if it is, outputs the DMA transfer enable signal DMA-EN to the DMA transfer control section 543 installed outside the memory PE (C16). Whether data transfer is possible depends on the state of the access end register END-REG, which indicates the state of the memory banks. The DMA transfer control section 543 then outputs an access request to the access control section DMAC via the data flow control section 40 (not shown; see FIG. 5) (C18), and the data transfer is performed (C20). When the data transfer with the external memory ends, the DMA transfer control section 543 receives the data transfer end signal END1, and the data transfer end signal END10 is sent to the bank switching control section 541. The bank switching control described above is then performed according to the state of the access end register END-REG (C12).

Meanwhile, when the memory banks have been switched, the ALU operation execution judgment section 561 monitors the state of the memory banks through the access end register END-REG and judges whether access from the arithmetic PEs is possible, that is, whether the arithmetic PEs can execute their operation processing (C22). If execution is possible, the ALU operation execution judgment section 561 outputs the operation execution enable signal ALU-EN.

Only when it has received the operation execution enable signal ALU-EN from all of the memory processor elements RAM-PE0 to RAM-PEn does the ALU operation control section 562 output the operation start signal ALU-ST to all of the arithmetic PE arrays in the cluster (C24), causing all of them to execute their operation processing in synchronization (C26). In other words, the plurality of arithmetic PE arrays in the cluster must perform pipeline processing in synchronization while transferring data with the plurality of memory PEs. For this reason a single ALU operation control section 562 is provided in common for the plurality of memory PEs, and it outputs the operation start signal ALU-ST to the arithmetic PE arrays only when the operation execution enable signal ALU-EN has been received from every memory PE. The ALU operation execution judgment section 561 also monitors the state of the memory banks, and if data transfer cannot proceed seamlessly it asserts the stall signal STR and stops the pipeline processing of the arithmetic PE arrays. The stall signal STR is as described above.
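
The start condition for the arithmetic PE arrays is, in effect, a logical AND over the enable signals from all memory PEs. A minimal sketch, assuming the ALU-EN signals are collected as a list of booleans; the function name `alu_start` and the treatment of an empty list as "do not start" are assumptions for the example.

```python
def alu_start(alu_en):
    """Return True (ALU-ST) only if every memory PE reports ALU-EN.

    alu_en: list of booleans, one per memory processor element RAM-PE0..PEn.
    """
    # An empty list is treated as "no memory PE present, do not start".
    return len(alu_en) > 0 and all(alu_en)


all_ready = alu_start([True, True, True])    # every RAM-PE is enabled
one_waiting = alu_start([True, False, True])  # RAM-PE1 is still waiting on DMA
```

Gating the start on every enable signal keeps the arithmetic PE arrays in lockstep: a single memory PE that is still waiting for its bank holds back the whole pipeline start.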

When the operation processing completes, access to the memory bank on the arithmetic PE side ends, the end signal END2 is received from the arithmetic PEs, and the ALU operation execution judgment section 561 negates the operation execution enable signal ALU-EN. The end signal END2 changes the flag state of the access end register END-REG, and memory bank switching or a configuration change of the arithmetic PEs is controlled and executed accordingly (C12, C14).

In FIG. 13, the state transitions enclosed by the dotted line are those of the memory PE. The states of the DMA transfer control section 543 and the direct memory access control section DMAC are shown on the left, and the states of the ALU operation control section 562 and the arithmetic PE arrays are shown on the right.

In FIGS. 12 and 13, the DMA transfer control section 543 outputs a DMA request based on the DMA transfer enable signal DMA-EN output by the DMA transfer execution judgment section 542. The DMA transfer control section 543 may, however, also check the state of the channels already accepted by the direct memory access control section DMAC to judge whether the DMA transfer can be executed, that is, whether the timing is appropriate, and output the DMA request only if it is. In this way, when the number of channels in use at the DMAC exceeds a predetermined number and the timing is not suitable for a DMA request, the request can be withheld until the number of channels falls to the predetermined number or below, so that the DMA transfer is delayed. Because the DMA transfer enable signal DMA-EN is generated solely from the state of the access end register END-REG, this control for delaying the DMA transfer timing is important.
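
The request gating described above can be expressed as a small predicate. This is a sketch under assumptions: the names `may_issue_dma_request`, `active_channels`, and `channel_limit` are hypothetical, and the threshold semantics (withhold while the count exceeds the limit, issue once it is at or below the limit) follow the wording of the paragraph.

```python
def may_issue_dma_request(dma_en: bool, active_channels: int, channel_limit: int) -> bool:
    """Decide whether the DMA transfer control section should send a request now.

    dma_en:          DMA-EN from the DMA transfer execution judgment section.
    active_channels: number of channels the DMAC has currently accepted.
    channel_limit:   the predetermined number of channels (assumed parameter).
    """
    # Withhold the request while the DMAC is over the limit, even if DMA-EN is set.
    return dma_en and active_channels <= channel_limit


issue_now = may_issue_dma_request(True, active_channels=2, channel_limit=4)
held_back = may_issue_dma_request(True, active_channels=5, channel_limit=4)
```

A request that is held back is not dropped: DMA-EN stays asserted, so the same check simply succeeds on a later evaluation once the channel count has fallen.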

In FIG. 13, when the operation of the arithmetic processor element array ends (C26), new configuration data is output under control of the sequencer and the configuration of the arithmetic PEs is changed (C12). The configuration data is switched whenever necessary.

FIGS. 14A-14B are diagrams showing how the flags of the access end register are changed. FIG. 14A shows the flag change control when the memory bank BNK0/1 is connected to the inside (the arithmetic PE array side). The access address Add is supplied from the arithmetic PE array side to the memory bank BNK, and the corresponding access is performed. This access address Add is also supplied to the comparator 70 in the memory control section 54. The end address E-Add, that is, the last address to be accessed, has been set in the comparator 70 in advance, when the circuit was configured based on the configuration data. Each time the address valid signal Valid, which accompanies the access address and indicates whether that address is valid, becomes valid, the comparator 70 compares the access address Add with the end address E-Add, and if they match it changes the flag of the access end register END-REG to "1".

As an alternative control method, the flag of the access end register END-REG may be changed to the end state "1" in response to the end signal END2 from the arithmetic PE array. In either case, when the inside and outside memory banks are switched, the flag of the access end register END-REG is set to the ready state "0".
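
The comparator-based flag control of FIG. 14A can be sketched as a function applied once per access. This is an illustrative model only; the function name `update_end_flag` and the representation of an access stream as (address, valid) pairs are assumptions made for the example.

```python
def update_end_flag(flag: int, add: int, valid: bool, end_add: int) -> int:
    """Return the new END-REG flag after one access on the inside interface.

    flag:    current flag (0 = ready, 1 = access ended).
    add:     access address Add from the arithmetic PE array.
    valid:   address valid signal Valid accompanying the address.
    end_add: end address E-Add preset in the comparator at configuration time.
    """
    # The comparator only acts on valid addresses; a match marks the end state.
    if valid and add == end_add:
        return 1
    return flag


# Hypothetical access stream: the invalid access to address 7 must be ignored.
stream = [(5, True), (7, False), (6, True), (7, True)]
flag = 0
history = []
for add, valid in stream:
    flag = update_end_flag(flag, add, valid, end_add=7)
    history.append(flag)
```

In this stream the flag stays "0" through the first three accesses and becomes "1" only on the final, valid access to the end address.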

FIG. 14B shows the flag change control when the memory bank BNK0/1 is connected to the outside (the external memory E-MEM side). In this case, the access address Add is supplied from the access control section DMAC. In response to the end signal END1 from the access control section DMAC, the memory control section 54 changes the flag of the access end register END-REG to the end state "1", and when the inside and outside memory banks are switched, the memory control section 54 sets the flag of the access end register END-REG to the ready state "0" in response to the switching end signal END-SW.

In addition, a reset clears the end state of the access end register END-REG and sets it to the ready state.

FIGS. 15A-15B and FIG. 16 are diagrams showing the outside interface in the memory PE. The outside interface 52 is connected to the external bus E-BUS1 and is dynamically configured into different input/output bus interface structures based on the configuration data CD. The external bus E-BUS1 used for direct memory access generally has a wide bus width. For example, when the external memory E-MEM is a 32-bit DDR-SDRAM, data is output twice per clock cycle, so the bus width of the external bus E-BUS1 is 64 bits. In this case, the circuit of the outside interface 52 is configured so that 64-bit data is input to, or output from, the four 16-bit RAMs in the memory bank BNK in parallel.

FIG. 15A shows the outside interface when the bus width of the external bus E-BUS1 is 64 bits. As described above, 64-bit data is input to, or output from, the four 16-bit RAMs in parallel.

FIG. 15B shows the case in which the bus width is 32 bits. The interface is configured so that 32-bit data is input to, or output from, two groups of RAMs in parallel, each group consisting of two 16-bit RAMs. Within each group, the 16-bit data is input to or output from the two RAMs serially.

FIG. 16 shows the case in which the bus width is 16 bits and the interface is configured so that 16-bit data is input to, or output from, the four 16-bit RAMs serially. The configuration of the interface 52 in FIG. 16 is the same as that of the inside interface. In other words, the inside interface takes the configuration shown in FIG. 16 because the internal bus on the arithmetic PE array side is narrow, namely 16 bits. The inside interface 50 is therefore configured so that 16-bit data is input to, or output from, the four 16-bit RAMs serially.
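
The three interface configurations differ in how a bus word is distributed over the four 16-bit RAMs and in how many bus transfers are needed to fill a bank. The sketch below illustrates both points. The function names, the lane ordering (least significant 16 bits to RAM 0), and the bank depth of 256 words per RAM are assumptions; none of them is specified in the description.

```python
def split_lanes(word64: int) -> list:
    """Split a 64-bit bus word into four 16-bit values, one per RAM.

    Lane order (least significant 16 bits to RAM 0) is an assumption.
    """
    return [(word64 >> (16 * i)) & 0xFFFF for i in range(4)]


def transfers_to_fill(bus_width_bits: int, words_per_ram: int,
                      rams: int = 4, ram_width_bits: int = 16) -> int:
    """Number of bus transfers needed to fill every RAM of one bank."""
    total_bits = rams * ram_width_bits * words_per_ram
    return total_bits // bus_width_bits


lanes = split_lanes(0x0123456789ABCDEF)
# Bus widths of FIG. 15A, FIG. 15B and FIG. 16, for a hypothetical 256-word RAM.
counts = {width: transfers_to_fill(width, words_per_ram=256) for width in (64, 32, 16)}
```

For the assumed depth, the 64-bit configuration fills a bank in 256 transfers, the 32-bit configuration in 512, and the 16-bit configuration in 1024. This factor of four between the widest outside configuration and the 16-bit inside interface is consistent with the earlier observation that the transfer with the external memory normally finishes first.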

In this way, the interfaces 50 and 52 in the memory PE are configured, based on the configuration data CD, to match the configuration of the buses to which they are connected.

As described above, according to the present embodiment, a plurality of clusters, each including a plurality of arithmetic PEs and memory PEs, are arranged in an integrated circuit device whose circuit configuration can be changed dynamically. The clusters are interconnected by a switch group whose connection state changes dynamically, and the memory PEs in the clusters are connected to the external memory independently of this inter-cluster switch group. The memory PEs can perform DMA transfers with the external memory. Each memory PE also has, for example, a double buffer configuration so that data can be transferred seamlessly between the external memory and the arithmetic PEs, and if seamless data transfer breaks down, the pipeline operation of the arithmetic PE array is temporarily stopped.

This application is based on and claims the benefit of priority from prior Japanese Patent Application No. 2005-224208, filed on August 2, 2005, the entire contents of which are incorporated herein by reference.

Claims (16)

1. A reconfigurable integrated circuit device that is dynamically configured into an arbitrary computing state based on configuration data, the device comprising:
a plurality of clusters, each cluster including a plurality of arithmetic processor elements each having a computing unit, a memory processor element having a memory and performing data transfer with an external memory, and an inter-processor-element switch group for connecting the arithmetic processor elements and the memory processor element in an arbitrary state;
an inter-cluster switch group for configuring data paths between the clusters in an arbitrary state; and
an external memory bus for data transfer between the memory processor elements and the external memory, wherein
the arithmetic processor elements, the memory processor elements, the inter-processor-element switch groups, and the inter-cluster switch group are dynamically changed based on the configuration data, the device further comprising:
a direct memory access control unit that, in response to access requests from the memory processor elements of the plurality of clusters, executes the data transfer between the memory processor elements and the external memory by direct memory access.
2. The reconfigurable integrated circuit device according to claim 1, wherein each cluster further includes a configuration data memory for storing the configuration data, and a sequencer that, in response to end signals from the arithmetic processor elements and the memory processor element, causes the configuration data memory to output the configuration data for configuring the next computing state.
3. The reconfigurable integrated circuit device according to claim 1, further comprising a data flow control unit provided in common to the plurality of memory processor elements, the data flow control unit accepting direct memory access requests from the plurality of memory processor elements and issuing, to the direct memory access control unit, synchronized direct memory access requests for the plurality of memory processor elements.
4. The reconfigurable integrated circuit device according to claim 1, further comprising a data flow control unit provided in common to the plurality of memory processor elements, the data flow control unit accepting direct memory access requests from the plurality of memory processor elements and issuing, to the direct memory access control unit, synchronized direct memory access requests for the plurality of memory processor elements, wherein
when a direct memory access request is accepted from a single memory processor element, the data flow control unit issues that direct memory access request to the direct memory access control unit in response to the acceptance.
5. The reconfigurable integrated circuit device according to claim 1, wherein
the memory processor element further includes an inner interface to an internal bus connected to the inter-processor-element switch group, and an outer interface to the external memory bus, and wherein
while the memory processor element is accessing the external memory via the outer interface by direct memory access, the arithmetic processor elements access the memory processor element via the inner interface.
6. The reconfigurable integrated circuit device according to claim 5, wherein
the memory processor element further includes first and second memory banks, and wherein
the first and second memory banks are alternately connected to the inner and outer interfaces based on the configuration data.
7. The reconfigurable integrated circuit device according to claim 6, wherein
after the data transfer between the external memory and the first or second memory bank is completed, the memory processor element permits data transfer between the arithmetic processor elements and that first or second memory bank, and
if the data transfer between the external memory and the first or second memory bank has not been completed, the memory processor element asserts a halt signal to instruct the plurality of arithmetic processor elements to stop operating, and deasserts the halt signal when the data transfer between the external memory and the first or second memory bank is completed.
8. The reconfigurable integrated circuit device according to claim 3, wherein the memory processor element monitors the operating state of the direct memory access control unit and supplies the access request to the data flow control unit based on that operating state.
9. The reconfigurable integrated circuit device according to claim 8, wherein the memory processor element variably controls the timing of the access request based on the operating state.
10. The reconfigurable integrated circuit device according to claim 1, wherein the memory processor element accepts data transfer with the arithmetic processor elements while performing data transfer with the external memory by direct memory access, asserts a halt signal to stop the computation of the plurality of arithmetic processor elements when the data transfer by direct memory access cannot keep up with the data transfer with the arithmetic processor elements, and deasserts the halt signal when it can keep up.
11. The reconfigurable integrated circuit device according to claim 5, wherein the outer interface of the memory processor element is configured, based on the configuration data, into an interface state corresponding to any of a plurality of data bus widths.
12. The reconfigurable integrated circuit device according to claim 1, wherein
the memory processor element further includes first and second memory banks, and
at startup, the memory processor element sets one of the first and second memory banks, based on the configuration data, to a state in which access from the external bus side is enabled, and outputs the access request.
13. The reconfigurable integrated circuit device according to claim 12, wherein when the data transfer by direct memory access to one of the first and second memory banks is completed, the memory processor element asserts a computation execution enable signal to the arithmetic processor elements to cause the arithmetic processor elements to execute computation.
14. The reconfigurable integrated circuit device according to claim 13, wherein when both the first and second memory banks enter a data-transfer-disabled state, the memory processor element asserts a halt signal to request that the arithmetic processor elements stop operating.
15. The reconfigurable integrated circuit device according to claim 13, wherein each cluster further includes a plurality of the memory processor elements and a computation execution control unit common to the memory processor elements, the unit requesting synchronized execution of computation from the plurality of arithmetic processor elements in response to assertion of the computation execution enable signals from the plurality of memory processor elements.
16. A reconfigurable integrated circuit device that is dynamically configured into a predetermined computing state based on configuration data, the device comprising:
a plurality of clusters, each cluster including an arithmetic processor element having a computing unit, a memory processor element having a memory and performing data transfer with an external memory, and an inter-processor-element switch group for connecting the arithmetic processor element and the memory processor element in an arbitrary state;
an inter-cluster switch group for configuring data paths between the clusters in an arbitrary state; and
an external memory bus for data transfer between the memory processor elements and the external memory, wherein
the arithmetic processor elements, the memory processor elements, the inter-processor-element switch groups, and the inter-cluster switch group are dynamically changed based on the configuration data, the device further comprising:
a direct memory access control unit that, in response to access requests from the memory processor elements of the plurality of clusters, executes the data transfer between the memory processor elements and the external memory by direct memory access, wherein
the memory processor element includes first and second memory banks, and while one of the first and second memory banks performs data transfer with the external memory by direct memory access, the other of the first and second memory banks performs data transfer with the arithmetic processor element.
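
The request handling recited in claims 3 and 4 can be pictured with a short Python sketch. It is a loose illustration, not the claimed hardware: the class `DataFlowControl`, the notion of a configured "group" of memory processor elements that must be synchronized, and the element names are all assumptions introduced for the example.

```python
class DataFlowControl:
    """Toy arbiter that forwards DMA requests to a DMA controller."""

    def __init__(self, group: set[str]):
        self.group = set(group)          # memory PEs whose requests are synchronized
        self.pending: set[str] = set()   # grouped requests accepted so far

    def request(self, pe: str) -> list[str]:
        """Accept a request; return the PEs whose DMA should be issued now."""
        if pe not in self.group:
            return [pe]                  # single PE: forward on acceptance
        self.pending.add(pe)
        if self.pending == self.group:   # every grouped PE has asked
            self.pending = set()
            return sorted(self.group)    # issue one synchronized request
        return []                        # keep waiting for the rest


dfc = DataFlowControl({"mem0", "mem1"})
print(dfc.request("mem0"))  # held until mem1 also requests
print(dfc.request("mem1"))  # both issued together
print(dfc.request("mem2"))  # ungrouped element is forwarded at once
```

Holding grouped requests until all members have asked is one simple way to keep several memory processor elements in step; a real arbiter would also need timeouts and priorities, which are omitted here.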
CNB2006100083495A 2005-08-02 2006-02-17 reconfigurable integrated circuit device Expired - Fee Related CN100414535C (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2005224208A JP4536618B2 (en) 2005-08-02 2005-08-02 Reconfigurable integrated circuit device
JP2005224208 2005-08-02

Publications (2)

Publication Number Publication Date
CN1908927A true CN1908927A (en) 2007-02-07
CN100414535C CN100414535C (en) 2008-08-27

Family

ID=37700038

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB2006100083495A Expired - Fee Related CN100414535C (en) 2005-08-02 2006-02-17 reconfigurable integrated circuit device

Country Status (3)

Country Link
US (1) US20070033369A1 (en)
JP (1) JP4536618B2 (en)
CN (1) CN100414535C (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101620588B (en) * 2008-07-03 2011-01-19 中国人民解放军信息工程大学 A Connection and Management Method of Reconfigurable Components in High Performance Computer
CN101727434B (en) * 2008-10-20 2012-06-13 北京大学深圳研究生院 Integrated circuit structure special for specific application algorithm
WO2017177928A1 (en) * 2016-04-12 2017-10-19 Huawei Technologies Co., Ltd. Scalable autonomic message-transport with synchronization
US10185606B2 (en) 2016-04-12 2019-01-22 Futurewei Technologies, Inc. Scalable autonomic message-transport with synchronization
US10289598B2 (en) 2016-04-12 2019-05-14 Futurewei Technologies, Inc. Non-blocking network

Families Citing this family (58)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4201816B2 (en) * 2004-07-30 2008-12-24 富士通株式会社 Reconfigurable circuit and control method of reconfigurable circuit
US7861060B1 (en) * 2005-12-15 2010-12-28 Nvidia Corporation Parallel data processing systems and methods using cooperative thread arrays and thread identifier values to determine processing behavior
JP4653697B2 (en) * 2006-05-29 2011-03-16 株式会社日立製作所 Power management method
US7680988B1 (en) * 2006-10-30 2010-03-16 Nvidia Corporation Single interconnect providing read and write access to a memory shared by concurrent threads
US8108625B1 (en) 2006-10-30 2012-01-31 Nvidia Corporation Shared memory with parallel access and access conflict resolution mechanism
US8176265B2 (en) 2006-10-30 2012-05-08 Nvidia Corporation Shared single-access memory with management of multiple parallel requests
US7962702B1 (en) * 2007-07-09 2011-06-14 Rockwell Collins, Inc. Multiple independent levels of security (MILS) certifiable RAM paging system
JP5260068B2 (en) * 2008-01-31 2013-08-14 古野電気株式会社 Detection device and detection method
US8103853B2 (en) * 2008-03-05 2012-01-24 The Boeing Company Intelligent fabric system on a chip
JP5431003B2 (en) * 2009-04-03 2014-03-05 スパンション エルエルシー Reconfigurable circuit and reconfigurable circuit system
US9361960B2 (en) * 2009-09-16 2016-06-07 Rambus Inc. Configurable memory banks of a memory device
JP5711889B2 (en) * 2010-01-27 2015-05-07 スパンション エルエルシー Reconfigurable circuit and semiconductor integrated circuit
KR101076869B1 (en) * 2010-03-16 2011-10-25 광운대학교 산학협력단 Memory centric communication apparatus in coarse grained reconfigurable array
JP5678782B2 (en) * 2011-04-07 2015-03-04 富士通セミコンダクター株式会社 Reconfigurable integrated circuit device
US9130596B2 (en) * 2011-06-29 2015-09-08 Seagate Technology Llc Multiuse data channel
US10157060B2 (en) 2011-12-29 2018-12-18 Intel Corporation Method, device and system for control signaling in a data path module of a data stream processing engine
JP5927012B2 (en) * 2012-04-11 2016-05-25 太陽誘電株式会社 Reconfigurable semiconductor device
US10331583B2 (en) 2013-09-26 2019-06-25 Intel Corporation Executing distributed memory operations using processing elements connected by distributed channels
US10078606B2 (en) * 2015-11-30 2018-09-18 Knuedge, Inc. DMA engine for transferring data in a network-on-a-chip processor
US10203911B2 (en) * 2016-05-18 2019-02-12 Friday Harbor Llc Content addressable memory (CAM) implemented tuple spaces
CN113660439A (en) * 2016-12-27 2021-11-16 株式会社半导体能源研究所 Imaging device and electronic apparatus
US10474375B2 (en) 2016-12-30 2019-11-12 Intel Corporation Runtime address disambiguation in acceleration hardware
US10416999B2 (en) 2016-12-30 2019-09-17 Intel Corporation Processors, methods, and systems with a configurable spatial accelerator
US10572376B2 (en) 2016-12-30 2020-02-25 Intel Corporation Memory ordering in acceleration hardware
US10558575B2 (en) * 2016-12-30 2020-02-11 Intel Corporation Processors, methods, and systems with a configurable spatial accelerator
US10515046B2 (en) 2017-07-01 2019-12-24 Intel Corporation Processors, methods, and systems with a configurable spatial accelerator
US10387319B2 (en) 2017-07-01 2019-08-20 Intel Corporation Processors, methods, and systems for a configurable spatial accelerator with memory system performance, power reduction, and atomics support features
US10445234B2 (en) 2017-07-01 2019-10-15 Intel Corporation Processors, methods, and systems for a configurable spatial accelerator with transactional and replay features
US10515049B1 (en) 2017-07-01 2019-12-24 Intel Corporation Memory circuits and methods for distributed memory hazard detection and error recovery
US10445451B2 (en) 2017-07-01 2019-10-15 Intel Corporation Processors, methods, and systems for a configurable spatial accelerator with performance, correctness, and power reduction features
US10467183B2 (en) 2017-07-01 2019-11-05 Intel Corporation Processors and methods for pipelined runtime services in a spatial array
US10469397B2 (en) 2017-07-01 2019-11-05 Intel Corporation Processors and methods with configurable network-based dataflow operator circuits
US10496574B2 (en) 2017-09-28 2019-12-03 Intel Corporation Processors, methods, and systems for a memory fence in a configurable spatial accelerator
US11086816B2 (en) 2017-09-28 2021-08-10 Intel Corporation Processors, methods, and systems for debugging a configurable spatial accelerator
US10445098B2 (en) 2017-09-30 2019-10-15 Intel Corporation Processors and methods for privileged configuration in a spatial array
US10380063B2 (en) 2017-09-30 2019-08-13 Intel Corporation Processors, methods, and systems with a configurable spatial accelerator having a sequencer dataflow operator
US10565134B2 (en) 2017-12-30 2020-02-18 Intel Corporation Apparatus, methods, and systems for multicast in a configurable spatial accelerator
US10445250B2 (en) 2017-12-30 2019-10-15 Intel Corporation Apparatus, methods, and systems with a configurable spatial accelerator
US10564980B2 (en) 2018-04-03 2020-02-18 Intel Corporation Apparatus, methods, and systems for conditional queues in a configurable spatial accelerator
US11307873B2 (en) 2018-04-03 2022-04-19 Intel Corporation Apparatus, methods, and systems for unstructured data flow in a configurable spatial accelerator with predicate propagation and merging
US10853073B2 (en) 2018-06-30 2020-12-01 Intel Corporation Apparatuses, methods, and systems for conditional operations in a configurable spatial accelerator
US10891240B2 (en) * 2018-06-30 2021-01-12 Intel Corporation Apparatus, methods, and systems for low latency communication in a configurable spatial accelerator
US11200186B2 (en) 2018-06-30 2021-12-14 Intel Corporation Apparatuses, methods, and systems for operations in a configurable spatial accelerator
US10459866B1 (en) 2018-06-30 2019-10-29 Intel Corporation Apparatuses, methods, and systems for integrated control and data processing in a configurable spatial accelerator
US10678724B1 (en) 2018-12-29 2020-06-09 Intel Corporation Apparatuses, methods, and systems for in-network storage in a configurable spatial accelerator
EP3938921A4 (en) * 2019-03-11 2022-12-14 Untether AI Corporation Computational memory
US12124530B2 (en) * 2019-03-11 2024-10-22 Untether Ai Corporation Computational memory
US10817291B2 (en) 2019-03-30 2020-10-27 Intel Corporation Apparatuses, methods, and systems for swizzle operations in a configurable spatial accelerator
US10965536B2 (en) 2019-03-30 2021-03-30 Intel Corporation Methods and apparatus to insert buffers in a dataflow graph
US10915471B2 (en) 2019-03-30 2021-02-09 Intel Corporation Apparatuses, methods, and systems for memory interface circuit allocation in a configurable spatial accelerator
US11029927B2 (en) 2019-03-30 2021-06-08 Intel Corporation Methods and apparatus to detect and annotate backedges in a dataflow graph
US11037050B2 (en) 2019-06-29 2021-06-15 Intel Corporation Apparatuses, methods, and systems for memory interface circuit arbitration in a configurable spatial accelerator
US11342944B2 (en) 2019-09-23 2022-05-24 Untether Ai Corporation Computational memory with zero disable and error detection
US11907713B2 (en) 2019-12-28 2024-02-20 Intel Corporation Apparatuses, methods, and systems for fused operations using sign modification in a processing element of a configurable spatial accelerator
US11468002B2 (en) * 2020-02-28 2022-10-11 Untether Ai Corporation Computational memory with cooperation among rows of processing elements and memory thereof
US12086080B2 (en) 2020-09-26 2024-09-10 Intel Corporation Apparatuses, methods, and systems for a configurable accelerator having dataflow execution circuits
CN112967172B (en) * 2021-02-26 2024-09-17 成都商汤科技有限公司 Data processing device, method, computer equipment and storage medium
US20250245181A1 (en) * 2024-01-30 2025-07-31 Google Llc System and Methods for Multi-Pod Inter-Chip Interconnect

Family Cites Families (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS608970A (en) * 1983-06-29 1985-01-17 Fuji Electric Co Ltd Multi-controller system
JPS60186151A (en) * 1984-03-05 1985-09-21 Matsushita Electric Ind Co Ltd Data communicating method between processors
CA2129882A1 (en) * 1993-08-12 1995-02-13 Soheil Shams Dynamically reconfigurable interprocessor communication network for simd multiprocessors and apparatus implementing same
US5842034A (en) * 1996-12-20 1998-11-24 Raytheon Company Two dimensional crossbar mesh for multi-processor interconnect
US5978379A (en) * 1997-01-23 1999-11-02 Gadzoox Networks, Inc. Fiber channel learning bridge, learning half bridge, and protocol
US6366999B1 (en) * 1998-01-28 2002-04-02 Bops, Inc. Methods and apparatus to support conditional execution in a VLIW-based array processor with subword execution
US6041400A (en) * 1998-10-26 2000-03-21 Sony Corporation Distributed extensible processing architecture for digital signal processing applications
JP3674515B2 (en) * 2000-02-25 2005-07-20 日本電気株式会社 Array type processor
US7006521B2 (en) * 2000-11-15 2006-02-28 Texas Instruments Inc. External bus arbitration technique for multicore DSP device
US7233998B2 (en) * 2001-03-22 2007-06-19 Sony Computer Entertainment Inc. Computer architecture and software cells for broadband networks
US7093104B2 (en) * 2001-03-22 2006-08-15 Sony Computer Entertainment Inc. Processing modules for computer architecture for broadband networks
US6526491B2 (en) * 2001-03-22 2003-02-25 Sony Corporation Entertainment Inc. Memory protection system and method for computer architecture for broadband networks
US6809734B2 (en) * 2001-03-22 2004-10-26 Sony Computer Entertainment Inc. Resource dedication system and method for a computer architecture for broadband networks
US7516334B2 (en) * 2001-03-22 2009-04-07 Sony Computer Entertainment Inc. Power management for processing modules
US6826662B2 (en) * 2001-03-22 2004-11-30 Sony Computer Entertainment Inc. System and method for data synchronization for a computer architecture for broadband networks
US7231500B2 (en) * 2001-03-22 2007-06-12 Sony Computer Entertainment Inc. External data interface in a computer architecture for broadband networks
US7152151B2 (en) * 2002-07-18 2006-12-19 Ge Fanuc Embedded Systems, Inc. Signal processing resource for selective series processing of data in transit on communications paths in multi-processor arrangements
US20020184291A1 (en) * 2001-05-31 2002-12-05 Hogenauer Eugene B. Method and system for scheduling in an adaptable computing engine
US20040022094A1 (en) * 2002-02-25 2004-02-05 Sivakumar Radhakrishnan Cache usage for concurrent multiple streams
US7124211B2 (en) * 2002-10-23 2006-10-17 Src Computers, Inc. System and method for explicit communication of messages between processes running on different nodes in a clustered multiprocessor system
US7093079B2 (en) * 2002-12-17 2006-08-15 Intel Corporation Snoop filter bypass
JP4423953B2 (en) * 2003-07-09 2010-03-03 株式会社日立製作所 Semiconductor integrated circuit
JP4359490B2 (en) * 2003-11-28 2009-11-04 アイピーフレックス株式会社 Data transmission method
US20080162877A1 (en) * 2005-02-24 2008-07-03 Erik Richter Altman Non-Homogeneous Multi-Processor System With Shared Memory

Also Published As

Publication number Publication date
CN100414535C (en) 2008-08-27
JP4536618B2 (en) 2010-09-01
US20070033369A1 (en) 2007-02-08
JP2007041781A (en) 2007-02-15

Similar Documents

Publication Publication Date Title
CN1908927A (en) Reconfigurable integrated circuit device
JP4391935B2 (en) Processing system with interspersed processors and communication elements
CN107273093B (en) Scalable Computing Fabric
CN1526100A (en) integrated circuit device
CN111274025A (en) System and method for accelerating data processing in SSD
CN112486908B (en) Hierarchical multi-RPU multi-PEA reconfigurable processor
CN1716227A (en) Operating means and operation apparatus control method, program and computer-readable medium
JP2005044361A (en) Self-contained processor subsystem as component for system-on-chip design
TWI666551B (en) Decentralized allocation of resources and interconnect structures to support the execution of instruction sequences by a plurality of engines
US20250251967A1 (en) Multiple contexts for a compute unit in a reconfigurable data processor
WO2023076521A1 (en) Force-quit for reconfigurable processors
TWI668574B (en) Computing apparatus, system-on-chip and method of quality of service ordinal modification
US8190856B2 (en) Data transfer network and control apparatus for a system with an array of processing elements each either self- or common controlled
US8843728B2 (en) Processor for enabling inter-sequencer communication following lock competition and accelerator registration
WO2022088171A1 (en) Neural processing unit synchronization systems and methods
Hussain et al. Pgc: a pattern-based graphics controller
Eisenhardt et al. Optimizing partial reconfiguration of multi-context architectures
CN106201931B (en) A kind of hypervelocity matrix operation coprocessor system
CN111209230B (en) Data processing device, method and related products
CN105718421A (en) Data caching updating system for multiple coarseness dynamically-reconfigurable arrays
US20250208907A1 (en) Controller for an array of data processing engines
CN1639690A (en) Semiconductor device
US20250370941A1 (en) Dma strategies for aie control and configuration
WO2026084584A1 (en) Processor system for performing neural network computations
WO2015123848A1 (en) Reconfigurable processor and conditional execution method thereof

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
ASS Succession or assignment of patent right

Owner name: FUJITSU MICROELECTRONICS CO., LTD.

Free format text: FORMER OWNER: FUJITSU LIMITED

Effective date: 20081024

C41 Transfer of patent application or patent right or utility model
TR01 Transfer of patent right

Effective date of registration: 20081024

Address after: Tokyo, Japan

Patentee after: Fujitsu Microelectronics Ltd.

Address before: Kanagawa

Patentee before: Fujitsu Ltd.

C56 Change in the name or address of the patentee

Owner name: FUJITSU SEMICONDUCTORS CO., LTD

Free format text: FORMER NAME: FUJITSU MICROELECTRON CO., LTD.

CP03 Change of name, title or address

Address after: Kanagawa

Patentee after: Fujitsu Semiconductor Co., Ltd.

Address before: Tokyo, Japan

Patentee before: Fujitsu Microelectronics Ltd.

ASS Succession or assignment of patent right

Owner name: SPANSION LLC N. D. GES D. STAATES

Free format text: FORMER OWNER: FUJITSU SEMICONDUCTOR CO., LTD.

Effective date: 20140102

C41 Transfer of patent application or patent right or utility model
TR01 Transfer of patent right

Effective date of registration: 20140102

Address after: California, USA

Patentee after: Spansion LLC N. D. Ges D. Staates

Address before: Kanagawa

Patentee before: Fujitsu Semiconductor Co., Ltd.

C41 Transfer of patent application or patent right or utility model
TR01 Transfer of patent right

Effective date of registration: 20160408

Address after: California, USA

Patentee after: Cypress Semiconductor Corp.

Address before: California, USA

Patentee before: Spansion LLC N. D. Ges D. Staates

CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20080827

Termination date: 20170217
