CN110825435A - Method and apparatus for processing data - Google Patents
- Publication number
- CN110825435A (application number CN201810910200.9A)
- Authority
- CN
- China
- Prior art keywords
- data
- address
- memory
- operand
- executed
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Executing Machine-Instructions (AREA)
Abstract
Embodiments of the present application disclose a method and an apparatus for processing data. A specific implementation of the method includes: acquiring an instruction to be executed, the instruction including a data identifier; decoding the instruction to obtain a first address, in a memory, of the operand indicated by the data identifier; performing an alignment operation on the first address according to a preset data bit width of the memory to obtain a second address; and reading/writing the operand in the memory according to the second address. This implementation improves the storage utilization of the memory.
Description
Technical Field
Embodiments of the present application relate to the field of computer technology, and in particular to a method and an apparatus for processing data.
Background
In recent years, with the rise and development of model algorithms represented by deep learning, neural network models have been widely applied in fields such as speech recognition, image recognition, and natural language processing. Neural network models contain a large number of compute-intensive operators, for example matrix computation, convolution, pooling, activation, and normalization. Because these operations are very time-consuming, the computing power of a traditional CPU (Central Processing Unit) can hardly meet the demand, which has made heterogeneous computing mainstream. Various dedicated neural network processors have therefore been developed, such as GPUs (Graphics Processing Units), FPGAs (Field-Programmable Gate Arrays), and ASICs (Application-Specific Integrated Circuits).
At present, in general-purpose processors and in processors aimed at compute-intensive domains such as deep learning, memory access addresses are tightly restricted from the software programmer's point of view. For example, when the bus data bit width is set to 64 bytes, memory access addresses must be 64-byte aligned. When an unaligned address occurs, the hardware automatically ignores the unaligned part, returns incorrect data, and reports an interrupt exception to the software. To guarantee correct data access under this alignment restriction, irregular data structures have to be padded with invalid data.
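To make the cost of such padding concrete, the following minimal Python sketch rounds a structure's size up to the bus width. The helper name `padded_size` and the 32-byte example size are illustrative assumptions, not taken from the application's background section.

```python
def padded_size(length: int, width: int = 64) -> int:
    """Round a byte length up to the next multiple of the bus width."""
    if length < 0 or width <= 0:
        raise ValueError("length must be non-negative and width positive")
    # Ceiling division, then scale back up to bytes.
    return -(-length // width) * width


# Assumed example: a 32-byte structure under a 64-byte alignment rule.
useful = 32
wasted = padded_size(useful) - useful  # bytes of invalid padding data
```

Under these assumptions half of each 64-byte slot would hold padding, which is the kind of waste the application sets out to avoid.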
Summary of the Invention
Embodiments of the present application propose a method and an apparatus for processing data.
In a first aspect, an embodiment of the present application provides a method for processing data, the method including: acquiring an instruction to be executed, the instruction including a data identifier; decoding the instruction to obtain a first address, in a memory, of the operand indicated by the data identifier; performing an alignment operation on the first address according to a preset data bit width of the memory to obtain a second address; and reading/writing the operand in the memory according to the second address.
In some embodiments, after the alignment operation is performed on the first address according to the preset data bit width of the memory to obtain the second address, the method further includes: calculating shift information of the alignment operation.
In some embodiments, reading/writing the operand in the memory according to the second address includes: sending a data read request to the memory, the data read request including the second address; acquiring first data returned by the memory in response to receiving the data read request; performing a shift operation on the first data according to the shift information to obtain the operand; and outputting the operand.
In some embodiments, performing the shift operation on the first data according to the shift information to obtain the operand includes: performing the shift operation on the first data according to the shift information to obtain second data; and truncating the second data according to a predefined data length of the operand to obtain the operand.
In some embodiments, reading/writing the operand in the memory according to the second address includes: determining, according to the first address and the predefined data length of the operand, the valid bits of the data to be written indicated by the second address; and writing the data at the valid bits of the data to be written into the memory.
In a second aspect, an embodiment of the present application provides an apparatus for processing data, the apparatus including: an acquisition unit configured to acquire an instruction to be executed, the instruction including a data identifier; a decoding unit configured to decode the instruction to obtain a first address, in a memory, of the operand indicated by the data identifier; an alignment unit configured to perform an alignment operation on the first address according to a preset data bit width of the memory to obtain a second address; and a read/write unit configured to read/write the operand in the memory according to the second address.
In some embodiments, the apparatus further includes: a calculation unit configured to calculate shift information of the alignment operation.
In some embodiments, the read/write unit includes: a sending subunit configured to send a data read request to the memory, the data read request including the second address; an acquiring subunit configured to acquire first data returned by the memory in response to receiving the data read request; a shift subunit configured to perform a shift operation on the first data according to the shift information to obtain the operand; and an output subunit configured to output the operand.
In some embodiments, the shift subunit is further configured to: perform the shift operation on the first data according to the shift information to obtain second data; and truncate the second data according to a predefined data length of the operand to obtain the operand.
In some embodiments, the read/write unit includes: a determining subunit configured to determine, according to the first address and the predefined data length of the operand, the valid bits of the data to be written indicated by the second address; and a writing subunit configured to write the data at the valid bits of the data to be written into the memory.
In a third aspect, an embodiment of the present application provides an artificial intelligence chip, including: one or more processor cores; and a storage device storing one or more programs which, when executed by the one or more processor cores, cause the one or more processor cores to implement the method described in the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer-readable medium storing a computer program which, when executed by an artificial intelligence chip, implements the method described in the first aspect.
In a fifth aspect, an embodiment of the present application provides an electronic device, including: a processor, a storage device, and at least one artificial intelligence chip as described in the third aspect.
The method and apparatus for processing data provided by the embodiments of the present application acquire an instruction to be executed, decode the instruction to obtain the first address, in the memory, of the operand indicated by the data identifier, perform an alignment operation on the first address according to the preset data bit width of the memory to obtain a second address, and finally read/write the operand in the memory according to the second address, thereby improving the storage utilization of the memory.
Brief Description of the Drawings
Other features, objects, and advantages of the present application will become more apparent upon reading the following detailed description of non-limiting embodiments, made with reference to the accompanying drawings:
FIG. 1 is a diagram of an exemplary system architecture to which an embodiment of the present application may be applied;
FIG. 2 is a flowchart of one embodiment of the method for processing data according to the present application;
FIG. 3A is a schematic diagram of the storage space in the memory whose start address is the second address, before the data to be written is written, in one embodiment of the method for processing data according to the present application;
FIG. 3B is a schematic diagram of the data to be written whose start address is the second address, in one embodiment of the method for processing data according to the present application;
FIG. 3C is a schematic diagram of the signal indicating the valid bits of the data to be written, in one embodiment of the method for processing data according to the present application;
FIG. 3D is a schematic diagram of the storage space in the memory whose start address is the second address, after the data to be written is written, in one embodiment of the method for processing data according to the present application;
FIG. 4 is a flowchart of yet another embodiment of the method for processing data according to the present application;
FIG. 5A is a schematic diagram of the first data returned by the memory from the storage space whose start address is the second address, in one embodiment of the method for processing data according to the present application;
FIG. 5B is a schematic diagram of the operand obtained by the shift and truncation operations, in one embodiment of the method for processing data according to the present application;
FIG. 6 is a schematic structural diagram of one embodiment of the apparatus for processing data according to the present application;
FIG. 7 is a schematic structural diagram of a computer system of one embodiment of the electronic device according to the present application.
Detailed Description
The present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the relevant invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the invention.
It should be noted that the embodiments of the present application and the features of the embodiments may be combined with one another where no conflict arises. The present application is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.
FIG. 1 shows an exemplary system architecture 100 to which embodiments of the method for processing data or the apparatus for processing data of the present application may be applied.
As shown in FIG. 1, the system architecture 100 may include a CPU (Central Processing Unit) 101, a bus 102, and an AI (Artificial Intelligence) chip 103. The bus 102 serves as the medium providing a communication link between the CPU 101 and the AI chip 103. The bus 102 may be of various bus types, for example a PCIE (Peripheral Component Interconnect Express) bus, an AMBA (Advanced Microcontroller Bus Architecture) bus, or an OCP (Open Core Protocol) bus.
An AI chip, that is, an artificial intelligence chip, also called an AI accelerator or compute card, is a module dedicated to handling the large volume of computing tasks in artificial intelligence applications (other, non-computing tasks remain the responsibility of the CPU). The demand for computation in AI workloads is enormous, and complex operations in particular have a large effect on computing performance. Although complex operations can be implemented with basic operation instructions, doing so lowers the execution efficiency of those complex operations (such as floating-point square root, floating-point exponentiation, and trigonometric functions).
The AI chip 103 may include processor cores 1031, 1032, and 1033, a bus 1034, and a memory 1035. The bus 1034 serves as the medium providing communication links between the processor cores 1031, 1032, 1033 and the memory 1035. The bus 1034 may be of various bus types, for example a PCI bus, a PCIE bus, an AMBA bus supporting a Network on Chip protocol, an OCP bus, or another on-chip interconnect bus.
It should be noted that the method for processing data provided by the embodiments of the present application may be executed by the AI chip 103, and accordingly the apparatus for processing data may be provided in the AI chip 103.
It should be understood that the numbers of CPUs, buses, and AI chips in FIG. 1 are merely illustrative. There may be any number of CPUs, buses, and AI chips, depending on implementation needs. Likewise, the numbers of processor cores, buses, and memories in the AI chip 103 are merely illustrative, and the AI chip 103 may have any number of each, depending on implementation needs. In addition, depending on implementation needs, the system architecture 100 may also include input devices (such as a mouse or keyboard), output devices (such as a display or speakers), input/output interfaces, and so on.
With continued reference to FIG. 2, a flow 200 of one embodiment of the method for processing data according to the present application is shown. The method for processing data includes the following steps:
Step 201: acquire an instruction to be executed.
In this embodiment, the execution body of the method for processing data (for example, the AI chip shown in FIG. 1) may first acquire an instruction to be executed, the instruction including a data identifier. The AI chip may be communicatively connected to a CPU; as an example, the AI chip may be connected to the CPU through PCIE in order to read the instruction to be executed from the CPU. The data identifier may be any information that identifies an operand, such as a data name. The operand may be the data to be processed, for example data to be read or data to be written.
Step 202: decode the instruction to be executed to obtain the first address, in the memory, of the operand indicated by the data identifier.
In this embodiment, the execution body may decode the instruction acquired in step 201 to obtain the first address, in the memory, of the operand indicated by the data identifier. The first address may be the address allocated to the operand when the operand was defined. The first address may include the start address of the operand in the memory, or it may include the address segment of the operand in the memory. When the first address is the operand's start address, the storage space indicated by the first address may be the storage space beginning at the first address; the size of that storage space may be preset or may be determined by the operand length.
Here, the memory of the AI chip may include at least one of the following: static random-access memory (SRAM), dynamic random-access memory (DRAM), and Flash memory.
According to the storage hierarchy, memory can be divided into registers, local memory (LM), shared memory (SM), and global memory (GM). As an example, the operand indicated by the data identifier may be stored in global memory (GM).
Here, once the instruction to be executed has been acquired, the execution body may select, from at least one processor core, the processor core that will execute the instruction as the target processor core. For example, the target processor core may be selected according to the current working state of each processor core. As another example, the target processor core may be selected from the at least one processor core in a round-robin manner.
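The two selection policies mentioned above can be sketched as follows. This is only an illustration: the function names, the boolean "busy" flag standing in for a core's working state, and the use of the reference numerals 1031 to 1033 as core identifiers are all assumptions, since the application does not specify how the working state is represented.

```python
from itertools import cycle
from typing import Dict, Iterator, Optional, Sequence


def pick_by_state(core_busy: Dict[int, bool]) -> Optional[int]:
    """Return the first core whose (assumed) busy flag is False, or None."""
    for core_id, busy in core_busy.items():
        if not busy:
            return core_id
    return None


def pick_round_robin(core_ids: Sequence[int]) -> Iterator[int]:
    """Yield core ids in a fixed rotating order (round-robin selection)."""
    return cycle(core_ids)


# Usage with hypothetical core ids borrowed from the numerals in FIG. 1.
target = pick_by_state({1031: True, 1032: False, 1033: False})
rotation = pick_round_robin([1031, 1032, 1033])
first_four = [next(rotation) for _ in range(4)]
```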
In this way, once the target processor core has obtained the instruction to be executed, it can decode the instruction to obtain an operation code and an address. Decoding means splitting and interpreting the fetched instruction according to a predetermined instruction format. The operation code indicates the nature of the operation to be performed, that is, what operation to carry out. The address is the address of the operand used when the operation code is executed. When a computer executes a given instruction, it must first analyze the instruction's operation code to determine the nature and method of the operation, and only then can it control the other components of the computer to cooperate in completing the function expressed by the instruction.
Step 203: perform an alignment operation on the first address according to the preset data bit width of the memory to obtain a second address.
In this embodiment, the execution body may perform an alignment operation on the first address obtained in step 202 according to the preset data bit width of the memory to obtain the second address. Like the first address, the second address may include the start address of the operand in the memory, or it may include the address segment of the operand in the memory.
The data bit width may be the amount of data the memory can transfer at one time, that is, the width of data transferred in a single access. As an example, the global memory in the AI chip may use High Bandwidth Memory (HBM) technology, and the memory access interface may use the standard AXI (Advanced eXtensible Interface) bus protocol. Because of the characteristics of the HBM interface itself and the limitations of the AXI bus protocol, memory access addresses are tightly restricted from the software programmer's point of view; for example, when the AXI bus data bit width is set to 64 bytes, memory access addresses must be 64-byte aligned. Although the AXI bus protocol supports a transfer granularity of 1 byte, at a given clock frequency the memory access bandwidth is proportional to the data bit width. To raise the AI chip's memory access bandwidth and avoid a memory access bottleneck, the AXI bus width is set to 64 bytes, which gives rise to the alignment restriction on memory access addresses.
Here, the alignment operation may be a downward alignment. Performing the alignment operation on the first address according to the preset data bit width of the memory may mean aligning the first address to the nearest data-width boundary at or below it; for example, the first address 0x21000004 may be aligned to the second address 0x21000000.
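The downward alignment can be illustrated with a short Python sketch. It assumes the data width is a power of two given in bytes (64 in the example above), so the low address bits can simply be masked off; the function name `align_down` is a hypothetical label, not a term from the application.

```python
def align_down(first_address: int, width: int = 64) -> int:
    """Align an address down to the nearest multiple of the data width."""
    if width <= 0 or width & (width - 1):
        raise ValueError("this sketch assumes a power-of-two width")
    # Clearing the low log2(width) bits rounds down to the boundary.
    return first_address & ~(width - 1)


second_address = align_down(0x21000004)  # the example address from the text
```

With a 64-byte width this reproduces the example in the text: 0x21000004 maps to 0x21000000, and an address that is already aligned is returned unchanged.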
Step 204: read/write the operand in the memory according to the second address.
In this embodiment, the execution body may read/write the operand in the memory according to the second address obtained in step 203. The execution body may read the data in the storage space indicated by the second address, or write the operand into the storage space indicated by the second address.
In some optional implementations of this embodiment, reading/writing the operand in the memory according to the second address includes: determining, according to the first address and the predefined data length of the operand, the valid bits of the data to be written indicated by the second address; and writing the data at the valid bits of the data to be written into the memory. As an example, in an AXI data channel transfer, writing only the data at the valid bits into the memory can be implemented with the wstrb signal, which indicates which bits of the data are valid.
In this implementation, the data length of the operand may be determined when the operand is defined. As an example, if the operand is an array A whose defined data type is 32-bit floating point and whose array length is 8, the data length of array A is 32 bytes (8 elements of 4 bytes each).
As shown in FIG. 3A, before array A is written into the memory, the storage space in the memory whose start address is the second address holds array C and array B. As shown in FIG. 3B, within the data to be written whose start address is the second address, the operand (array A) actually starts at the first address. From the first address and the predefined data length of the operand, the valid bits of the data to be written indicated by the second address can be determined; the signal indicating the valid bits may be as shown in FIG. 3C. The data at the valid bits of the data to be written can then be written into the memory. The result after the operand array A has been written into the storage space whose start address is the second address is shown in FIG. 3D.
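The masked write of FIGS. 3A to 3D can be sketched in Python as below. The sketch models the strobe as an integer with one bit per byte lane, which is how the AXI wstrb signal is organized; the function names, the toy bytearray memory, the fill values, and the restriction to an operand that fits inside a single 64-byte beat are assumptions made for illustration.

```python
def write_strobe(first_address: int, length: int, width: int = 64) -> int:
    """Build a per-byte valid mask from the first address and operand length."""
    offset = first_address % width  # distance from the aligned second address
    if offset + length > width:
        raise ValueError("operand crossing a beat is not covered by this sketch")
    return ((1 << length) - 1) << offset


def masked_write(memory: bytearray, second_address: int,
                 beat: bytes, strobe: int) -> None:
    """Write only the byte lanes whose strobe bit is set."""
    for lane in range(len(beat)):
        if (strobe >> lane) & 1:
            memory[second_address + lane] = beat[lane]


# Toy memory: one 64-byte slot already holding other data (0xCC fill).
memory = bytearray(b"\xcc" * 64)
# Data to be written: array A (32 bytes of 0xAA) placed at byte offset 4.
beat = bytes(4) + b"\xaa" * 32 + bytes(28)
strobe = write_strobe(0x21000004, 32)
masked_write(memory, 0, beat, strobe)
```

Only the 32 lanes covered by the strobe change; the bytes on either side keep their previous contents, so neighboring data needs no padding around it.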
The method provided by the above embodiment of the present application acquires an instruction to be executed, the instruction including a data identifier; decodes the instruction to obtain the first address, in the memory, of the operand indicated by the data identifier; performs an alignment operation on the first address according to the preset data bit width of the memory to obtain a second address; and reads/writes the operand in the memory according to the second address. This avoids padding with invalid data and improves the storage utilization of the memory.
With further reference to FIG. 4, a flow 400 of yet another embodiment of the method for processing data is shown. The flow 400 of the method for processing data includes the following steps:
Step 401: acquire an instruction to be executed.
In this embodiment, the execution body of the method for processing data (for example, the AI chip shown in FIG. 1) may first acquire an instruction to be executed, the instruction including a data identifier.
Step 402: decode the instruction to be executed to obtain the first address, in the memory, of the operand indicated by the data identifier.
In this embodiment, the execution body may decode the instruction acquired in step 401 to obtain the first address, in the memory, of the operand indicated by the data identifier.
Step 403: perform an alignment operation on the first address according to the preset data bit width of the memory to obtain a second address.
In this embodiment, the execution body may perform an alignment operation on the first address obtained in step 402 according to the preset data bit width of the memory to obtain the second address.
Step 404: calculate shift information of the alignment operation.
In this embodiment, the execution body may calculate the shift information of the alignment operation performed in step 403. The shift information may include a shift amount and a shift direction. As an example, if the address 0x21000004 is aligned to 0x21000000, the shift amount may be 4.
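A minimal sketch of the shift calculation, assuming a power-of-two data width in bytes and a shift amount measured in bytes. The function name is hypothetical, and the shift direction is left out because the application does not state how it is encoded.

```python
from typing import Tuple


def shift_info(first_address: int, width: int = 64) -> Tuple[int, int]:
    """Return (second_address, shift_amount) for a downward alignment."""
    second_address = first_address & ~(width - 1)
    # The shift amount is how far the operand sits past the aligned address.
    return second_address, first_address - second_address


second_address, shift_amount = shift_info(0x21000004)
```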
Step 405: send a data read request to the memory, the data read request including the second address.
In this embodiment, the execution body may send a data read request to the memory, the data read request including the second address obtained in step 403. Because the second address is an aligned address, the memory can return correct data according to the second address.
Step 406: acquire the first data returned by the memory in response to receiving the data read request.
In this embodiment, the execution body may acquire the first data returned by the memory in response to receiving the data read request sent in step 405. The first data contains the required operand.
Step 407: perform a shift operation on the first data according to the shift information to obtain the operand.
In this embodiment, the execution body may perform a shift operation on the first data according to the shift information calculated in step 404 to obtain the operand. Because the second address is the first address after alignment, shifting the data indicated by the second address yields the data indicated by the first address, that is, data containing the operand.
In some optional implementations of this embodiment, performing the shift operation on the first data according to the shift information to obtain the operand includes: performing the shift operation on the first data according to the shift information to obtain second data; and truncating the second data according to the predefined data length of the operand to obtain the operand. Truncating the second data by the predefined data length of the operand further improves the precision of the output operand, so that a downstream module consuming the memory output can use it directly without further processing.
As an example, the first data returned by the memory from the storage space whose start address is the second address is shown in FIG. 5A. The first data is shifted according to the shift information and truncated according to the predefined data length of the operand, yielding the operand shown in FIG. 5B.
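The shift-then-truncate step of FIGS. 5A and 5B can be sketched with byte slicing. The sketch assumes the first data is one 64-byte beat, the shift amount is in bytes, and the operand does not extend past the end of the beat; the function name and the test pattern are illustrative only.

```python
def extract_operand(first_data: bytes, shift_amount: int, length: int) -> bytes:
    """Shift the returned beat, then truncate to the operand's data length."""
    if shift_amount + length > len(first_data):
        raise ValueError("operand extends past the returned data in this sketch")
    second_data = first_data[shift_amount:]  # shift: drop bytes before the operand
    return second_data[:length]              # truncate to the predefined length


# Toy beat whose byte values equal their offsets, so the result is easy to check.
first_data = bytes(range(64))
operand = extract_operand(first_data, shift_amount=4, length=32)
```

Here the operand consists of the bytes at offsets 4 through 35 of the returned beat, so a consumer receives exactly the 32 bytes of the array with nothing before or after it.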
步骤408,输出操作数。
在本实施例中,上述执行主体可以输出步骤407中得到的操作数。In this embodiment, the above-mentioned execution body may output the operand obtained in
在本实施例中,步骤401、步骤402、步骤403的操作与步骤201、步骤202、步骤203的操作基本相同,在此不再赘述。In this embodiment, the operations of
从图4中可以看出,与图2对应的实施例相比,本实施例中的用于处理数据的方法的流程400中通过计算对齐操作的移位信息,并根据移位信息对第一数据进行移位操作,由此,本实施例描述的方案中输出了更为精准的操作数,便于读取存储器数据的模块使用输出的操作数。It can be seen from FIG. 4 that, compared with the embodiment corresponding to FIG. 2 , in the
进一步参考图6,作为对上述各图所示方法的实现,本申请提供了一种用于处理数据的装置的一个实施例,该装置实施例与图2所示的方法实施例相对应,该装置具体可以应用于各种电子设备中。Further referring to FIG. 6 , as an implementation of the methods shown in the above figures, the present application provides an embodiment of an apparatus for processing data. The apparatus embodiment corresponds to the method embodiment shown in FIG. 2 . The device can be specifically applied to various electronic devices.
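As shown in FIG. 6, the apparatus 600 for processing data of this embodiment includes an acquisition unit 601, a decoding unit 602, an alignment unit 603, and a read/write unit 604. The acquisition unit is configured to acquire an instruction to be executed, the instruction including a data identifier. The decoding unit is configured to decode the instruction to be executed to obtain the first address, in the memory, of the operand indicated by the data identifier. The alignment unit is configured to perform an alignment operation on the first address according to the preset data bit width of the memory to obtain a second address. The read/write unit is configured to read or write the operand in the memory according to the second address.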
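In this embodiment, for the specific processing of the acquisition unit 601, the decoding unit 602, the alignment unit 603, and the read/write unit 604 of the apparatus 600 for processing data, refer to steps 201, 202, 203, and 204 in the embodiment corresponding to FIG. 2.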
In some optional implementations of this embodiment, the apparatus further includes a calculation unit configured to calculate the shift information of the alignment operation.
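As a rough illustration of what the alignment unit and the calculation unit might compute, the Python sketch below rounds a first address down to a multiple of the memory data width and reports the discarded offset as shift information. The byte-addressed memory, the round-down rule, the power-of-two width, and the name `align_address` are assumptions; this excerpt does not spell out the exact alignment rule.

```python
def align_address(first_address: int, data_width_bits: int) -> tuple[int, int]:
    """Return (second_address, shift_bits) for a byte-addressed memory.

    Assumptions: the data width is a whole number of bytes and a power of
    two, and alignment rounds the address down to the start of the word.
    """
    width_bytes = data_width_bits // 8
    if data_width_bits % 8 != 0 or width_bytes <= 0 or width_bytes & (width_bytes - 1):
        raise ValueError("data width must be a power-of-two number of bytes")
    offset_bytes = first_address % width_bytes       # position inside the word
    second_address = first_address - offset_bytes    # aligned (second) address
    shift_bits = offset_bytes * 8                    # shift information
    return second_address, shift_bits


# Hypothetical example: address 0x1006 on a 64-bit-wide memory.
second_address, shift_bits = align_address(0x1006, 64)
print(hex(second_address), shift_bits)
```

Keeping the offset as a separate value lets the read path issue a single full-width access at the aligned address and recover the operand afterwards by shifting.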
In some optional implementations of this embodiment, the read/write unit includes: a sending subunit configured to send a data read request to the memory, the data read request including the second address; an acquisition subunit configured to acquire the first data that the memory returns in response to receiving the data read request; a shift subunit configured to perform a shift operation on the first data according to the shift information to obtain the operand; and an output subunit configured to output the operand.
In some optional implementations of this embodiment, the shift subunit is further configured to: perform a shift operation on the first data according to the shift information to obtain second data; and truncate the second data according to the predefined data length of the operand to obtain the operand.
In some optional implementations of this embodiment, the read/write unit includes: a determination subunit configured to determine, according to the first address and the predefined data length of the operand, the valid bits of the data to be written that is indicated by the second address; and a writing subunit configured to write the data on the valid bits of the data to be written into the memory.
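One way to picture the write path is a per-byte valid mask over a full-width word, similar to byte enables on a memory bus. The Python sketch below is an assumption-laden illustration, not the patent's circuit: byte granularity, the 8-byte default width, the `bytearray` memory model, and the names `valid_byte_mask` and `masked_write` are all hypothetical.

```python
def valid_byte_mask(first_address: int, operand_bytes: int, width_bytes: int) -> list[bool]:
    """Mark which bytes of the aligned word belong to the operand."""
    offset = first_address % width_bytes
    if offset + operand_bytes > width_bytes:
        raise ValueError("operand would cross a memory word boundary")
    return [offset <= i < offset + operand_bytes for i in range(width_bytes)]


def masked_write(memory: bytearray, first_address: int, operand: bytes,
                 width_bytes: int = 8) -> None:
    """Write only the valid bytes of a full-width word at the aligned address."""
    offset = first_address % width_bytes
    second_address = first_address - offset          # aligned (second) address
    mask = valid_byte_mask(first_address, len(operand), width_bytes)
    word = bytearray(width_bytes)                    # data to be written
    word[offset:offset + len(operand)] = operand
    for i, valid in enumerate(mask):
        if valid:                                    # neighbouring bytes stay untouched
            memory[second_address + i] = word[i]


# Hypothetical example: store a 2-byte operand at address 10 of a 16-byte memory.
memory = bytearray(b"\xAA" * 16)
masked_write(memory, first_address=10, operand=b"\x01\x02")
print(memory.hex())
```

Because only the masked bytes are updated, other operands packed into the same memory word are preserved.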
The apparatus provided by the above embodiment of the present application acquires an instruction to be executed, the instruction including a data identifier; decodes the instruction to obtain the first address, in the memory, of the operand indicated by the data identifier; performs an alignment operation on the first address according to the preset data bit width of the memory to obtain a second address; and reads or writes the operand in the memory according to the second address. This improves the storage utilization of the memory.
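The flow summarized above can be sketched end to end in Python. Everything concrete here is hypothetical: the `Instruction` type, the `SYMBOLS` decode table, the 8-byte memory width, and little-endian byte order are assumptions chosen only to show how several short operands might share one memory word, which is one plausible reading of the storage-utilization benefit.

```python
from dataclasses import dataclass

WIDTH_BYTES = 8  # assumed memory data width (64 bits)


@dataclass
class Instruction:
    data_id: str  # hypothetical data identifier carried by the instruction


# Hypothetical decode table: data identifier -> (first address, operand length in bytes)
SYMBOLS = {"a": (0, 2), "b": (2, 2), "c": (4, 4)}


def read_operand(memory: bytes, instr: Instruction) -> int:
    first_address, length = SYMBOLS[instr.data_id]             # decode
    offset = first_address % WIDTH_BYTES
    second_address = first_address - offset                    # align
    word = int.from_bytes(
        memory[second_address:second_address + WIDTH_BYTES], "little"
    )                                                          # full-width read
    return (word >> (8 * offset)) & ((1 << (8 * length)) - 1)  # shift, truncate


# Three operands packed into a single 8-byte word.
memory = (
    (0x1234).to_bytes(2, "little")
    + (0xBEEF).to_bytes(2, "little")
    + (0xCAFEF00D).to_bytes(4, "little")
)
for name in ("a", "b", "c"):
    print(name, hex(read_operand(memory, Instruction(name))))
```

In this sketch the three operands occupy 8 bytes in total; storing each one in its own aligned word would take 24 bytes.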
The embodiments of the present application also provide an electronic device. For its structure, refer to FIG. 7, which shows a schematic structural diagram of a computer system 700 of an embodiment of the electronic device of the present application. The electronic device shown in FIG. 7 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present application.
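As shown in FIG. 7, the computer system 700 includes one or more central processing units (CPUs) 701 and one or more artificial intelligence chips 704. The CPU 701 can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 702 or a program loaded from a storage section 707 into a random access memory (RAM) 703. The artificial intelligence chip 704 includes one or more general-purpose execution units and one or more dedicated execution units, and can perform various appropriate actions and processing according to programs received from the CPU 701. The RAM 703 also stores various programs and data required for the operation of the system 700. The CPU 701, the ROM 702, the RAM 703, and the artificial intelligence chip 704 are connected to one another via a bus 705. An input/output (I/O) interface 706 is also connected to the bus 705.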
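The following components are connected to the I/O interface 706: a storage section 707 including a hard disk or the like; and a communication section 708 including a network interface card such as a LAN card or a modem. The communication section 708 performs communication processing via a network such as the Internet. A drive 709 is also connected to the I/O interface 706 as needed. A removable medium 710, such as a magnetic disk, an optical disc, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 709 as needed, so that a computer program read from it can be installed into the storage section 707 as needed.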
In particular, according to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication section 708, and/or installed from the removable medium 710. When the computer program is executed by a general-purpose execution unit of the artificial intelligence chip 704, the above functions defined in the method of the present application are performed.
It should be noted that the computer-readable medium described in this application may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example but without limitation, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media include, but are not limited to: an electrical connection having one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this application, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device. A computer-readable signal medium, by contrast, may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take a variety of forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transport a program for use by or in conjunction with an instruction execution system, apparatus, or device. Program code contained on a computer-readable medium may be transmitted using any suitable medium, including but not limited to wireless, wire, optical cable, RF, or any suitable combination of the above.
Computer program code for performing the operations of the present application may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, it may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present application. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the figures. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functionality involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of such blocks, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the present application may be implemented in software or in hardware. The described units may also be provided in an AI chip; for example, an AI chip may be described as including an acquisition unit, a decoding unit, an alignment unit, and a read/write unit. The names of these units do not, in some cases, limit the units themselves; for example, the acquisition unit may also be described as "a unit configured to acquire an instruction to be executed".
As another aspect, the present application also provides a computer-readable medium. The computer-readable medium may be included in the AI chip described in the above embodiments, or it may exist separately without being assembled into the AI chip. The computer-readable medium carries one or more programs which, when executed by the AI chip, cause the AI chip to: acquire an instruction to be executed, the instruction including a data identifier; decode the instruction to obtain the first address, in the memory, of the operand indicated by the data identifier; perform an alignment operation on the first address according to the preset data bit width of the memory to obtain a second address; and read or write the operand in the memory according to the second address.
The above description is only of preferred embodiments of the present application and an explanation of the technical principles applied. Those skilled in the art should understand that the scope of the invention involved in this application is not limited to technical solutions formed by the specific combination of the above technical features, and also covers other technical solutions formed by any combination of the above technical features or their equivalents without departing from the inventive concept. One example is a technical solution formed by replacing the above features with technical features disclosed in this application (but not limited to them) that have similar functions.
Claims (13)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201810910200.9A CN110825435B (en) | 2018-08-10 | 2018-08-10 | Method and apparatus for processing data |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN110825435A (en) | 2020-02-21 |
| CN110825435B (en) | 2023-01-24 |
Family ID: 69541343
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201810910200.9A Active CN110825435B (en) | 2018-08-10 | 2018-08-10 | Method and apparatus for processing data |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN110825435B (en) |
Cited By (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN111782580A (en) * | 2020-06-30 | 2020-10-16 | 北京百度网讯科技有限公司 | Complex computing devices, methods, artificial intelligence chips and electronic devices |
| CN114398011A (en) * | 2022-01-17 | 2022-04-26 | 安谋科技(中国)有限公司 | Data storage method, apparatus and medium |
| CN116049069A (en) * | 2021-12-31 | 2023-05-02 | 海光信息技术股份有限公司 | Data reading method and related device |
| WO2023142524A1 (en) * | 2022-01-30 | 2023-08-03 | 上海商汤智能科技有限公司 | Instruction processing method and apparatus, chip, electronic device, and storage medium |
| CN116597886A (en) * | 2023-07-18 | 2023-08-15 | 深圳中安辰鸿技术有限公司 | Method for verifying LSU in NPU and related equipment |
| WO2025190220A1 (en) * | 2024-03-14 | 2025-09-18 | 上海壁仞科技股份有限公司 | Memory data loading/storage method and apparatus, and electronic device and storage medium |
Citations (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US4396982A (en) * | 1979-11-19 | 1983-08-02 | Hitachi, Ltd. | Microinstruction controlled data processing system including microinstructions with data align control feature |
| US20070106883A1 (en) * | 2005-11-07 | 2007-05-10 | Choquette Jack H | Efficient Streaming of Un-Aligned Load/Store Instructions that Save Unused Non-Aligned Data in a Scratch Register for the Next Instruction |
| CN101876892A (en) * | 2010-05-20 | 2010-11-03 | 复旦大学 | Single Instruction Multiple Data Processor Circuit Architecture for Communication and Multimedia Applications |
| CN103761075A (en) * | 2014-02-10 | 2014-04-30 | 东南大学 | Coarse granularity dynamic reconfigurable data integration and control unit structure |
| CN103984530A (en) * | 2014-05-15 | 2014-08-13 | 中国航天科技集团公司第九研究院第七七一研究所 | Assembly line structure and method for improving execution efficiency of store command |
| CN104407880A (en) * | 2014-10-27 | 2015-03-11 | 杭州中天微系统有限公司 | RISC (reduced instruction-set computer) processor loading/storage unit supporting non-aligned hardware storage accessing |
| CN108228235A (en) * | 2016-12-21 | 2018-06-29 | 龙芯中科技术有限公司 | Data manipulation treating method and apparatus based on MIPS frameworks |
| CN108334276A (en) * | 2017-01-20 | 2018-07-27 | 宇瞻科技股份有限公司 | dynamic data alignment method of flash memory |
- 2018-08-10: CN application CN201810910200.9A (patent CN110825435B), status: Active
Cited By (11)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN111782580A (en) * | 2020-06-30 | 2020-10-16 | 北京百度网讯科技有限公司 | Complex computing devices, methods, artificial intelligence chips and electronic devices |
| EP3933586A1 (en) * | 2020-06-30 | 2022-01-05 | Beijing Baidu Netcom Science And Technology Co., Ltd. | Complex computing device, complex computing method, artificial intelligence chip and electronic apparatus |
| US11782722B2 (en) | 2020-06-30 | 2023-10-10 | Beijing Baidu Netcom Science And Technology Co., Ltd. | Input and output interfaces for transmitting complex computing information between AI processors and computing components of a special function unit |
| CN111782580B (en) * | 2020-06-30 | 2024-03-01 | 北京百度网讯科技有限公司 | Complex computing devices, methods, artificial intelligence chips and electronic equipment |
| CN116049069A (en) * | 2021-12-31 | 2023-05-02 | 海光信息技术股份有限公司 | Data reading method and related device |
| CN114398011A (en) * | 2022-01-17 | 2022-04-26 | 安谋科技(中国)有限公司 | Data storage method, apparatus and medium |
| CN114398011B (en) * | 2022-01-17 | 2023-09-22 | 安谋科技(中国)有限公司 | Data storage method, device and medium |
| WO2023142524A1 (en) * | 2022-01-30 | 2023-08-03 | 上海商汤智能科技有限公司 | Instruction processing method and apparatus, chip, electronic device, and storage medium |
| CN116597886A (en) * | 2023-07-18 | 2023-08-15 | 深圳中安辰鸿技术有限公司 | Method for verifying LSU in NPU and related equipment |
| CN116597886B (en) * | 2023-07-18 | 2023-10-24 | 深圳中安辰鸿技术有限公司 | Method for verifying LSU in NPU and related equipment |
| WO2025190220A1 (en) * | 2024-03-14 | 2025-09-18 | 上海壁仞科技股份有限公司 | Memory data loading/storage method and apparatus, and electronic device and storage medium |
Also Published As
| Publication number | Publication date |
|---|---|
| CN110825435B (en) | 2023-01-24 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN110825435B (en) | Method and apparatus for processing data | |
| US11640300B2 (en) | Byte comparison method for string processing and instruction processing apparatus | |
| KR102371844B1 (en) | Computing method applied to artificial intelligence chip, and artificial intelligence chip | |
| JP5824488B2 (en) | Using completer knowledge about memory region ordering requests to modify transaction attributes | |
| US11650754B2 (en) | Data accessing method, device, and storage medium | |
| KR102787374B1 (en) | Accelerator, method for operating the same and device including the same | |
| CN113254073B (en) | Data processing method and device | |
| CN115952758A (en) | Chip verification method and device, electronic equipment and storage medium | |
| CN111552652A (en) | Data processing method, device and storage medium based on artificial intelligence chip | |
| US11392406B1 (en) | Alternative interrupt reporting channels for microcontroller access devices | |
| US12111779B2 (en) | Node identification allocation in a multi-tile system with multiple derivatives | |
| CN110825438A (en) | Method and apparatus for simulating data processing of artificial intelligence chips | |
| US9003364B2 (en) | Overriding system attributes and function returns in a software subsystem | |
| CN119829156A (en) | Method and system for acceleration or offloading with unified data pointers | |
| CN118331904A (en) | Data processing method, device, electronic device and readable storage medium | |
| US11907144B1 (en) | Early semaphore update | |
| CN106294143B (en) | Debugging method and device for register of chip | |
| US8593472B1 (en) | System and method for accessing a frame buffer via a storage driver | |
| US20140244232A1 (en) | Simulation apparatus and simulation method | |
| CN115297169B (en) | Data processing method, device, electronic equipment and medium | |
| CN112257360B (en) | Debugging method, device, debugging system and storage medium for data waveform | |
| CN117971583B (en) | Method and system for testing storage particles, electronic equipment and storage medium | |
| CN114579189B (en) | Method, processor, and system for single-core and multi-core access to register data | |
| US20250370951A1 (en) | Configuring and debugging a die-to-die link using a sideband link | |
| US20250328484A1 (en) | Low-power frame transmission over a communication interconnect |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | TA01 | Transfer of patent application right | Effective date of registration: 20210927. Address after: Baidu building, No. 10, Shangdi 10th Street, Haidian District, Beijing 100086. Applicant after: Kunlun core (Beijing) Technology Co.,Ltd. Address before: 2 / F, baidu building, 10 Shangdi 10th Street, Haidian District, Beijing 100085. Applicant before: BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY Co.,Ltd. |
| | GR01 | Patent grant | |
| | CP03 | Change of name, title or address | Address after: 100085 Beijing City Haidian District Shangdi Information Road No. 19 Building 1 Third Floor 321. Patentee after: Kunlun Xing (Beijing) Science and Technology Co., Ltd. Country or region after: China. Address before: Baidu building, No. 10, Shangdi 10th Street, Haidian District, Beijing 100086. Patentee before: Kunlun core (Beijing) Technology Co.,Ltd. Country or region before: China. |