CN110825435A - Method and apparatus for processing data - Google Patents

Method and apparatus for processing data


Publication number
CN110825435A
Authority
CN
China
Prior art keywords
data
address
memory
operand
executed
Prior art date
Legal status
Granted
Application number
CN201810910200.9A
Other languages
Chinese (zh)
Other versions
CN110825435B (en)
Inventor
徐英男
杜学亮
Current Assignee
Kunlun Xing Beijing Science And Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201810910200.9A
Publication of CN110825435A
Application granted
Publication of CN110825435B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003 Arrangements for executing specific machine instructions

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Executing Machine-Instructions (AREA)

Abstract

The embodiments of the present application disclose a method and an apparatus for processing data. A specific implementation of the method includes: obtaining an instruction to be executed, the instruction including a data identifier; decoding the instruction to obtain a first address, in a memory, of the operand indicated by the data identifier; aligning the first address according to a preset data bit width of the memory to obtain a second address; and reading/writing the operand in the memory according to the second address. This implementation improves the storage utilization of the memory.

Description

Method and apparatus for processing data

Technical field

The embodiments of the present application relate to the field of computer technology, and in particular to a method and an apparatus for processing data.

Background

In recent years, with the rise of model algorithms represented by deep learning, neural network models have been widely applied in fields such as speech recognition, image recognition, and natural language processing. Neural network models contain a large number of compute-intensive operators, such as matrix computation, convolution, pooling, activation, and normalization. Because these operations are very time-consuming, the computing power of a traditional CPU (Central Processing Unit) cannot meet the demand, and heterogeneous computing has become mainstream. As a result, various processors for neural network workloads have been developed, such as GPUs (Graphics Processing Units), FPGAs (Field-Programmable Gate Arrays), and ASICs (Application-Specific Integrated Circuits).

At present, in general-purpose processors, and in processors designed for compute-intensive domains such as deep learning, memory access addresses impose strict constraints on software programmers. For example, when the bus data bit width is 64 bytes (512 bits), memory access addresses must be 64-byte aligned. If an unaligned address is used, the hardware silently ignores the unaligned part, returns incorrect data, and reports an interrupt exception to the software. To guarantee correct data access under this alignment constraint, irregular data structures must be padded with invalid data.
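As a rough illustration of that cost, the sketch below computes storage utilization when every record of an irregular structure is padded so that the next record starts on a 64-byte boundary. The 36-byte record size and the function names are hypothetical, chosen only for the arithmetic.

```python
def padded_size(nbytes: int, width: int = 64) -> int:
    """Round nbytes up to the next multiple of width (the bus data width)."""
    return (nbytes + width - 1) // width * width


def padding_overhead(record_bytes: int, count: int, width: int = 64) -> tuple[int, int]:
    """Return (useful bytes, bytes occupied) when each record is padded
    so that the following record starts on a width-aligned address."""
    useful = record_bytes * count
    occupied = padded_size(record_bytes, width) * count
    return useful, occupied


useful, occupied = padding_overhead(record_bytes=36, count=100)
print(f"{useful} useful / {occupied} occupied bytes -> {useful / occupied:.0%} utilization")
# prints: 3600 useful / 6400 occupied bytes -> 56% utilization
```

Without padding, the same 100 records would fit in 3,600 bytes. That gap is the storage utilization the method aims to recover.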

Summary

The embodiments of the present application provide a method and an apparatus for processing data.

In a first aspect, an embodiment of the present application provides a method for processing data, the method including: obtaining an instruction to be executed, the instruction including a data identifier; decoding the instruction to obtain a first address, in a memory, of the operand indicated by the data identifier; aligning the first address according to a preset data bit width of the memory to obtain a second address; and reading/writing the operand in the memory according to the second address.

In some embodiments, after the first address is aligned according to the preset data bit width of the memory to obtain the second address, the method further includes: calculating shift information of the alignment operation.

In some embodiments, reading/writing the operand in the memory according to the second address includes: sending a data read request including the second address to the memory; obtaining first data returned by the memory in response to the data read request; shifting the first data according to the shift information to obtain the operand; and outputting the operand.

In some embodiments, shifting the first data according to the shift information to obtain the operand includes: shifting the first data according to the shift information to obtain second data; and truncating the second data according to a predefined data length of the operand to obtain the operand.

In some embodiments, reading/writing the operand in the memory according to the second address includes: determining, according to the first address and the predefined data length of the operand, the valid bits of the data to be written at the second address; and writing the data on the valid bits of the data to be written into the memory.

In a second aspect, an embodiment of the present application provides an apparatus for processing data, the apparatus including: an obtaining unit configured to obtain an instruction to be executed, the instruction including a data identifier; a decoding unit configured to decode the instruction to obtain a first address, in a memory, of the operand indicated by the data identifier; an alignment unit configured to align the first address according to a preset data bit width of the memory to obtain a second address; and a read/write unit configured to read/write the operand in the memory according to the second address.

In some embodiments, the apparatus further includes a calculation unit configured to calculate shift information of the alignment operation.

In some embodiments, the read/write unit includes: a sending subunit configured to send a data read request including the second address to the memory; an obtaining subunit configured to obtain first data returned by the memory in response to the data read request; a shift subunit configured to shift the first data according to the shift information to obtain the operand; and an output subunit configured to output the operand.

In some embodiments, the shift subunit is further configured to: shift the first data according to the shift information to obtain second data; and truncate the second data according to the predefined data length of the operand to obtain the operand.

In some embodiments, the read/write unit includes: a determining subunit configured to determine, according to the first address and the predefined data length of the operand, the valid bits of the data to be written at the second address; and a writing subunit configured to write the data on the valid bits of the data to be written into the memory.

In a third aspect, an embodiment of the present application provides an artificial intelligence chip, including: one or more processor cores; and a storage device storing one or more programs which, when executed by the one or more processor cores, cause the one or more processor cores to implement the method described in the first aspect.

In a fourth aspect, an embodiment of the present application provides a computer-readable medium storing a computer program which, when executed by an artificial intelligence chip, implements the method described in the first aspect.

In a fifth aspect, an embodiment of the present application provides an electronic device, including a processor, a storage device, and at least one artificial intelligence chip as described in the third aspect.

With the method and apparatus for processing data provided by the embodiments of the present application, an instruction to be executed is obtained and decoded to obtain the first address, in memory, of the operand indicated by the data identifier; the first address is aligned according to the preset data bit width of the memory to obtain a second address; and finally the operand is read/written in the memory according to the second address. This improves the storage utilization of the memory.

Brief description of the drawings

Other features, objects, and advantages of the present application will become more apparent upon reading the detailed description of non-limiting embodiments with reference to the following drawings:

FIG. 1 is a diagram of an exemplary system architecture to which an embodiment of the present application may be applied;

FIG. 2 is a flowchart of an embodiment of the method for processing data according to the present application;

FIG. 3A is a schematic diagram of the storage space in memory starting at the second address, before the data to be written is written, in an embodiment of the method for processing data according to the present application;

FIG. 3B is a schematic diagram of the data to be written, starting at the second address, in an embodiment of the method for processing data according to the present application;

FIG. 3C is a schematic diagram of the signal indicating the valid bits in the data to be written, in an embodiment of the method for processing data according to the present application;

FIG. 3D is a schematic diagram of the storage space in memory starting at the second address, after the data to be written is written, in an embodiment of the method for processing data according to the present application;

FIG. 4 is a flowchart of another embodiment of the method for processing data according to the present application;

FIG. 5A is a schematic diagram of the first data, returned by the memory, in the storage space starting at the second address, in an embodiment of the method for processing data according to the present application;

FIG. 5B is a schematic diagram of the operand obtained by the shift and truncation operations in an embodiment of the method for processing data according to the present application;

FIG. 6 is a schematic structural diagram of an embodiment of the apparatus for processing data according to the present application;

FIG. 7 is a schematic structural diagram of a computer system of an embodiment of the electronic device according to the present application.

Detailed description

The present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here only explain the related invention and do not limit it. It should also be noted that, for ease of description, only the parts related to the invention are shown in the drawings.

It should be noted that the embodiments of the present application and the features in them may be combined with each other where no conflict arises. The present application is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.

FIG. 1 shows an exemplary system architecture 100 to which embodiments of the method or apparatus for processing data of the present application may be applied.

As shown in FIG. 1, the system architecture 100 may include a CPU (Central Processing Unit) 101, a bus 102, and an AI (Artificial Intelligence) chip 103. The bus 102 serves as the medium providing a communication link between the CPU 101 and the AI chip 103. The bus 102 may be of various types, such as a PCIe (Peripheral Component Interconnect Express) bus, an AMBA (Advanced Microcontroller Bus Architecture) bus, or an OCP (Open Core Protocol) bus.

An AI chip (artificial intelligence chip), also known as an AI accelerator or computing card, is a module dedicated to handling the large volume of computing tasks in artificial intelligence applications (other, non-computing tasks are still handled by the CPU). AI workloads are extremely computation-heavy, and complex operations in particular have a large impact on computing performance. Although complex operations (such as floating-point square root, floating-point exponentiation, and trigonometric functions) can be implemented with basic arithmetic instructions, doing so reduces their execution efficiency.

The AI chip 103 may include processor cores 1031, 1032, and 1033, a bus 1034, and a memory 1035. The bus 1034 serves as the medium providing communication links between the processor cores 1031, 1032, 1033 and the memory 1035. The bus 1034 may be of various types, such as a PCI bus, a PCIe bus, an AMBA bus supporting a network-on-chip (NoC) protocol, an OCP bus, or another on-chip interconnect bus.

It should be noted that the method for processing data provided by the embodiments of the present application may be executed by the AI chip 103; accordingly, the apparatus for processing data may be provided in the AI chip.

It should be understood that the numbers of CPUs, buses, and AI chips in FIG. 1 are merely illustrative; there may be any number of each as the implementation requires. Likewise, the numbers of processor cores, buses, and memories in the AI chip 103 are merely illustrative, and the AI chip 103 may have any number of them. In addition, the system architecture 100 may further include input devices (e.g., a mouse or keyboard), output devices (e.g., a display or speakers), input/output interfaces, and so on, as the implementation requires.

Continuing with FIG. 2, a flow 200 of an embodiment of the method for processing data according to the present application is shown. The method includes the following steps:

Step 201: obtain the instruction to be executed.

In this embodiment, the execution body of the method for processing data (for example, the AI chip shown in FIG. 1) may first obtain the instruction to be executed, which includes a data identifier. The AI chip may be communicatively connected to the CPU; for example, it may be connected via PCIe and read the instructions to be executed from the CPU. The data identifier may be any information that identifies an operand, such as a data name. The operand may be the data to be processed, for example data to be read or written.

Step 202: decode the instruction to be executed to obtain the first address, in memory, of the operand indicated by the data identifier.

In this embodiment, the execution body may decode the instruction obtained in step 201 to obtain the first address, in memory, of the operand indicated by the data identifier. The first address may be the address allocated to the operand when the operand was defined. It may be the start address of the operand in memory, or the address range the operand occupies. When the first address is the start address, the storage space it indicates may be the space beginning at the first address, whose size may be preset or determined by the length of the operand.

Here, the memory of the AI chip may include at least one of the following: static random-access memory (SRAM), dynamic random-access memory (DRAM), and flash memory.

By storage hierarchy, memory can be divided into registers, local memory (LM), shared memory (SM), and global memory (GM). As an example, the operand indicated by the data identifier may be stored in global memory.

Here, upon obtaining the instruction to be executed, the execution body may select, from the at least one processor core, a processor core to execute the instruction as the target processor core. For example, the target core may be selected according to the current working state of each processor core, or the cores may be polled in turn (round-robin) to select the target core.
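The two selection policies can be sketched as follows. The `CoreScheduler` class is hypothetical; it models a core's "current working state" simply as a count of queued instructions, which is an assumption and not something the application specifies.

```python
from itertools import cycle


class CoreScheduler:
    """Hypothetical dispatcher that picks a target processor core per instruction."""

    def __init__(self, core_ids):
        self.load = {core: 0 for core in core_ids}  # queued instructions per core (assumed metric)
        self._round_robin = cycle(core_ids)         # polls the cores in a fixed order

    def pick_least_busy(self):
        # Select by current working state: the core with the fewest queued instructions.
        return min(self.load, key=self.load.get)

    def pick_round_robin(self):
        # Select by polling: each call returns the next core in turn.
        return next(self._round_robin)

    def dispatch(self, policy="least_busy"):
        core = self.pick_least_busy() if policy == "least_busy" else self.pick_round_robin()
        self.load[core] += 1  # the chosen core now has one more instruction queued
        return core


sched = CoreScheduler([1031, 1032, 1033])
sched.load.update({1031: 2, 1032: 0, 1033: 1})
print(sched.dispatch())                                   # 1032
print([sched.dispatch("round_robin") for _ in range(4)])  # [1031, 1032, 1033, 1031]
```

Load-based selection adapts to uneven work, while round-robin needs no state about the cores at all. This is the usual trade-off between the two policies.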

The target processor core can then decode the instruction to obtain an opcode and an address. Decoding means splitting and interpreting the fetched instruction according to a predetermined instruction format. The opcode indicates the nature of the operation to be performed, i.e., what to do. The address is the address of the operand on which the opcode operates. When a computer executes an instruction, it must first determine the instruction's opcode to establish the nature and method of the operation. Only then can it direct its other components to cooperate in carrying out the function the instruction expresses.
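A toy decoder makes the split concrete. The 32-bit format below (opcode in bits 31..24, data identifier in bits 23..0), the opcode table, and the symbol table mapping a data identifier to the address allocated when the operand was defined are all hypothetical. The application does not specify an instruction format.

```python
from dataclasses import dataclass

# Hypothetical 32-bit format: bits [31:24] opcode, bits [23:0] data identifier.
OPCODE_SHIFT = 24
DATA_ID_MASK = (1 << OPCODE_SHIFT) - 1
OPCODES = {0x01: "LOAD", 0x02: "STORE"}  # illustrative opcode table


@dataclass
class DecodedInstruction:
    op: str             # nature of the operation (what to do)
    first_address: int  # operand address allocated when the operand was defined


def decode(word: int, symbol_table: dict[int, int]) -> DecodedInstruction:
    """Split an instruction word by the fixed format and resolve the operand address."""
    opcode = (word >> OPCODE_SHIFT) & 0xFF
    data_id = word & DATA_ID_MASK
    return DecodedInstruction(op=OPCODES[opcode], first_address=symbol_table[data_id])


symbols = {7: 0x21000004}  # data identifier 7 -> first address of its operand
inst = decode(0x01000007, symbols)
print(inst.op, hex(inst.first_address))  # LOAD 0x21000004
```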

Step 203: align the first address according to the preset data bit width of the memory to obtain the second address.

In this embodiment, the execution body may align the first address obtained in step 202 according to the preset data bit width of the memory to obtain the second address. Like the first address, the second address may be a start address in the memory or an address range in the memory.

The data bit width may be the amount of data the memory can transfer at one time, i.e., the width of a single transfer. As an example, the global memory in the AI chip may use High Bandwidth Memory (HBM) technology, and the memory access interface may use the standard AXI (Advanced eXtensible Interface) bus protocol. Because of the characteristics of the HBM interface and the constraints of the AXI bus protocol, memory access addresses are strongly constrained from the software programmer's point of view. For example, when the AXI bus data width is set to 64 bytes, memory access addresses must be 64-byte aligned. Although the AXI protocol supports a transfer granularity of 1 byte, at a given clock frequency the memory access bandwidth is proportional to the data width. To raise the AI chip's memory access bandwidth and avoid a memory access bottleneck, the AXI bus width is set to 64 bytes, which in turn imposes the alignment constraint on memory access addresses.
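The proportionality between data width and bandwidth is simple arithmetic: with one transfer per clock cycle, peak bandwidth is width × clock frequency. The 1 GHz clock below is an assumed value for illustration, not a figure from the application.

```python
def peak_bandwidth_gb_per_s(width_bytes: int, clock_hz: float) -> float:
    """Peak bandwidth in GB/s, assuming one width_bytes transfer per clock cycle."""
    return width_bytes * clock_hz / 1e9


CLOCK_HZ = 1e9  # assumed 1 GHz clock, illustration only
for width in (1, 64):
    print(f"{width:>2}-byte bus: {peak_bandwidth_gb_per_s(width, CLOCK_HZ):.0f} GB/s")
# prints:
#  1-byte bus: 1 GB/s
# 64-byte bus: 64 GB/s
```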

Here, the alignment operation may be a downward alignment. Aligning the first address according to the preset data bit width of the memory may mean moving it back to the start of the data-bit-width block that contains it, i.e., to the nearest multiple of the data bit width at or below the first address. For example, the first address 0x21000004 may be aligned to the second address 0x21000000.
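Downward alignment to a power-of-two width is a single mask operation. The sketch assumes the data bit width is a power of two, which holds for the 64-byte example; for other widths, `addr - addr % width` gives the same result.

```python
def align_down(addr: int, width: int = 64) -> int:
    """Return the start of the width-byte block containing addr (the second address)."""
    if width <= 0 or width & (width - 1):
        raise ValueError("width must be a positive power of two")
    return addr & ~(width - 1)  # clear the low log2(width) address bits


print(hex(align_down(0x21000004)))  # 0x21000000
print(hex(align_down(0x2100004F)))  # 0x21000040
```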

Step 204: read/write the operand in the memory according to the second address.

In this embodiment, the execution body may read/write the operand in the memory according to the second address obtained in step 203: it may read the data in the storage space indicated by the second address, or write the operand into that storage space.

In some optional implementations of this embodiment, reading/writing the operand in the memory according to the second address includes: determining, according to the first address and the predefined data length of the operand, the valid bits of the data to be written at the second address; and writing the data on those valid bits into the memory. As an example, on the AXI write data channel the wstrb (write strobe) signal can be used for this: it indicates which parts of the data are valid (one strobe bit per byte lane), so that only the data on the valid bits is written into memory.

In this implementation, the data length of the operand may be determined when the operand is defined. For example, if the operand is an array A of 8 elements of 32-bit floating-point type, its data length is 8 × 4 bytes = 32 bytes.

As shown in FIG. 3A, before array A is written, the storage space in memory starting at the second address holds array C and array B. As shown in FIG. 3B, in the data to be written, which starts at the second address, the operand (array A) actually starts at the first address. From the first address and the predefined data length of the operand, the valid bits of the data to be written at the second address can be determined; the signal indicating these valid bits may be as shown in FIG. 3C. The data on the valid bits is then written into memory. FIG. 3D shows the storage space starting at the second address after operand array A has been written.
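The masked write in FIGS. 3A to 3D can be modelled in software with a per-byte enable list standing in for the AXI wstrb signal. This is a sketch under assumptions: memory is a flat `bytearray`, the data bit width is 64 bytes, the operand fits within one aligned beat, and the byte values standing in for arrays C and B are made up.

```python
import struct

WIDTH = 64  # assumed data bit width of the memory, in bytes


def write_strobe(first_addr: int, length: int, width: int = WIDTH) -> list[bool]:
    """One enable flag per byte lane of the aligned beat, like AXI wstrb:
    True exactly for lanes covered by [first_addr, first_addr + length)."""
    offset = first_addr % width
    if offset + length > width:
        raise ValueError("operand crosses a beat boundary; not handled in this sketch")
    return [offset <= lane < offset + length for lane in range(width)]


def masked_write(memory: bytearray, first_addr: int, operand: bytes, width: int = WIDTH) -> None:
    """Write operand at its unaligned first address using one aligned beat."""
    second_addr = first_addr - first_addr % width            # aligned second address
    offset = first_addr - second_addr
    beat = bytearray(width)                                  # data to be written (FIG. 3B)
    beat[offset:offset + len(operand)] = operand             # operand sits at its lane offset
    strobe = write_strobe(first_addr, len(operand), width)   # valid-byte signal (FIG. 3C)
    for lane, enabled in enumerate(strobe):
        if enabled:                                          # only valid lanes reach memory
            memory[second_addr + lane] = beat[lane]


# Array A: 8 x 32-bit floats = 32 bytes, written at first address 0x10.
mem = bytearray(b"\xcc" * WIDTH)  # stand-in for arrays C and B (FIG. 3A)
array_a = struct.pack("<8f", *[float(i) for i in range(8)])
masked_write(mem, 0x10, array_a)
assert mem[0x10:0x30] == array_a and mem[:0x10] == b"\xcc" * 0x10  # FIG. 3D
```

Because disabled lanes are never touched, the neighbouring data survives without any read-modify-write and without padding array A out to a full beat.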

本申请的上述实施例提供的方法通过获取待执行指令,待执行指令包括数据标识;对待执行指令进行译码,得到数据标识所指示的操作数在存储器中的第一地址;根据预先设置的存储器的数据位宽对第一地址进行对齐操作,得到第二地址;根据第二地址在存储器中读/写操作数,避免了无效数据的填充,提高了存储器的存储利用率。The method provided by the above-mentioned embodiments of the present application obtains the to-be-executed instruction, which includes a data identifier; decodes the to-be-executed instruction to obtain the first address in the memory of the operand indicated by the data identifier; according to the preset memory Aligning the first address to obtain the second address; reading/writing operands in the memory according to the second address, avoiding filling of invalid data and improving the storage utilization of the memory.

进一步参考图4,其示出了用于处理数据的方法的又一个实施例的流程400。该用于处理数据的方法的流程400,包括以下步骤:With further reference to Figure 4, a flow 400 of yet another embodiment of a method for processing data is shown. The flow 400 of the method for processing data includes the following steps:

步骤401,获取待执行指令。In step 401, the instruction to be executed is obtained.

在本实施例中,用于处理数据的方法执行主体(例如图1所示的AI芯片)可以首先获取待执行指令,待执行指令包括数据标识。In this embodiment, the method execution body for processing data (for example, the AI chip shown in FIG. 1 ) may first acquire the to-be-executed instruction, and the to-be-executed instruction includes a data identifier.

步骤402,对待执行指令进行译码,得到数据标识所指示的操作数在存储器中的第一地址。Step 402: Decode the instruction to be executed to obtain the first address in the memory of the operand indicated by the data identifier.

在本实施例中,上述执行主体可以对步骤401中获取的待执行指令进行译码,得到数据标识所指示的操作数在存储器中的第一地址。In this embodiment, the above-mentioned execution body may decode the to-be-executed instruction obtained in step 401 to obtain the first address in the memory of the operand indicated by the data identifier.

步骤403,根据预先设置的存储器的数据位宽对第一地址进行对齐操作,得到第二地址。Step 403: Perform an alignment operation on the first address according to the preset data bit width of the memory to obtain the second address.

在本实施例中,上述执行主体可以根据预先设置的存储器的数据位宽对步骤402中得到的第一地址进行对齐操作,得到第二地址。In this embodiment, the above-mentioned execution body may perform an alignment operation on the first address obtained in step 402 according to the preset data bit width of the memory to obtain the second address.

Step 404: calculate the shift information of the alignment operation.

In this embodiment, the execution body may calculate the shift information of the alignment operation performed in step 403. The shift information may include a shift number and a shift direction. For example, if address 0x21000004 is aligned to 0x21000000, the shift number may be 4.
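Steps 403 and 404 can be sketched as follows for a byte-addressed memory. This is a minimal illustration built on assumptions the patent does not state:

- The data bit width is a power-of-two number of bytes.
- Alignment rounds the address down.
- The shift number is counted in bytes.
- Byte lanes are little-endian, so the shift direction is always toward lane 0.

`ShiftInfo` and `align_address` are hypothetical names.

```python
from typing import NamedTuple, Tuple


class ShiftInfo(NamedTuple):
    count: int       # shift number, in bytes (assumption)
    direction: str   # "right": move the operand's bytes down to byte lane 0


def align_address(first_address: int, data_bit_width: int) -> Tuple[int, ShiftInfo]:
    """Round first_address down to a memory-word boundary (steps 403 and 404).

    Assumes a byte-addressed memory whose data bit width is a power-of-two
    number of bytes, with little-endian byte lanes within a word.
    """
    width_bytes = data_bit_width // 8
    if data_bit_width <= 0 or data_bit_width % 8 or width_bytes & (width_bytes - 1):
        raise ValueError("data bit width must be a power-of-two number of bytes")
    second_address = first_address & ~(width_bytes - 1)   # clear the low-order bits
    shift = ShiftInfo(first_address - second_address, "right")
    return second_address, shift


second, info = align_address(0x21000004, 64)
print(hex(second), info)  # 0x21000000 ShiftInfo(count=4, direction='right')
```

Because the width is a power of two, rounding down is a single AND with the inverted mask. The shift number is simply the value of the low-order bits that were cleared. A 64-bit width and a 512-bit width both map 0x21000004 to 0x21000000 with a shift of 4, which matches the example above.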

Step 405: send a data read request including the second address to the memory.

In this embodiment, the execution body may send a data read request to the memory, the request including the second address obtained in step 403. Because the second address is aligned, the memory can return the correct data for it.

Step 406: acquire the first data returned by the memory in response to receiving the data read request.

In this embodiment, the execution body may acquire the first data that the memory returns in response to the data read request sent in step 405. The first data contains the required operand.

Step 407: shift the first data according to the shift information to obtain the operand.

In this embodiment, the execution body may shift the first data according to the shift information calculated in step 404 to obtain the operand. The second address is the first address after alignment. Shifting the data indicated by the second address therefore yields the data indicated by the first address, that is, the data containing the operand.

In some optional implementations of this embodiment, shifting the first data according to the shift information to obtain the operand includes two sub-steps. First, the first data is shifted according to the shift information to obtain second data. Second, the second data is truncated to the predefined data length of the operand to obtain the operand. Truncating to the predefined operand length makes the output operand more precise. A downstream module that consumes the memory output can then use the operand directly, without further processing.

As an example, FIG. 5A shows the first data that the memory returns from the storage space starting at the second address. The first data is shifted according to the shift information and truncated to the predefined data length of the operand, yielding the operand shown in FIG. 5B.
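Steps 405 to 407, including the optional truncation, can be sketched as below. This is a hypothetical software model, not the chip's circuitry. `WordMemory` stands in for a memory that accepts only aligned read requests and returns one full word. Byte lanes are assumed little-endian, so the shift is a right shift by 8 × (shift number) bits. Operands that straddle two words are rejected rather than assembled from two reads, because the text above does not address that case. `WordMemory` and `read_operand` are hypothetical names.

```python
class WordMemory:
    """Hypothetical memory that only serves whole, aligned words."""

    def __init__(self, size_bytes: int, data_bit_width: int):
        self.width = data_bit_width // 8          # bytes per word
        self.cells = bytearray(size_bytes)

    def read(self, second_address: int) -> int:
        """Answer a data read request; returns the word (the first data) as an int."""
        if second_address % self.width:
            raise ValueError("read request address is not aligned")
        word = self.cells[second_address:second_address + self.width]
        return int.from_bytes(word, "little")


def read_operand(mem: WordMemory, first_address: int, operand_bytes: int) -> int:
    second_address = first_address - first_address % mem.width   # step 403: align down
    shift = first_address - second_address                        # step 404: shift number (bytes)
    if shift + operand_bytes > mem.width:
        raise ValueError("operand crosses a word boundary; not handled in this sketch")
    first_data = mem.read(second_address)                          # steps 405-406
    second_data = first_data >> (8 * shift)                        # step 407: shift right
    return second_data & ((1 << (8 * operand_bytes)) - 1)          # truncate to operand length


mem = WordMemory(size_bytes=32, data_bit_width=64)
mem.cells[:16] = bytes(range(16))        # byte i holds the value i
print(hex(read_operand(mem, 4, 2)))      # 0x504
```

The memory never sees the unaligned first address. All per-operand work (shift and truncation) happens on the requesting side, which matches the split between the memory and the shift subunit described in this section.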

Step 408: output the operand.

In this embodiment, the execution body may output the operand obtained in step 407.

In this embodiment, steps 401, 402 and 403 are substantially the same as steps 201, 202 and 203, respectively, and are not repeated here.

Compare FIG. 4 with the embodiment corresponding to FIG. 2. The flow 400 in this embodiment additionally calculates the shift information of the alignment operation and shifts the first data according to that information. The solution described in this embodiment therefore outputs a more precise operand, which the module reading the memory data can use directly.

With further reference to FIG. 6, as an implementation of the methods shown in the above figures, the present application provides an embodiment of an apparatus for processing data. This apparatus embodiment corresponds to the method embodiment shown in FIG. 2, and the apparatus may be applied to various electronic devices.

As shown in FIG. 6, the apparatus 600 for processing data in this embodiment includes an acquisition unit 601, a decoding unit 602, an alignment unit 603 and a read/write unit 604:

- The acquisition unit is configured to acquire the instruction to be executed, which includes a data identifier.
- The decoding unit is configured to decode the instruction to obtain the first address, in the memory, of the operand indicated by the data identifier.
- The alignment unit is configured to align the first address according to the preset data bit width of the memory to obtain the second address.
- The read/write unit is configured to read/write the operand in the memory according to the second address.

In this embodiment, the specific processing of the acquisition unit 601, the decoding unit 602, the alignment unit 603 and the read/write unit 604 of the apparatus 600 corresponds to steps 201, 202, 203 and 204, respectively, in the embodiment corresponding to FIG. 2.

In some optional implementations of this embodiment, the apparatus further includes a calculation unit configured to calculate the shift information of the alignment operation.

In some optional implementations of this embodiment, the read/write unit includes the following subunits:

- a sending subunit configured to send a data read request including the second address to the memory;
- an acquisition subunit configured to acquire the first data returned by the memory in response to receiving the data read request;
- a shift subunit configured to shift the first data according to the shift information to obtain the operand;
- an output subunit configured to output the operand.

In some optional implementations of this embodiment, the shift subunit is further configured to shift the first data according to the shift information to obtain second data, and then to truncate the second data to the predefined data length of the operand to obtain the operand.

In some optional implementations of this embodiment, the read/write unit includes two subunits. A determining subunit is configured to determine, according to the first address and the predefined data length of the operand, the valid bits of the data to be written at the second address. A writing subunit is configured to write the data on those valid bits into the memory.
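The write path in this implementation can be sketched with a byte-enable mask, a common way for memories to accept partial-word writes. The sketch rests on three assumptions: the "valid bits" are tracked per byte lane, lanes are little-endian, and each operand fits inside one word. `byte_enable_mask` and `write_operand` are hypothetical names. Bytes outside the operand keep their old contents, so neighbouring operands packed into the same word are not overwritten.

```python
def byte_enable_mask(first_address: int, operand_bytes: int, width_bytes: int) -> int:
    """Bit i is set when byte lane i of the aligned word holds operand data."""
    shift = first_address % width_bytes
    if shift + operand_bytes > width_bytes:
        raise ValueError("operand crosses a word boundary; not handled in this sketch")
    return ((1 << operand_bytes) - 1) << shift


def write_operand(cells: bytearray, first_address: int, value: int,
                  operand_bytes: int, width_bytes: int) -> None:
    second_address = first_address - first_address % width_bytes   # aligned word address
    shift = first_address - second_address
    mask = byte_enable_mask(first_address, operand_bytes, width_bytes)
    value &= (1 << (8 * operand_bytes)) - 1                        # keep only the operand's bytes
    data = (value << (8 * shift)).to_bytes(width_bytes, "little")  # data to be written, shifted into place
    for lane in range(width_bytes):
        if (mask >> lane) & 1:                                     # write valid lanes only
            cells[second_address + lane] = data[lane]


cells = bytearray(b"\xaa" * 16)
write_operand(cells, first_address=5, value=0x1234, operand_bytes=2, width_bytes=8)
print(cells[:8].hex())  # aaaaaaaaaa3412aa
```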

The apparatus provided by the above embodiments of the present application works as follows. It acquires an instruction to be executed that includes a data identifier. It decodes the instruction to obtain the first address, in the memory, of the operand indicated by the data identifier. It aligns the first address according to the preset data bit width of the memory to obtain the second address. Finally, it reads/writes the operand in the memory according to the second address, thereby improving the storage utilization of the memory.

An embodiment of the present application further provides an electronic device. For its structure, reference may be made to FIG. 7, which shows a schematic structural diagram of a computer system 700 of an embodiment of the electronic device of the present application. The electronic device shown in FIG. 7 is merely an example and should not limit the functions or scope of use of the embodiments of the present application.

As shown in FIG. 7, the computer system 700 includes one or more central processing units (CPUs) 701 and one or more artificial intelligence chips 704. The CPU 701 may perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 702, or a program loaded from a storage section 707 into a random access memory (RAM) 703. The artificial intelligence chip 704 includes one or more general-purpose execution units and one or more dedicated execution units. It may perform various appropriate actions and processes according to a program received from the CPU 701. The RAM 703 also stores various programs and data required for the operation of the system 700. The CPU 701, the ROM 702, the RAM 703 and the artificial intelligence chip 704 are connected to one another through a bus 705. An input/output (I/O) interface 706 is also connected to the bus 705.

The following components are connected to the I/O interface 706: a storage section 707 including a hard disk and the like, and a communication section 708 including a network interface card such as a LAN card or a modem. The communication section 708 performs communication processing via a network such as the Internet. A drive 709 is also connected to the I/O interface 706 as needed. A removable medium 710, such as a magnetic disk, an optical disk, a magneto-optical disk or a semiconductor memory, is mounted on the drive 709 as needed. A computer program read from it can then be installed into the storage section 707 as needed.

In particular, according to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a computer-readable medium. The computer program contains program code for performing the methods shown in the flowcharts. In such embodiments, the computer program may be downloaded and installed from a network via the communication section 708, and/or installed from the removable medium 710. When the computer program is executed by the general-purpose execution units of the artificial intelligence chip 704, the above functions defined in the method of the present application are performed.

It should be noted that the computer-readable medium described in the present application may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two.

A computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection having one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present application, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in connection with an instruction execution system, apparatus or device.

In the present application, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take a variety of forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium, other than a computer-readable storage medium, that can send, propagate or transmit a program for use by or in connection with an instruction execution system, apparatus or device.

Program code contained on a computer-readable medium may be transmitted using any suitable medium, including but not limited to wireless, wire, optical cable, RF, or any suitable combination of the above.

Computer program code for performing the operations of the present application may be written in one or more programming languages or combinations thereof. These include object-oriented programming languages such as Java, Smalltalk and C++, as well as conventional procedural programming languages such as the "C" language or similar languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, it may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN). Alternatively, it may be connected to an external computer (for example, through the Internet using an Internet service provider).

The flowcharts and block diagrams in the figures illustrate the architecture, functionality and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in a flowchart or block diagram may represent a module, a program segment or a portion of code that contains one or more executable instructions for implementing the specified logical functions. In some alternative implementations, the functions noted in the blocks may occur in an order different from that shown in the figures. For example, two blocks shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functionality involved. Each block of the block diagrams and/or flowcharts, and each combination of such blocks, may be implemented by a dedicated hardware-based system that performs the specified functions or operations. It may also be implemented by a combination of dedicated hardware and computer instructions.

The units described in the embodiments of the present application may be implemented in software or in hardware. The described units may also be provided in an AI chip; for example, an AI chip may be described as including an acquisition unit, a decoding unit, an alignment unit and a read/write unit. In some cases, the names of these units do not limit the units themselves. For example, the acquisition unit may also be described as "a unit configured to acquire an instruction to be executed".

In another aspect, the present application further provides a computer-readable medium. It may be included in the AI chip described in the above embodiments, or it may exist separately without being assembled into the AI chip. The computer-readable medium carries one or more programs. When executed by the AI chip, these programs cause the AI chip to:

- acquire an instruction to be executed, the instruction including a data identifier;
- decode the instruction to obtain the first address, in the memory, of the operand indicated by the data identifier;
- align the first address according to the preset data bit width of the memory to obtain a second address;
- read/write the operand in the memory according to the second address.

The above description is merely a preferred embodiment of the present application and an illustration of the technical principles applied. Those skilled in the art should understand that the scope of the invention involved in the present application is not limited to technical solutions formed by the specific combination of the above technical features. It should also cover other technical solutions formed by any combination of the above technical features or their equivalents without departing from the above inventive concept. One example is a technical solution formed by replacing the above features with technical features having similar functions that are disclosed in (but not limited to) the present application.

Claims (13)

1. A method for processing data, comprising:
acquiring an instruction to be executed, wherein the instruction to be executed comprises a data identifier;
decoding the instruction to be executed to obtain a first address of an operand indicated by the data identifier in a memory;
aligning the first address according to a preset data bit width of the memory to obtain a second address;
reading/writing the operand in the memory according to the second address.
2. The method according to claim 1, wherein after aligning the first address according to the preset data bit width of the memory to obtain the second address, the method further comprises:
calculating shift information of the alignment operation.
3. The method of claim 2, wherein said reading/writing the operand in the memory according to the second address comprises:
sending a data read request to the memory, the data read request including the second address;
acquiring first data returned by the memory in response to receiving the data read request;
performing a shift operation on the first data according to the shift information to obtain the operand;
and outputting the operand.
4. The method of claim 3, wherein the shifting the first data according to the shift information to obtain the operand comprises:
shifting the first data according to the shift information to obtain second data;
and truncating the second data to the predefined data length of the operand to obtain the operand.
5. The method of any one of claims 1-4, wherein reading/writing the operand in the memory according to the second address comprises:
determining, according to the first address and the predefined data length of the operand, valid bits of the data to be written indicated by the second address;
and writing the data on the valid bits of the data to be written into the memory.
6. An apparatus for processing data, comprising:
an acquisition unit configured to acquire an instruction to be executed, wherein the instruction to be executed comprises a data identifier;
a decoding unit configured to decode the instruction to be executed to obtain a first address of an operand indicated by the data identifier in a memory;
an alignment unit configured to perform an alignment operation on the first address according to a preset data bit width of the memory to obtain a second address;
a read/write unit configured to read/write the operand in the memory according to the second address.
7. The apparatus of claim 6, wherein the apparatus further comprises:
a calculation unit configured to calculate shift information of the alignment operation.
8. The apparatus of claim 7, wherein the read/write unit comprises:
a sending subunit configured to send a data read request to the memory, the data read request including the second address;
an acquisition subunit configured to acquire first data returned by the memory in response to receiving the data read request;
a shift subunit configured to shift the first data according to the shift information to obtain the operand;
an output subunit configured to output the operand.
9. The apparatus of claim 8, wherein the shift subunit is further configured to:
shift the first data according to the shift information to obtain second data;
and truncate the second data to the predefined data length of the operand to obtain the operand.
10. The apparatus according to any one of claims 6-9, wherein the read/write unit comprises:
a determining subunit configured to determine, according to the first address and a predefined data length of the operand, valid bits of data to be written indicated by the second address;
a writing subunit configured to write the data on the valid bits of the data to be written into the memory.
11. An artificial intelligence chip comprising:
one or more processor cores;
a storage device having one or more programs stored thereon;
the one or more programs, when executed by the one or more processor cores, cause the one or more processor cores to implement the method of any of claims 1-5.
12. A computer-readable medium, on which a computer program is stored which, when executed by an artificial intelligence chip, carries out the method according to any one of claims 1 to 5.
13. An electronic device, comprising: a processor, a memory device, and at least one artificial intelligence chip as recited in claim 11.
CN201810910200.9A 2018-08-10 2018-08-10 Method and apparatus for processing data Active CN110825435B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810910200.9A CN110825435B (en) 2018-08-10 2018-08-10 Method and apparatus for processing data

Publications (2)

Publication Number Publication Date
CN110825435A true CN110825435A (en) 2020-02-21
CN110825435B CN110825435B (en) 2023-01-24

Family

ID=69541343

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810910200.9A Active CN110825435B (en) 2018-08-10 2018-08-10 Method and apparatus for processing data

Country Status (1)

Country Link
CN (1) CN110825435B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4396982A (en) * 1979-11-19 1983-08-02 Hitachi, Ltd. Microinstruction controlled data processing system including microinstructions with data align control feature
US20070106883A1 (en) * 2005-11-07 2007-05-10 Choquette Jack H Efficient Streaming of Un-Aligned Load/Store Instructions that Save Unused Non-Aligned Data in a Scratch Register for the Next Instruction
CN101876892A (en) * 2010-05-20 2010-11-03 复旦大学 Single Instruction Multiple Data Processor Circuit Architecture for Communication and Multimedia Applications
CN103761075A (en) * 2014-02-10 2014-04-30 东南大学 Coarse granularity dynamic reconfigurable data integration and control unit structure
CN103984530A (en) * 2014-05-15 2014-08-13 中国航天科技集团公司第九研究院第七七一研究所 Assembly line structure and method for improving execution efficiency of store command
CN104407880A (en) * 2014-10-27 2015-03-11 杭州中天微系统有限公司 RISC (reduced instruction-set computer) processor loading/storage unit supporting non-aligned hardware storage accessing
CN108228235A (en) * 2016-12-21 2018-06-29 龙芯中科技术有限公司 Data manipulation treating method and apparatus based on MIPS frameworks
CN108334276A (en) * 2017-01-20 2018-07-27 宇瞻科技股份有限公司 dynamic data alignment method of flash memory

Cited By (11)

Publication number Priority date Publication date Assignee Title
CN111782580A (en) * 2020-06-30 2020-10-16 北京百度网讯科技有限公司 Complex computing devices, methods, artificial intelligence chips and electronic devices
EP3933586A1 (en) * 2020-06-30 2022-01-05 Beijing Baidu Netcom Science And Technology Co., Ltd. Complex computing device, complex computing method, artificial intelligence chip and electronic apparatus
US11782722B2 (en) 2020-06-30 2023-10-10 Beijing Baidu Netcom Science And Technology Co., Ltd. Input and output interfaces for transmitting complex computing information between AI processors and computing components of a special function unit
CN111782580B (en) * 2020-06-30 2024-03-01 北京百度网讯科技有限公司 Complex computing devices, methods, artificial intelligence chips and electronic equipment
CN116049069A (en) * 2021-12-31 2023-05-02 海光信息技术股份有限公司 Data reading method and related device
CN114398011A (en) * 2022-01-17 2022-04-26 安谋科技(中国)有限公司 Data storage method, apparatus and medium
CN114398011B (en) * 2022-01-17 2023-09-22 安谋科技(中国)有限公司 Data storage method, device and medium
WO2023142524A1 (en) * 2022-01-30 2023-08-03 上海商汤智能科技有限公司 Instruction processing method and apparatus, chip, electronic device, and storage medium
CN116597886A (en) * 2023-07-18 2023-08-15 深圳中安辰鸿技术有限公司 Method for verifying LSU in NPU and related equipment
CN116597886B (en) * 2023-07-18 2023-10-24 深圳中安辰鸿技术有限公司 Method for verifying LSU in NPU and related equipment
WO2025190220A1 (en) * 2024-03-14 2025-09-18 上海壁仞科技股份有限公司 Memory data loading/storage method and apparatus, and electronic device and storage medium

Also Published As

Publication number Publication date
CN110825435B (en) 2023-01-24

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20210927

Address after: Baidu Building, No. 10, Shangdi 10th Street, Haidian District, Beijing 100086

Applicant after: Kunlun Core (Beijing) Technology Co., Ltd.

Address before: 2/F, Baidu Building, No. 10, Shangdi 10th Street, Haidian District, Beijing 100085

Applicant before: Beijing Baidu Netcom Science and Technology Co., Ltd.

GR01 Patent grant
CP03 Change of name, title or address

Address after: Room 321, Third Floor, Building 1, No. 19 Shangdi Information Road, Haidian District, Beijing 100085

Patentee after: Kunlun Xing (Beijing) Science and Technology Co., Ltd.

Country or region after: China

Address before: Baidu Building, No. 10, Shangdi 10th Street, Haidian District, Beijing 100086

Patentee before: Kunlun Core (Beijing) Technology Co., Ltd.

Country or region before: China
