WO2024187629A1

WO2024187629A1 - Method and apparatus for vector instruction table filling and lookup in processor, and electronic device

Info

Publication number: WO2024187629A1
Application number: PCT/CN2023/102852
Authority: WO
Inventors: 李祖松; 郇丹丹; 商家玮; 杨婷; 邱剑
Original assignee: Beijing Vcore Technology Co Ltd
Current assignee: Beijing Vcore Technology Co Ltd
Priority date: 2023-03-10
Filing date: 2023-06-27
Publication date: 2024-09-19
Anticipated expiration: 2025-09-10
Also published as: CN115951937A; CN115951937B; EP4679263A1

Abstract

The present disclosure provides a method and apparatus for vector instruction table filling and lookup in a processor, and an electronic device. The method comprises: configuring a vector instruction lookup table in a preset storage space of the processor, wherein the storage capacity of the vector instruction lookup table is the product of the number of items and the number of bits of storage items; determining table filling types corresponding to candidate vector registers waiting for table filling, wherein the table filling types comprise a first type and a second type, the first type is that the product of the register width and the maximum number of register groups is larger than or equal to the storage capacity, and the second type is that the product of the register width and the maximum number of register groups is smaller than the storage capacity; and using table filling rules corresponding to the table filling types to write first elements stored in target vector registers among the plurality of candidate vector registers into the vector instruction lookup table, and using a preset table lookup rule to acquire, from the vector instruction lookup table, a second element to be looked up, such that the performance of the processor can be improved.

Description

Method, device and electronic device for filling and looking up vector instruction tables in processors

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is based on and claims priority to Chinese patent application No. 202310225470.7, filed on March 10, 2023, the entire content of which is incorporated herein by reference.

Technical Field

The present disclosure relates to the field of computer technology, and in particular to a method and apparatus for vector instruction table filling and lookup in a processor, and an electronic device.

Background Art

With the development of big data and artificial intelligence technology, the performance demanded of data analysis applications and artificial intelligence algorithms keeps rising, which in turn places ever higher demands on the processing capability of computers. Vector processing technology has therefore become a topic of common interest in academia and industry. Vector processing typically relies on vector functions, that is, algorithm functions in a computer program whose internal implementation uses vector instructions and vector registers, so that the computation inside the function is carried out in a data-parallel manner.

The table lookup method is an optimization used in computer program development: required numerical results are computed in advance and stored in a constant array, and at run time they are read directly from the array instead of being computed on the fly, which saves computing overhead. The vector table lookup method is the table lookup method as used in vector functions. The entries of a vector lookup table remain in use for a period of time. If table filling and table lookup are performed in vector registers, they occupy vector register resources and vector register ports, and when the vector registers are insufficient, data must frequently be moved between the vector registers and memory, which degrades pipeline performance.

Summary of the Invention

The present disclosure proposes a method and apparatus for vector instruction table filling and lookup in a processor, and an electronic device, aiming to solve, at least to some extent, one of the technical problems in the related art.

An embodiment of the first aspect of the present disclosure provides a method for vector instruction table filling and lookup in a processor, comprising: configuring a vector instruction lookup table in a preset storage space of the processor, wherein the storage capacity of the vector instruction lookup table is the product of the number of items and the number of storage item bits; determining a table filling type corresponding to candidate vector registers to be filled into the table, wherein the table filling type includes a first type and a second type, the first type being that the product of the register width and the maximum number of register groups is greater than or equal to the storage capacity, and the second type being that the product of the register width and the maximum number of register groups is less than the storage capacity; writing, using a table filling rule corresponding to the table filling type, a first element stored in a target vector register among the multiple candidate vector registers into the vector instruction lookup table; and obtaining, using a preset table lookup rule, a second element to be queried from the vector instruction lookup table.

An embodiment of the second aspect of the present disclosure provides an apparatus for vector instruction table filling and lookup in a processor, comprising: a configuration module, configured to configure a vector instruction lookup table in a preset storage space of the processor, wherein the storage capacity of the vector instruction lookup table is the product of the number of items and the number of storage item bits; a determination module, configured to determine a table filling type corresponding to candidate vector registers to be filled into the table, wherein the table filling type includes a first type and a second type, the first type being that the product of the register width and the maximum number of register groups is greater than or equal to the storage capacity, and the second type being that the product of the register width and the maximum number of register groups is less than the storage capacity; a table filling module, configured to write, using a table filling rule corresponding to the table filling type, a first element stored in a target vector register among the multiple candidate vector registers into the vector instruction lookup table; and a table lookup module, configured to obtain, using a preset table lookup rule, a second element to be queried from the vector instruction lookup table.

An embodiment of the third aspect of the present disclosure provides an electronic device, comprising: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to perform the method for vector instruction table filling and lookup in a processor according to the embodiments of the present disclosure.

An embodiment of the fourth aspect of the present disclosure provides a non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause a computer to perform the method for vector instruction table filling and lookup in a processor disclosed in the embodiments of the present disclosure.

An embodiment of the fifth aspect of the present disclosure provides a computer program product, including a computer program which, when executed by a processor, implements the method for vector instruction table filling and lookup in a processor disclosed in the embodiments of the present disclosure.

In the embodiments of the present disclosure, a vector instruction lookup table is configured in a preset storage space of the processor, wherein the storage capacity of the vector instruction lookup table is the product of the number of items and the number of storage item bits. A table filling type corresponding to the candidate vector registers to be filled into the table is determined, wherein the table filling type includes a first type and a second type, the first type being that the product of the register width and the maximum number of register groups is greater than or equal to the storage capacity, and the second type being that the product of the register width and the maximum number of register groups is less than the storage capacity. A table filling rule corresponding to the table filling type is used to write a first element stored in a target vector register among the multiple candidate vector registers into the vector instruction lookup table, and a preset table lookup rule is used to obtain a second element to be queried from the vector instruction lookup table. This reduces the occupancy of vector register space and register ports and reduces the number of memory accesses. In addition, using the table filling rule corresponding to the table filling type together with the table lookup rule allows table filling and table lookup to be performed quickly, thereby improving processor performance.

Additional aspects and advantages of the present disclosure will be set forth in part in the following description, and in part will become apparent from the description or be learned through practice of the present disclosure.

BRIEF DESCRIPTION OF THE DRAWINGS

The above and/or additional aspects and advantages of the present disclosure will become apparent and readily understood from the following description of the embodiments in conjunction with the accompanying drawings, in which:

FIG. 1 is a flow chart of a method for vector instruction table filling and lookup in a processor according to an embodiment of the present disclosure;

FIG. 2A is a schematic structural diagram of 256-bit-wide vector registers according to an embodiment of the present disclosure;

FIG. 2B is a schematic structural diagram of the vector instruction lookup table corresponding to the vector registers in FIG. 2A;

FIG. 3A is a schematic structural diagram of 128-bit-wide vector registers according to an embodiment of the present disclosure;

FIG. 3B is a schematic structural diagram of the vector instruction lookup table corresponding to the vector registers in FIG. 3A;

FIG. 4A is a schematic structural diagram of another set of 256-bit-wide vector registers according to an embodiment of the present disclosure;

FIG. 4B is a schematic structural diagram of the vector instruction lookup table corresponding to the vector registers in FIG. 4A;

FIG. 5 is a flow chart of a method for vector instruction table filling and lookup in a processor according to another embodiment of the present disclosure;

FIG. 6 is a flow chart of a method for vector instruction table filling and lookup in a processor according to another embodiment of the present disclosure;

FIG. 7 is a schematic diagram of an apparatus for vector instruction table filling and lookup in a processor according to another embodiment of the present disclosure;

FIG. 8 is a block diagram of an exemplary electronic device suitable for implementing embodiments of the present disclosure.

DETAILED DESCRIPTION

In order that the above objects, features and advantages of the present disclosure may be understood more clearly, the solutions of the present disclosure are further described below. It should be noted that, where no conflict arises, the embodiments of the present disclosure and the features in the embodiments may be combined with one another.

Many specific details are set forth in the following description to facilitate a full understanding of the present disclosure, but the present disclosure may also be implemented in ways other than those described herein. The embodiments in the specification are only some, rather than all, of the embodiments of the present disclosure.

It should be noted that the method for vector instruction table filling and lookup in a processor according to the embodiments of the present disclosure may be executed by an apparatus for vector instruction table filling and lookup in a processor. The apparatus may be implemented in software and/or hardware and may be configured in an electronic device, which may include, but is not limited to, a terminal, a server, and the like.

FIG. 1 is a flow chart of a method for vector instruction table filling and lookup in a processor according to an embodiment of the present disclosure. As shown in FIG. 1, the method includes steps S101 to S103.

S101: Configure a vector instruction lookup table in a preset storage space of the processor.

In the embodiments of the present disclosure, an independent storage space may be provided on the processor chip, and a vector instruction lookup table (also referred to as a lookup table, Table) may be configured in that storage space. The vector instruction lookup table is used to store elements of vector registers.

[Corrected under Rule 91, 29.06.2023]
The vector instruction lookup table includes M storage items. Each storage item stores one element of a vector instruction and has a corresponding number of storage item bits N. The number of items M equals 2 raised to the power of the index bit width of the vector table lookup index, that is, M = 2^(index bit width). The number of storage item bits N equals the effective element width (EEW) of the element (destination operand) written into the vector instruction lookup table, that is, N = destination operand EEW. In the embodiments of the present disclosure, EEW may be configured to be equal to the actual element width SEW of the vector register being operated on; for example, if SEW = 8, each storage item has N = 8 bits. The embodiments of the present disclosure may further take the product of the number of storage items M and the number of storage item bits N as the storage capacity of the vector instruction lookup table, that is, storage capacity = M*N.

In one embodiment, FIG. 2A is a schematic structural diagram of 256-bit-wide vector registers according to an embodiment of the present disclosure, and FIG. 2B is a schematic structural diagram of the vector instruction lookup table corresponding to the vector registers in FIG. 2A. As shown in FIG. 2B, the vector instruction lookup table (Table) configured in this embodiment has, for example, 256 items (that is, M = 256), namely storage items a0, a1, ..., a255. Each item holds an 8-bit integer (SEW = 8), that is, N = 8, so the storage capacity of the vector instruction lookup table is 256*8 = 2048 bits. Other values of SEW are handled analogously.
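The relations M = 2^(index bit width), N = EEW and storage capacity = M*N can be checked with a short sketch. The function name `table_geometry` and its parameters are illustrative assumptions and do not appear in the disclosure.

```python
def table_geometry(index_bit_width: int, eew_bits: int) -> tuple[int, int, int]:
    """Return (M, N, capacity_bits) for a lookup table.

    M = 2 ** index_bit_width   (number of storage items)
    N = eew_bits               (bits per storage item, EEW of the destination operand)
    capacity = M * N           (total storage capacity in bits)
    """
    if index_bit_width <= 0 or eew_bits <= 0:
        raise ValueError("index bit width and EEW must be positive")
    m = 1 << index_bit_width
    n = eew_bits
    return m, n, m * n


# The FIG. 2B configuration: 8-bit index, SEW = EEW = 8.
print(table_geometry(8, 8))  # (256, 8, 2048)
```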

S102: Determine the table filling type corresponding to the candidate vector registers to be filled into the table.

All vector registers supported by the processor may be referred to as candidate vector registers. For example, the 32 vector registers (v0-v31) of the RISC-V vector instruction set may be referred to as candidate vector registers. In other words, there are multiple candidate vector registers in the embodiments of the present disclosure.

In practical applications, the candidate vector registers may be configured with different register widths (VLEN), for example, 64 bits, 128 bits, 256 bits, 512 bits, and so on.

For example, FIG. 2A shows the 32 candidate vector registers (v0-v31) of the RISC-V vector instruction set, where the register width VLEN of each vector register is 256 bits (bits 0-255), that is, they are 256-bit vector registers. Each vector register stores 32 elements (for example, v16 stores elements a0-a31), the SEW of each element is 8 bits, and the number of registers in each group, LMUL, is 8.

As another example, FIG. 3A is a schematic structural diagram of 128-bit-wide vector registers according to an embodiment of the present disclosure. As shown in FIG. 3A, the register width VLEN of each of the 32 candidate vector registers (v0-v31) is 128 bits (bits 0-127), that is, they are 128-bit vector registers. Each vector register stores 16 elements (for example, v16 stores elements a0-a15), the SEW of each element is 8 bits, and the number of registers in each group, LMUL, is 8.

In the embodiments of the present disclosure, the corresponding table filling type needs to be determined for candidate vector registers of different register widths VLEN. The table filling types include, for example, a first type and a second type.

The embodiments of the present disclosure may calculate the product of the register width VLEN of the candidate vector registers and the maximum number of register groups (that is, the maximum number of registers in a register group, LMUL), namely VLEN*max LMUL, and then compare VLEN*max LMUL with the storage capacity (M*N) of the vector instruction lookup table. The first type is the case in which the product of the register width VLEN and the maximum number of register groups is greater than or equal to the storage capacity, and the second type is the case in which that product is less than the storage capacity. It should be noted that, in the RISC-V vector instruction set, the maximum LMUL is 8.

For example, suppose the storage capacity (M*N) is 2048 bits. The candidate vector registers shown in FIG. 2A have a register width VLEN of 256 and a maximum LMUL of 8, so VLEN*max LMUL = 2048 is equal to M*N, and the table filling type corresponding to the 256-bit candidate vector registers is the first type. As another example, the candidate vector registers shown in FIG. 3A have a register width VLEN of 128 and a maximum LMUL of 8, so VLEN*max LMUL = 1024 is less than M*N, and the table filling type corresponding to the 128-bit candidate vector registers is the second type.
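The comparison that selects the table filling type can be sketched as follows. The function name `fill_type` and the string labels are assumptions made for illustration; the maximum LMUL of 8 is the RISC-V value stated in the text.

```python
MAX_LMUL = 8  # maximum register group size in the RISC-V vector instruction set


def fill_type(vlen_bits: int, capacity_bits: int, max_lmul: int = MAX_LMUL) -> str:
    """Classify the table filling type of candidate vector registers.

    "first":  VLEN * max LMUL >= table storage capacity
    "second": VLEN * max LMUL <  table storage capacity
    """
    return "first" if vlen_bits * max_lmul >= capacity_bits else "second"


capacity = 256 * 8  # M * N = 2048 bits, as in FIG. 2B
print(fill_type(256, capacity))  # first  (256 * 8 = 2048 >= 2048)
print(fill_type(128, capacity))  # second (128 * 8 = 1024 <  2048)
```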

After the vector instruction lookup table has been configured as described above, the embodiments of the present disclosure can further determine the table filling type corresponding to the candidate vector registers, that is, the first type or the second type.

S103: Using the table filling rule corresponding to the table filling type, write the first element stored in the target vector register among the multiple candidate vector registers into the vector instruction lookup table.

The table filling rule may also be referred to as a vector table filling instruction (table_fill). It is used to write the first element stored in the target vector register among the multiple candidate vector registers into the vector instruction lookup table, that is, to fill the first element into the corresponding item of the vector instruction lookup table.

Among the multiple candidate vector registers, a register whose contents need to be written (filled) into the vector instruction lookup table Table may be referred to as a target vector register. The target vector register may be a single vector register or multiple vector registers (that is, a group of vector registers). In other words, the embodiments of the present disclosure may determine one vector register or one group of vector registers as the target vector register to be written into the vector instruction lookup table Table. The target vector register may be determined in any possible manner.

The elements (data) in the target vector register that need to be written into the vector instruction lookup table Table may be referred to as first elements; that is, the contents (elements) of the target vector register are filled into the vector instruction lookup table Table. Where the target vector register is multiple vector registers, the first elements may be all elements stored in those target vector registers. Where the target vector register is a single vector register, the first elements may be all or some of the elements in that target vector register.

In the embodiments of the present disclosure, different table filling rules may be configured for different table filling types; that is, candidate vector registers of the first type and of the second type correspond to different table filling rules. This embodiment may use the table filling rule corresponding to the table filling type to determine the target vector register and write the first element stored in the target vector register into the vector instruction lookup table. The specific content of the table filling rules is not limited here.

In some embodiments, a preset table lookup rule may further be used to obtain the second element to be queried from the vector instruction lookup table and to write the second element into a preset vector destination register.

The element that currently needs to be queried in the vector instruction lookup table Table may be referred to as the second element; that is, the second element is one of the first elements. The second element may be one element or multiple elements.

In the embodiments of the present disclosure, a table lookup rule (also referred to as a vector table lookup instruction) may be preconfigured and used to obtain the second element from the vector instruction lookup table. In other words, the embodiments of the present disclosure can read vector elements directly from the vector instruction lookup table rather than from the vector registers, which reduces the occupancy of vector register space and register ports and reduces the number of memory accesses. The table lookup rule may be any rule. In some embodiments, the table lookup rule (that is, the table lookup instruction) may take the form table_find.v vdest,vidx, where table_find.v denotes a vector table lookup in the vector instruction set, vidx denotes the vector index register, and vdest denotes the vector destination register.

In some embodiments, the table lookup rule may first determine a vector index register vidx; for example, one register may be randomly selected from the multiple candidate vector registers as the vector index register vidx.

Further, the index bit width is determined from the number of items M of the vector instruction lookup table, that is, from the relation M = 2^(index bit width). For example, the vector instruction lookup table in this embodiment has 256 items (M = 256), and 256 = 2^8, so the index bit width of the vector instruction lookup table in this embodiment is 8 bits.

Further, the ratio of the register width VLEN of the vector index register vidx to the index bit width is calculated as the number of indexes. For example, as shown in FIGS. 2A and 2B, if the vector index register vidx randomly selected from the multiple candidate vector registers has a register width of 256 bits, the number of indexes is 32; that is, the 256-bit vector index register vidx can be divided into 32 8-bit indexes, denoted idx0, idx1, idx2, idx3, ..., idx29, idx30, idx31. As another example, as shown in FIGS. 3A and 3B, if the vector index register vidx randomly selected from the multiple candidate vector registers has a register width of 128 bits, the number of indexes is 16; that is, the 128-bit vector index register vidx is divided into 16 8-bit indexes, denoted idx0, idx1, idx2, idx3, ..., idx15.
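The derivation of the index bit width from M and the division of vidx into VLEN / (index bit width) indexes can be sketched as below. The register is modelled as a Python integer, and placing idx0 in the least significant bits is an assumption of this sketch; the helper names `index_bit_width` and `split_indexes` are likewise illustrative.

```python
def index_bit_width(num_items: int) -> int:
    """Return the index bit width w such that num_items == 2 ** w."""
    if num_items <= 0 or num_items & (num_items - 1):
        raise ValueError("number of items must be a power of two")
    return num_items.bit_length() - 1


def split_indexes(vidx_value: int, vlen_bits: int, width: int) -> list[int]:
    """Split a VLEN-bit register value into VLEN // width indexes.

    idx0 is taken from the lowest `width` bits (an assumption of this sketch).
    """
    count = vlen_bits // width
    mask = (1 << width) - 1
    return [(vidx_value >> (i * width)) & mask for i in range(count)]


width = index_bit_width(256)                  # 256 = 2 ** 8, so 8 bits
vidx = (7 << 24) | (5 << 16) | (3 << 8) | 2   # idx0..idx3 = 2, 3, 5, 7
indexes = split_indexes(vidx, 128, width)
print(width, len(indexes), indexes[:4])       # 8 16 [2, 3, 5, 7]
```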

Further, the embodiments of the present disclosure perform a parallel lookup in the vector instruction lookup table based on the number of indexes and the target index values to be queried, so as to obtain the second elements corresponding to the target index values. Specifically, each first element in the vector instruction lookup table may have a corresponding index value, for example the sequence number of its storage item; if there are 256 storage items, the index values are 0, 1, 2, ..., 255 in order. The index value corresponding to an element currently to be queried may be referred to as a target index value, and the element found for that target index value may be referred to as a second element.

For example, as shown in FIGS. 2A and 2B, the 256-bit vector index register vidx in this embodiment is divided into 32 8-bit indexes idx0, idx1, idx2, idx3, ..., idx29, idx30, idx31, and the target index values are, for example, {254, 251, 250, ..., 7, 5, 3, 2}, 32 in total. The embodiment can then use the 32 8-bit indexes to look up the 32 target index values in the vector instruction lookup table in parallel, where idx0, idx1, idx2, idx3, ..., idx29, idx30, idx31 are respectively 2 (binary 00000010), 3 (binary 00000011), 5 (binary 00000101), 7 (binary 00000111), ..., 250 (binary 11111010), 251 (binary 11111011), 254 (binary 11111110). In other words, the 32 second elements found in parallel from the vector instruction lookup table Table according to the index register vidx are a2, a3, a5, a7, ..., a250, a251, a254, each element being 8 bits.

As another example, as shown in FIGS. 3A and 3B, the 128-bit vector index register vidx in this embodiment is divided into 16 8-bit indexes idx0, idx1, idx2, idx3, ..., idx15, and the target index values are, for example, {254, 251, 122, ..., 7, 5, 3, 2}, 16 in total. The embodiment can then use the 16 8-bit indexes to look up the 16 target index values in the vector instruction lookup table in parallel, where idx0, idx1, idx2, idx3, ..., idx15 are respectively 2 (binary 00000010), 3 (binary 00000011), 5 (binary 00000101), 7 (binary 00000111), ..., 250 (binary 11111010), 251 (binary 11111011), 254 (binary 11111110). In other words, the 16 second elements found in parallel from the vector instruction lookup table Table according to the index register vidx are a2, a3, a5, a7, ..., a250, a251, a254, each element being 8 bits.

Further, this embodiment writes the second elements into a preset vector destination register. For example, any one of the candidate vector registers (e.g., v25) may be selected as the vector destination register, denoted vdest, and the acquired second elements are written into vdest to complete the vector table lookup operation. For example, the 32 second elements a2, a3, a5, a7, ..., a250, a251, a254 obtained by the lookup are written back to the vector destination register vdest (e.g., v25), so that the value of vdest is {a254, a251, a250, ..., a7, a5, a3, a2}; as another example, the 16 second elements a2, a3, a5, a7, ..., a250, a251, a254 are written back to the vector destination register vdest (e.g., v25), so that the value of vdest is {a254, a251, a250, ..., a7, a5, a3, a2}.
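The parallel lookup described above can be modeled in software as a gather over the table. The following is a minimal illustrative sketch only, not the disclosed hardware; the function name `table_find` and the `a<i>` string labels are assumptions chosen to mirror the figures, and only the index values spelled out in the text are used.

```python
def table_find(table, vidx):
    """Return the table item selected by each index in vidx, one result per lane."""
    return [table[idx] for idx in vidx]


# 256-item table whose items are labelled a0..a255, mirroring FIG. 2B.
table = [f"a{i}" for i in range(256)]

# Only the indexes named explicitly in the text; the elided middle lanes are omitted.
vidx = [2, 3, 5, 7, 250, 251, 254]

# Lane 0 is listed first here, whereas the text writes registers highest lane first.
vdest = table_find(table, vidx)
```

In hardware all lanes are resolved at once; the list comprehension merely expresses the same lane-by-lane mapping sequentially.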

In some embodiments, the vector index register vidx and the vector destination register vdest must hold the same number of elements, but their element widths may differ. In that case, the EEW and EMUL of vidx and vdest (EMUL being the number of registers in the register group when the element width is EEW) need not equal SEW and LMUL, but EEW/EMUL = SEW/LMUL must hold so that the element counts match. For example, the vector index register vidx has EEW = SEW and EMUL = LMUL, while the vector destination register vdest has EEW = 2*SEW and EMUL = 2*LMUL; that is, the element width of vidx differs from that of vdest. In this case, while writing the second elements into the preset vector destination register, this embodiment may compare the element width of vdest with the element width of vidx. When the element width of the vector destination register is greater than that of the vector index register and equals the number of storage item bits, the table lookup instruction is a widening instruction, and the number of vector destination registers is determined from the relationship EEW/EMUL = SEW/LMUL; that is, vdest is a group of multiple vector destination registers whose count is EMUL = EEW*LMUL/SEW. The second elements are then written into the multiple vector destination registers.

For example, FIG. 4A is a schematic structural diagram of another 256-bit-wide vector register according to an embodiment of the present disclosure, and FIG. 4B is a schematic structural diagram of the vector instruction lookup table corresponding to the vector register in FIG. 4A. As shown in FIG. 4B, the vector instruction lookup table (Table) configured in this embodiment has, for example, 256 items (i.e., M = 256), comprising storage items a0, a1, ..., a255; the SEW of each item is a 16-bit integer, i.e., N = 16, so the storage capacity of the vector instruction lookup table is 256*16 = 4096 bits. Other SEW values are handled analogously. As shown in FIGS. 4A and 4B, a vector index register vidx (LMUL = 1) has a register width VLEN of 256 bits and an element width EEW = SEW = 8 bits; with M = 256 items and N = 16 bits, the element width of the vector destination register vdest is EEW = 16, so the number of vector destination registers is EMUL = EEW*LMUL/SEW = 16*1/8 = 2, that is, a group of two vector destination registers, for example v24 and v25. The 32 elements found in parallel from Table according to the index vidx are b2, b3, b5, b7, ..., b250, b251, b254, each 16 bits wide, and are written back to the vector destination registers v24 and v25. The value of v24 is then {..., b7, b5, b3, b2} and the value of v25 is {b254, b251, b250, ...}, completing the vector table lookup. In FIGS. 4A and 4B, _low and _high denote the lower 8 bits and the upper 8 bits of an element, respectively.
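The widening case can be checked numerically. The sketch below is illustrative only: `dest_emul` and `split_into_registers` are hypothetical helper names, and the `lane<i>` labels are placeholders for the 32 looked-up 16-bit results rather than the table items named in the text.

```python
from fractions import Fraction


def dest_emul(eew, sew, lmul):
    """EMUL = EEW * LMUL / SEW, which keeps EEW/EMUL equal to SEW/LMUL."""
    return Fraction(eew) * Fraction(lmul) / Fraction(sew)


def split_into_registers(elements, eew, vlen):
    """Distribute elements over consecutive VLEN-bit registers, lowest lanes first."""
    per_register = vlen // eew
    return [elements[i:i + per_register] for i in range(0, len(elements), per_register)]


# Values from the FIG. 4A/4B example: 16-bit destination elements, 8-bit indexes, LMUL = 1.
emul = dest_emul(eew=16, sew=8, lmul=1)

# Placeholder labels for the 32 results, one per index lane.
lanes = [f"lane{i}" for i in range(32)]
v24, v25 = split_into_registers(lanes, eew=16, vlen=256)
```

With VLEN = 256 and EEW = 16, each destination register holds 16 elements, so the 32 results occupy exactly the two registers given by EMUL = 2.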

In practical applications, the table lookup instruction may be written as table_find.v vdest,vidx, where vdest denotes the vector destination register and vidx denotes the vector index register. For example, if vidx is v24 and vdest is v25, the table lookup instruction is table_find.v v25,v24.

FIG. 5 is a schematic flowchart of a method for vector instruction table filling and lookup in a processor according to an embodiment of the present disclosure. As shown in FIG. 5, the method includes steps S501 to S506.

S501: Configure a vector instruction lookup table in a preset storage space of the processor.

S502: Determine the table filling type corresponding to the candidate vector registers awaiting table filling.

For details of S501 and S502, refer to the foregoing embodiments; they are not repeated here.

S503: Calculate the ratio of the storage capacity to the register width of the candidate vector registers, as the first number of registers in a register group.

In this embodiment, the table filling type corresponding to the candidate vector registers is the first type. In this case, the table filling rule first calculates the ratio of the storage capacity to the register width (VLEN) of the candidate vector registers and uses that ratio as the first number LMUL of actual registers in the register group; that is, the first number LMUL = M*N/VLEN.

For example, with M*N = 2048: if the width VLEN of the candidate vector registers is 256 bits, the first number LMUL is 8, i.e., LMUL = 2048/256 = 8; if VLEN is 512 bits, LMUL is 4; if VLEN is 1024 bits, LMUL is 2; if VLEN is 2048 bits, LMUL is 1; if VLEN is 4096 bits, LMUL is 1/2, i.e., half a register forms a group; if VLEN is 8192 bits, LMUL is 1/4, i.e., a quarter of a register forms a group; if VLEN is 16384 bits, LMUL is 1/8, i.e., one eighth of a register forms a group; and so on.
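The values above follow directly from LMUL = M*N/VLEN. As a small sketch (the function name `first_number_lmul` is an assumption introduced for illustration), exact fractions can be used so that the fractional cases are represented without rounding:

```python
from fractions import Fraction


def first_number_lmul(m, n, vlen):
    """First number LMUL = M*N / VLEN, kept exact so that 1/2, 1/4 and 1/8 are representable."""
    return Fraction(m * n, vlen)


# M = 256 items of N = 8 bits gives the 2048-bit capacity used in the text.
lmul_by_vlen = {
    vlen: first_number_lmul(256, 8, vlen)
    for vlen in (256, 512, 1024, 2048, 4096, 8192, 16384)
}
```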

S504: Determine a first number of registers from the plurality of candidate vector registers as the target vector registers.

That is, the first number LMUL of registers are determined from the 32 candidate vector registers as a group of target vector registers; for example, 8 (LMUL) registers, 4 registers, or 2 registers are selected as a group of target vector registers. The target vector registers may be determined from the 32 candidate vector registers in any manner, for example by randomly selecting a group of target vector registers.

In some embodiments, to improve the efficiency of table filling and subsequent table lookup, a group of consecutive registers may be determined as the target vector registers.

This embodiment may first determine a source register number, denoted vsrc, from the numbers of the plurality of candidate vector registers. For example, the 32 candidate vector registers are numbered 0, 1, 2, ..., 31, and one source register number vsrc is determined from these numbers.

For example, the source register number vsrc may be any one of the numbers 0, 1, 2, ..., 31.

As another example, the source register number may be aligned to the first number LMUL. If LMUL is 8, the source register number vsrc of the vector table filling instruction (table filling rule) is, for example, any one of 0, 8, 16, and 24; if LMUL is 4, vsrc is, for example, any one of 0, 4, 8, 12, 16, 20, 24, and 28; if LMUL is 2, vsrc is, for example, any one of 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, and 30; if LMUL is 1/8, 1/4, 1/2, or 1, vsrc may be any register number.
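The alignment option can be expressed as a short enumeration. This is an illustrative sketch under the assumption of 32 candidate registers; `aligned_source_registers` is a hypothetical name, not an instruction or function defined by the disclosure.

```python
from fractions import Fraction


def aligned_source_registers(lmul, num_regs=32):
    """List the source register numbers that are aligned to LMUL.

    For LMUL <= 1 every register number is acceptable, as stated in the text.
    """
    lmul = Fraction(lmul)
    if lmul <= 1:
        return list(range(num_regs))
    return list(range(0, num_regs, int(lmul)))


valid_for_8 = aligned_source_registers(8)
valid_for_4 = aligned_source_registers(4)
valid_for_half = aligned_source_registers(Fraction(1, 2))
```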

Further, the first number of consecutive registers starting from the source register number are determined as the target vector registers.

For example, as shown in FIGS. 2A and 2B, the register width VLEN is 256 bits, LMUL is 8, and the source register number vsrc is, for example, 16 (i.e., v16). The 8 consecutive registers starting from number 16 are then determined as a group of target vector registers; that is, the target vector registers are v16 to v23.

S505: Write the first elements stored in the target vector registers into the vector instruction lookup table.

In practical applications, the table filling rule (table filling instruction) corresponding to the first type may be configured in advance and written as table_fill.v vsrc, where table_fill.v denotes vector table filling in the vector instruction set and vsrc denotes the source register number. For example, if the source register number vsrc is 16, the table filling instruction is table_fill.v v16. The instruction table_fill.v v16 fills the elements a0 to a255 (i.e., the first elements) held in the group of 8 target vector registers v16 to v23 into items 0 to 255 of the vector instruction lookup table Table; that is, in this embodiment the number of first elements in the target vector registers equals the number of items in the vector instruction lookup table. In this way, the table filling rule can be used to write elements from registers of 256 bits or wider into the vector instruction lookup table.
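A behavioral sketch of the first-type fill is given below. It is not the disclosed implementation: the dictionary-based register file, the function name `table_fill`, and the element labels are assumptions for illustration, using VLEN = 256 and N = 8 so that each register holds 32 elements.

```python
def table_fill(registers, vsrc, lmul):
    """Copy the elements of registers vsrc .. vsrc+lmul-1, in order, into a new table."""
    table = []
    for reg in range(vsrc, vsrc + lmul):
        table.extend(registers[reg])
    return table


# VLEN = 256 bits and N = 8 bits per item give 32 elements per register.
ELEMS_PER_REG = 256 // 8

# Hypothetical register file: v16..v23 hold the elements a0..a255.
registers = {
    16 + r: [f"a{r * ELEMS_PER_REG + e}" for e in range(ELEMS_PER_REG)]
    for r in range(8)
}

# Software analogue of table_fill.v v16 with LMUL = 8.
table = table_fill(registers, vsrc=16, lmul=8)
```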

S506: Acquire the second elements to be looked up from the vector instruction lookup table using a preset table lookup rule.

For details of S506, refer to the foregoing embodiments; they are not repeated here.

In this embodiment, a vector instruction lookup table is configured in a preset storage space of the processor, where the storage capacity of the table is the product of the number of items and the number of storage item bits. The table filling type corresponding to the candidate vector registers awaiting table filling is determined, where the table filling types include a first type, in which the product of the register width and the maximum number of register groups is greater than or equal to the storage capacity, and a second type, in which that product is less than the storage capacity. Using the table filling rule corresponding to the table filling type, the first elements stored in the target vector registers among the plurality of candidate vector registers are written into the vector instruction lookup table, and using a preset table lookup rule, the second elements to be looked up are acquired from the vector instruction lookup table. This reduces the occupation of vector register space and register ports and lowers the number of memory accesses; moreover, using table filling and table lookup rules matched to the table filling type allows table filling and lookup to be performed quickly, improving processor performance. In addition, the table filling rule can be used to write elements from registers of 256 bits or wider into the vector instruction lookup table.
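The distinction between the two table filling types reduces to one comparison. The following sketch is illustrative; `fill_type` is a hypothetical name, and the default of 8 reflects the maximum LMUL used in the examples of this description.

```python
def fill_type(m, n, vlen, max_lmul=8):
    """Classify as 'first' when VLEN * max LMUL >= M * N, otherwise 'second'."""
    return "first" if vlen * max_lmul >= m * n else "second"


# With a 2048-bit table (M = 256, N = 8):
type_256 = fill_type(256, 8, vlen=256)  # 256 * 8 = 2048 bits, equal to the capacity
type_128 = fill_type(256, 8, vlen=128)  # 128 * 8 = 1024 bits, below the capacity
```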

FIG. 6 is a schematic flowchart of a method for vector instruction table filling and lookup in a processor according to an embodiment of the present disclosure. As shown in FIG. 6, the method includes steps S601 to S606.

S601: Configure a vector instruction lookup table in a preset storage space of the processor.

S602: Determine the table filling type corresponding to the candidate vector registers awaiting table filling.

For details of S601 and S602, refer to the foregoing embodiments; they are not repeated here.

S603: Calculate the ratio of the storage capacity to a first target value, as the number of table fills.

Here, the first target value is the product of the register width of the candidate vector registers and the maximum number of register groups.

In this embodiment, the table filling type corresponding to the candidate vector registers is the second type. In this case, the table filling rule first calculates the ratio of the storage capacity (M*N) to the first target value as the number of table fills, where the first target value is the product (VLEN*maximum LMUL) of the register width VLEN of the candidate vector registers and the maximum number of register groups (maximum LMUL = 8); that is, number of table fills = M*N/(VLEN*maximum LMUL).

The number of table fills corresponds to the number of table filling instructions, and each table filling instruction writes the first elements of one group of target vector registers into the vector instruction lookup table. That is, when the table filling type is the second type, multiple table filling instructions are needed to perform multiple table filling operations.

For example, as shown in FIGS. 3A and 3B, if the storage capacity (M*N) of the vector instruction lookup table is 2048, the register width VLEN is 128, and the maximum LMUL is 8, then the number of table fills (i.e., the number of table filling instructions) is M*N/(VLEN*maximum LMUL) = 2048/(128*8) = 2; that is, two table filling instructions are needed to fill the table twice. As another example, if the storage capacity (M*N) is 2048, the register width VLEN is 64, and the maximum LMUL is 8, then the number of table fills is 2048/(64*8) = 4; that is, four table filling instructions are needed to fill the table four times.
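The fill count can be computed as below. This is a sketch only; `fill_count` is a hypothetical name, and the divisibility check is an added assumption rather than something stated in the disclosure.

```python
def fill_count(m, n, vlen, max_lmul=8):
    """Number of table fills = M*N / (VLEN * max LMUL)."""
    capacity = m * n
    group_bits = vlen * max_lmul
    if capacity % group_bits != 0:
        # Assumption: the capacity is a whole multiple of one register group.
        raise ValueError("capacity is not a multiple of VLEN * max LMUL")
    return capacity // group_bits


fills_vlen_128 = fill_count(256, 8, vlen=128)
fills_vlen_64 = fill_count(256, 8, vlen=64)
```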

S604: Determine, from the plurality of candidate vector registers, as many register groups as the number of table fills.

Each register group includes a second number of target vector registers. In this embodiment, the number of registers in each register group may be called the second number (LMUL), which may be a fixed value, for example LMUL = 8; that is, 8 target vector registers form one register group.

This embodiment may determine, from the plurality of candidate vector registers, as many register groups as the number of table fills. For example, if the number of table fills is 2, two register groups are determined, each including 8 target vector registers. The register groups may be determined in any manner, for example by randomly selecting 8 target vector registers as one register group.

In some embodiments, as many source register numbers as the number of table fills are determined from the numbers of the plurality of candidate vector registers, where the source register numbers are spaced at least the second number apart.

For example, if the number of table fills is 2 (i.e., 2 table filling instructions) and the candidate vector registers are numbered 0, 1, 2, ..., 31, two of these numbers may be determined as source register numbers vsrc, where the two source register numbers are spaced at least the second number (8) apart; for example, the two source register numbers vsrc are 8 and 16.

Further, the second number of consecutive registers starting from each source register number are determined to constitute each register group. That is, each source register number serves as the starting position of one register group, and the second number of consecutive registers from that starting position are determined as the target vector registers of that group.

For example, if the two source register numbers vsrc are 8 and 16, v8 to v15 may be determined as one register group and v16 to v23 as another register group.
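The grouping step can be sketched as follows. The function name `register_groups`, the error handling, and the bound check against 32 registers are assumptions added for illustration.

```python
def register_groups(source_numbers, group_size=8, num_regs=32):
    """Build one group of consecutive registers per source register number.

    Source numbers must be at least group_size apart so that groups do not overlap.
    """
    ordered = sorted(source_numbers)
    for lower, upper in zip(ordered, ordered[1:]):
        if upper - lower < group_size:
            raise ValueError("source register numbers are closer than the group size")
    if ordered and ordered[-1] + group_size > num_regs:
        raise ValueError("register group runs past the last register")
    return [list(range(start, start + group_size)) for start in source_numbers]


groups = register_groups([8, 16])
```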

S605: Write the first elements stored in the target vector registers of each register group into the vector instruction lookup table in sequence.

That is, this embodiment may first write the first elements stored in the target vector registers v8 to v15 (one register group) into the vector instruction lookup table, and then write the first elements stored in the target vector registers v16 to v23 (the other register group) into the vector instruction lookup table.

In some embodiments, during table filling, a plurality of offset numbers (also called offsets) corresponding to the plurality of register groups may also be determined. An offset number, denoted imm, is a non-negative integer; the offset numbers are the integers from 0 to n in sequence, where n is the number of table fills minus one. That is, imm takes values from 0 to M*N/(VLEN*maximum LMUL) - 1.

For example, if VLEN is 128 bits and LMUL is 8, the number of table fills is 2 (two table filling instructions are needed), and the offset numbers imm corresponding to the register groups are 0 and 1; that is, the offset number of the first table filling instruction is 0 (binary 00000000) and that of the second is 1 (binary 00000001). As another example, if VLEN is 64 bits and LMUL is also 8, the number of table fills is 4 (four table filling instructions are needed), and the offset numbers imm corresponding to the register groups are 0, 1, 2, and 3; that is, the offset number of the first table filling instruction is 0 (binary 00000000), that of the second is 1 (binary 00000001), that of the third is 2 (binary 00000010), and that of the fourth is 3 (binary 00000011).

Further, the product of the offset number imm corresponding to each register group and a second target value is calculated as the starting fill bit of that register group in the vector instruction lookup table. The starting fill bit, which may also be called the starting fill item, describes the first item of the vector instruction lookup table written by that register group. The second target value is the ratio of the first target value (VLEN*maximum LMUL) to the number of storage item bits (N); that is, starting fill bit = imm*VLEN*maximum LMUL/N.

For example, if the number of storage item bits N = 8, VLEN is 128 bits, and the offset numbers imm of the two register groups are 0 and 1, then the starting fill bit of the first register group is 0*128*8/8 = 0 and that of the second register group is 1*128*8/8 = 128.
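The starting fill bit (an item index into the table) can be computed as in the following sketch; `start_item` is a hypothetical name introduced for illustration.

```python
def start_item(imm, vlen, n, max_lmul=8):
    """Starting fill bit = imm * (VLEN * max LMUL / N), expressed as a table item index."""
    items_per_group = (vlen * max_lmul) // n  # the second target value
    return imm * items_per_group


# VLEN = 128 bits, N = 8 bits per item, offsets 0 and 1 as in the text.
starts = [start_item(imm, vlen=128, n=8) for imm in (0, 1)]
```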

Further, based on the starting fill bit, the first elements stored in the target vector registers of each register group are written into the vector instruction lookup table in sequence.

In practical applications, the table filling instruction corresponding to the second type may be written as table_fill.v vsrc imm, where table_fill.v denotes vector table filling in the vector instruction set, vsrc denotes the source register number, and imm denotes the offset number. For example, if VLEN is 128 bits, LMUL is 8, the offset numbers imm are 0 and 1, and the source register numbers vsrc are vector registers 8 and 16, the two table filling instructions may be written as table_fill.v v8 0 and table_fill.v v16 1. That is, the instruction table_fill.v v8 0 fills the first elements a0 to a127 held in the group of 8 target vector registers v8 to v15 into items 0 to 127 of the vector instruction lookup table Table, and the instruction table_fill.v v16 1 fills the elements a128 to a255 held in the group of 8 registers v16 to v23 into items 128 to 255 of the vector instruction lookup table Table. In this way, the table filling rule can be used to write elements from registers narrower than 256 bits into the vector instruction lookup table.
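The two-instruction fill can be modeled behaviorally as below. This is an illustrative sketch, not the disclosed hardware: the dictionary register file, the function name `table_fill_imm`, and the element labels are assumptions, with VLEN = 128 and N = 8 so that each register holds 16 elements.

```python
def table_fill_imm(table, registers, vsrc, imm, vlen=128, n=8, group_size=8):
    """Write one register group into the table, starting at item imm * (VLEN * group_size / N)."""
    position = imm * (vlen * group_size // n)
    for reg in range(vsrc, vsrc + group_size):
        for element in registers[reg]:
            table[position] = element
            position += 1
    return table


ELEMS_PER_REG = 128 // 8  # 16 elements per 128-bit register

# Hypothetical register file: v8..v23 hold a0..a255, 16 elements each.
registers = {
    8 + k: [f"a{k * ELEMS_PER_REG + e}" for e in range(ELEMS_PER_REG)]
    for k in range(16)
}

table = [None] * 256
table_fill_imm(table, registers, vsrc=8, imm=0)   # analogue of table_fill.v v8 0
table_fill_imm(table, registers, vsrc=16, imm=1)  # analogue of table_fill.v v16 1
```

Because each call writes a disjoint range of items, the two fills could be issued in either order in this model.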

It should be noted that the table filling process and the table lookup process of this embodiment may be performed in sequence or separately.

In this embodiment, a vector instruction lookup table is configured in a preset storage space of the processor, where the storage capacity of the table is the product of the number of items and the number of storage item bits. The table filling type corresponding to the candidate vector registers awaiting table filling is determined, where the table filling types include a first type, in which the product of the register width and the maximum number of register groups is greater than or equal to the storage capacity, and a second type, in which that product is less than the storage capacity. Using the table filling rule corresponding to the table filling type, the first elements stored in the target vector registers among the plurality of candidate vector registers are written into the vector instruction lookup table, and using a preset table lookup rule, the second elements to be looked up are acquired from the vector instruction lookup table. This reduces the occupation of vector register space and register ports and lowers the number of memory accesses; moreover, using table filling and table lookup rules matched to the table filling type allows table filling and lookup to be performed quickly, improving processor performance. In addition, the table filling rule can be used to write elements from registers narrower than 256 bits into the vector instruction lookup table.

To implement the foregoing embodiments, the present disclosure further provides an apparatus for vector instruction table filling and lookup in a processor.

FIG. 7 is a schematic diagram of an apparatus for vector instruction table filling and lookup in a processor according to another embodiment of the present disclosure.

As shown in FIG. 7, the apparatus 70 for vector instruction table filling and lookup in a processor includes:

a configuration module 701, configured to configure a vector instruction lookup table in a preset storage space of the processor, where the storage capacity of the vector instruction lookup table is the product of the number of items and the number of storage item bits;

a determination module 702, configured to determine the table filling type corresponding to the candidate vector registers awaiting table filling, where the table filling types include a first type and a second type, the first type being that the product of the register width and the maximum number of register groups is greater than or equal to the storage capacity, and the second type being that the product of the register width and the maximum number of register groups is less than the storage capacity;

a table filling module 703, configured to write, using the table filling rule corresponding to the table filling type, the first elements stored in the target vector registers among the plurality of candidate vector registers into the vector instruction lookup table; and

a table lookup module 704, configured to acquire the second elements to be looked up from the vector instruction lookup table using a preset table lookup rule.

在一些实施例中，在填表类型为第一类型的情况下，填表模块703，具体用于：计算存储容量与候选向量寄存器的寄存器宽度的比值，以作为寄存器分组中寄存器的第一数量；从多个候选向量寄存器中确定第一数量个寄存器作为目标向量寄存器；以及将目标向量寄存器存储的第一元素写入向量指令查询表。In some embodiments, when the table filling type is the first type, the table filling module 703 is specifically used to: calculate the ratio of the storage capacity to the register width of the candidate vector register as the first number of registers in the register group; determine a first number of registers from multiple candidate vector registers as target vector registers; and write the first element stored in the target vector register into the vector instruction query table.

In some embodiments, the table filling module 703 is specifically configured to: determine a source register number from the numbers of the plurality of candidate vector registers; and determine the first number of consecutive registers starting from the source register number as the target vector registers.
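The first-type fill described above can be sketched as follows. This is a software model under stated assumptions, not the hardware design: the register file is modelled as a hypothetical dictionary mapping a register number to a list of item-sized elements, and the widths (512-bit registers, a 2048-bit table of 8-bit items) are illustrative.

```python
def first_type_targets(capacity_bits: int, reg_width_bits: int, vsrc: int) -> list:
    """Register numbers used by a first-type fill.

    first_number = storage capacity / register width; the targets are the
    first_number consecutive registers starting at the source register vsrc.
    """
    first_number = capacity_bits // reg_width_bits
    return list(range(vsrc, vsrc + first_number))


def fill_first_type(regs: dict, capacity_bits: int, reg_width_bits: int, vsrc: int) -> list:
    """Concatenate the elements of the target registers into the table."""
    table = []
    for reg_no in first_type_targets(capacity_bits, reg_width_bits, vsrc):
        table.extend(regs[reg_no])
    return table


# Hypothetical register file: 32 registers of 512 bits holding 8-bit items.
ITEMS_PER_REG = 512 // 8
regs = {n: [(n * ITEMS_PER_REG + i) % 256 for i in range(ITEMS_PER_REG)]
        for n in range(32)}

targets = first_type_targets(2048, 512, vsrc=8)
table = fill_first_type(regs, 2048, 512, vsrc=8)
```

Under these assumptions the first number is 2048 / 512 = 4, so a fill that names register 8 as its source consumes registers 8 through 11 and produces all 256 table items in one operation.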

In some embodiments, when the table filling type is the second type, the table filling module 703 is specifically configured to: calculate the ratio of the storage capacity to a first target value as the number of table fillings, wherein the first target value is the product of the register width of the candidate vector registers and the maximum number of register groups; determine, from the plurality of candidate vector registers, as many register groups as the number of table fillings, wherein each register group includes a second number of target vector registers; and write the first element stored in the target vector registers of each register group into the vector instruction lookup table in turn.

In some embodiments, the table filling module 703 is specifically configured to: determine, from the numbers of the plurality of candidate vector registers, as many source register numbers as the number of table fillings, wherein adjacent source register numbers are spaced at least the second number apart; and determine that the second number of consecutive registers starting from each source register number constitute one register group.

In some embodiments, the device 70 further includes: a first calculation module, configured to determine a plurality of offset numbers corresponding to the plurality of register groups, wherein the offset numbers are the integers from 0 to n in sequence, and n is the number of table fillings minus one; and a second calculation module, configured to calculate the product of the offset number corresponding to each register group and a second target value as the starting fill position of that register group in the vector instruction lookup table, wherein the second target value is the ratio of the first target value to the number of bits per storage item. The table filling module 703 is then specifically configured to: write, based on the starting fill positions, the first element stored in the target vector registers of each register group into the vector instruction lookup table in sequence.
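The arithmetic of the second-type fill can be sketched as a small planning function. It is a hedged illustration only: the function name, the dictionary keys, and the choice of spacing the source registers exactly one maximal group apart (the text only requires a spacing of at least the second number) are assumptions, as is treating the second number as equal to the maximum grouping.

```python
def second_type_plan(capacity_bits: int, reg_width_bits: int, max_group: int,
                     item_bits: int, base_vsrc: int = 0) -> list:
    """Plan the fills needed when one register group cannot cover the table.

    first_target  = register width x maximum grouping (bits per fill)
    fills         = storage capacity / first_target
    second_target = first_target / bits per item (items per fill)
    start         = offset number x second_target
    """
    first_target = reg_width_bits * max_group
    fills = capacity_bits // first_target
    second_target = first_target // item_bits
    plan = []
    for imm in range(fills):  # offset numbers 0 .. fills - 1
        plan.append({
            "vsrc": base_vsrc + imm * max_group,  # assumed spacing: one group
            "imm": imm,
            "start": imm * second_target,
        })
    return plan


# Illustrative numbers: 2048-bit table, 128-bit registers, grouping of 8.
plan = second_type_plan(2048, 128, 8, 8)
```

With these assumed numbers each fill covers 1024 bits, so two fills are needed: offset 0 starts at item 0 and offset 1 starts at item 128.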

In some embodiments, the table lookup module 704 is specifically configured to: determine a vector index register; determine an index bit width according to the number of items in the vector instruction lookup table; calculate the ratio of the register width of the vector index register to the index bit width as the number of indexes; perform a parallel lookup on the vector instruction lookup table based on the number of indexes and the target index values to be looked up, to obtain the second element corresponding to each target index value; and write the second element into a preset vector destination register.
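A software model of the lookup steps above might look like the following sketch. Several points are assumptions not stated in the text: the index register is modelled as a Python integer, the indexes are packed contiguously with lane 0 in the least significant bits, the index bit width is taken as log2 of the number of items, and the hardware's parallel lookup is modelled by a list comprehension.

```python
def table_find(table: list, index_reg: int, reg_width_bits: int) -> list:
    """Look up every index packed in index_reg and return the elements."""
    num_items = len(table)
    # Assumption: the table holds a power-of-two number of items (>= 2),
    # so the index bit width is exactly log2(num_items).
    assert num_items >= 2 and num_items & (num_items - 1) == 0
    index_bits = (num_items - 1).bit_length()
    num_indexes = reg_width_bits // index_bits
    mask = (1 << index_bits) - 1
    indexes = [(index_reg >> (lane * index_bits)) & mask
               for lane in range(num_indexes)]
    # Hardware would perform these reads in parallel.
    return [table[i] for i in indexes]


# Hypothetical 256-item table whose item i holds the value 255 - i.
table = [255 - i for i in range(256)]
found = table_find(table, 0x03020100, reg_width_bits=32)
```

Here a 256-item table gives an 8-bit index, so the 32-bit index register in the example carries four indexes (0, 1, 2, 3) and four elements are returned.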

In some embodiments, when the element width of the vector destination register is greater than the element width of the vector index register, and the element width of the vector destination register is the same as the number of bits per storage item, the table lookup module 704 is specifically configured to: determine a plurality of vector destination registers; and write the second element into the plurality of vector destination registers.
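One way to picture why several destination registers are needed: when each looked-up element is wider than its index, the results of one index register no longer fit in a single register of the same width. The sketch below, with a hypothetical function name and illustrative widths, splits the results across destination registers in order; the actual register allocation in the disclosure may differ.

```python
def dest_registers(elements: list, item_bits: int, reg_width_bits: int) -> list:
    """Split looked-up elements into chunks that each fit one register."""
    per_reg = reg_width_bits // item_bits
    return [elements[i:i + per_reg] for i in range(0, len(elements), per_reg)]


# Assumed case: a 128-bit index register holds sixteen 8-bit indexes, and
# each table item is 16 bits wide, giving 16 x 16 = 256 bits of results.
results = list(range(16))
dests = dest_registers(results, item_bits=16, reg_width_bits=128)
```

In this assumed case the sixteen 16-bit results occupy two 128-bit destination registers of eight elements each.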

In some embodiments, the table filling rule corresponding to the first type has the form table_fill.v vsrc, and the table filling rule corresponding to the second type has the form table_fill.v vsrc,imm, where table_fill.v denotes the vector table filling instruction in the vector instruction set, vsrc denotes the source register number, and imm denotes the offset number.

In some embodiments, the table lookup rule has the form table_find.v vdest,vidx, where table_find.v denotes the vector table lookup instruction in the vector instruction set, vidx denotes the vector index register, and vdest denotes the vector destination register.
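To make the three instruction forms concrete, the sketch below decodes their textual form into tuples. Only the mnemonics and operand order come from the text; the `v<number>` register spelling, the decimal immediate, and the tuple layout are assumptions made for illustration.

```python
def _reg(operand: str) -> int:
    """Parse a register operand written as v<number> (assumed spelling)."""
    if not operand.startswith("v") or not operand[1:].isdigit():
        raise ValueError(f"bad register operand: {operand!r}")
    return int(operand[1:])


def decode(line: str) -> tuple:
    """Decode one table_fill.v / table_find.v line into a tuple."""
    mnemonic, _, rest = line.strip().partition(" ")
    ops = [op.strip() for op in rest.split(",") if op.strip()]
    if mnemonic == "table_fill.v" and len(ops) == 1:
        return ("fill_first", _reg(ops[0]))              # table_fill.v vsrc
    if mnemonic == "table_fill.v" and len(ops) == 2:
        return ("fill_second", _reg(ops[0]), int(ops[1]))  # vsrc, imm
    if mnemonic == "table_find.v" and len(ops) == 2:
        return ("find", _reg(ops[0]), _reg(ops[1]))      # vdest, vidx
    raise ValueError(f"unrecognised instruction: {line!r}")


first = decode("table_fill.v v8")
second = decode("table_fill.v v0,1")
find = decode("table_find.v v4, v2")
```

The single-operand fill corresponds to the first type, where one instruction fills the whole table, while the two-operand fill carries the offset number that selects which part of the table a second-type fill writes.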

According to embodiments of the present disclosure, the present disclosure further provides an electronic device, a readable storage medium, and a computer program product.

To implement the above embodiments, the present disclosure further proposes a computer program product. When instructions in the computer program product are executed by a processor, the method for vector instruction table filling and table lookup in a processor proposed in the foregoing embodiments of the present disclosure is performed.

Fig. 8 shows a block diagram of an exemplary electronic device suitable for implementing embodiments of the present disclosure. The electronic device 12 shown in Fig. 8 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present disclosure.

As shown in Fig. 8, the electronic device 12 takes the form of a general-purpose computing device. The components of the electronic device 12 may include, but are not limited to: one or more processors 16, a system memory 28, and a bus 18 connecting the different system components (including the system memory 28 and the processors 16).

The bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. For example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.

The electronic device 12 typically includes a variety of computer-system-readable media. These media may be any available media that can be accessed by the electronic device 12, including volatile and non-volatile media, and removable and non-removable media.

The memory 28 may include computer-system-readable media in the form of volatile memory, such as a random access memory (RAM) 30 and/or a cache memory 32. The electronic device 12 may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, the storage system 34 may be used to read from and write to non-removable, non-volatile magnetic media (not shown in Fig. 8, commonly referred to as a "hard disk drive").

Although not shown in Fig. 8, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from and writing to a removable non-volatile optical disk (e.g., a Compact Disc Read Only Memory (CD-ROM), a Digital Video Disc Read Only Memory (DVD-ROM), or other optical media) may be provided. In these cases, each drive may be connected to the bus 18 through one or more data media interfaces. The memory 28 may include at least one program product having a set (e.g., at least one) of program modules configured to perform the functions of the embodiments of the present disclosure.

A program/utility 40 having a set (at least one) of program modules 42 may be stored, for example, in the memory 28. Such program modules 42 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data, and each of these examples, or some combination of them, may include an implementation of a network environment. The program modules 42 generally perform the functions and/or methods of the embodiments described in the present disclosure.

The electronic device 12 may also communicate with one or more external devices 14 (e.g., a keyboard, a pointing device, a display 24, etc.), with one or more devices that enable a user to interact with the electronic device 12, and/or with any device (e.g., a network card, a modem, etc.) that enables the electronic device 12 to communicate with one or more other computing devices. Such communication may take place through an input/output (I/O) interface 22. In addition, the electronic device 12 may communicate with one or more networks (e.g., a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through a network adapter 20. As shown in the figure, the network adapter 20 communicates with the other modules of the electronic device 12 through the bus 18. It should be understood that, although not shown in the figure, other hardware and/or software modules may be used in conjunction with the electronic device 12, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.

The processor 16 executes various functional applications by running programs stored in the system memory 28, for example implementing the method for vector instruction table filling and table lookup in a processor described in the foregoing embodiments.

Other embodiments of the present disclosure will readily occur to those skilled in the art upon consideration of the specification and practice of the invention disclosed herein. The present disclosure is intended to cover any variations, uses, or adaptations of the present disclosure that follow its general principles and include common knowledge or customary technical means in the art that are not disclosed herein. The specification and embodiments are to be regarded as exemplary only, and the true scope and spirit of the present disclosure are indicated by the following claims.

It should be understood that the present disclosure is not limited to the precise structures described above and shown in the drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present disclosure is limited only by the appended claims.

It should be noted that, in the description of the present disclosure, the terms "first", "second", and the like are used for descriptive purposes only and are not to be understood as indicating or implying relative importance. In addition, in the description of the present disclosure, unless otherwise specified, "a plurality of" means two or more.

Any process or method description in a flowchart, or otherwise described herein, may be understood as representing a module, segment, or portion of code that includes one or more executable instructions for implementing specific logical functions or steps of the process. The scope of the preferred embodiments of the present disclosure includes additional implementations in which functions may be performed out of the order shown or discussed, including in a substantially simultaneous manner or in the reverse order depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of the present disclosure belong.

It should be understood that the various parts of the present disclosure may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, they may be implemented by any one or a combination of the following technologies known in the art: a discrete logic circuit having logic gates for implementing logic functions on data signals, an application-specific integrated circuit having suitable combinational logic gates, a programmable gate array (PGA), a field programmable gate array (FPGA), and the like.

A person of ordinary skill in the art will understand that all or part of the steps of the methods of the above embodiments may be completed by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, performs one or a combination of the steps of the method embodiments.

In addition, the functional units in the embodiments of the present disclosure may be integrated into one processing module, each unit may exist physically on its own, or two or more units may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented in the form of a software functional module and is sold or used as an independent product, it may also be stored in a computer-readable storage medium.

The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disk, or the like.

In the description of this specification, references to the terms "one embodiment", "some embodiments", "example", "specific example", or "some examples" mean that the specific features, structures, materials, or characteristics described in connection with that embodiment or example are included in at least one embodiment or example of the present disclosure. In this specification, such references do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in a suitable manner in any one or more embodiments or examples.

Although embodiments of the present disclosure have been shown and described above, it is to be understood that the above embodiments are exemplary and are not to be construed as limiting the present disclosure. A person of ordinary skill in the art may change, modify, replace, and vary the above embodiments within the scope of the present disclosure.

Claims

A method for filling and looking up a table of vector instructions in a processor, comprising:

configuring a vector instruction lookup table in a preset storage space of the processor, wherein the storage capacity of the vector instruction lookup table is the product of the number of items and the number of bits per storage item;

determining a table filling type corresponding to the candidate vector registers to be used for table filling, wherein the table filling type includes a first type and a second type, the first type being that the product of the register width and the maximum number of register groups is greater than or equal to the storage capacity, and the second type being that the product of the register width and the maximum number of register groups is less than the storage capacity; and

writing, by using a table filling rule corresponding to the table filling type, a first element stored in a target vector register among a plurality of candidate vector registers into the vector instruction lookup table;

wherein, when the table filling type is the first type, writing, by using the table filling rule corresponding to the table filling type, the first element stored in the target vector register among the plurality of candidate vector registers into the vector instruction lookup table comprises:

calculating a ratio of the storage capacity to the register width of the candidate vector registers as a first number of registers in a register group;

determining a source register number from the numbers of the plurality of candidate vector registers, and determining the first number of consecutive registers starting from the source register number as the target vector registers; and

writing the first element stored in the target vector registers into the vector instruction lookup table.

The method of claim 1, wherein, when the table filling type is the second type, writing the first element stored in the target vector register among the plurality of candidate vector registers into the vector instruction lookup table by using a table filling rule corresponding to the table filling type comprises:

calculating a ratio of the storage capacity to a first target value as the number of table fillings, wherein the first target value is the product of the register width of the candidate vector registers and the maximum number of register groups;

determining, from the plurality of candidate vector registers, as many register groups as the number of table fillings, wherein each register group includes a second number of target vector registers; and

sequentially writing the first element stored in the target vector registers of each register group into the vector instruction lookup table.

The method of claim 2, wherein determining, from the plurality of candidate vector registers, as many register groups as the number of table fillings comprises:

determining, from the numbers of the plurality of candidate vector registers, as many source register numbers as the number of table fillings, wherein adjacent source register numbers are spaced at least the second number apart; and

determining that the second number of consecutive registers starting from each of the source register numbers constitute one register group.

The method of claim 3, further comprising:

determining a plurality of offset numbers corresponding to the plurality of register groups, wherein the offset numbers are the integers from 0 to n in sequence, and n is the number of table fillings minus one; and

calculating the product of the offset number corresponding to each register group and a second target value as the starting fill position of that register group in the vector instruction lookup table, wherein the second target value is the ratio of the first target value to the number of bits per storage item;

wherein sequentially writing the first element stored in the target vector registers of each register group into the vector instruction lookup table comprises:

sequentially writing, based on the starting fill positions, the first element stored in the target vector registers of each register group into the vector instruction lookup table.

The method of claim 1, further comprising:

obtaining, by using a preset table lookup rule, a second element to be looked up from the vector instruction lookup table; and

writing the second element into a preset vector destination register.

The method of claim 5, wherein obtaining, by using the preset table lookup rule, the second element to be looked up from the vector instruction lookup table comprises:

determining a vector index register;

determining an index bit width according to the number of items in the vector instruction lookup table;

calculating a ratio of the register width of the vector index register to the index bit width as the number of indexes; and

performing a parallel lookup on the vector instruction lookup table based on the number of indexes and the target index values to be looked up, to obtain the second element corresponding to each target index value.

The method of claim 6, wherein, when the element width of the vector destination register is greater than the element width of the vector index register and the element width of the vector destination register is the same as the number of bits of the storage item, writing the second element into the preset vector destination register comprises:

determining a plurality of vector destination registers; and

writing the second element into the plurality of vector destination registers.

The method according to claim 4, wherein

the table filling rule corresponding to the first type has the form table_fill.v vsrc, and the table filling rule corresponding to the second type has the form table_fill.v vsrc,imm, where table_fill.v denotes the vector table filling instruction in the vector instruction set, vsrc denotes the source register number, and imm denotes the offset number.

The method according to claim 6, wherein

the table lookup rule has the form table_find.v vdest,vidx, where table_find.v denotes the vector table lookup instruction in the vector instruction set, vidx denotes the vector index register, and vdest denotes the vector destination register.

An apparatus for vector instruction table filling and table lookup in a processor, comprising:

a configuration module, configured to configure a vector instruction lookup table in a preset storage space of the processor, wherein the storage capacity of the vector instruction lookup table is the product of the number of items and the number of bits per storage item;

a determination module, configured to determine a table filling type corresponding to the candidate vector registers to be used for table filling, wherein the table filling type includes a first type and a second type, the first type being that the product of the register width and the maximum number of register groups is greater than or equal to the storage capacity, and the second type being that the product of the register width and the maximum number of register groups is less than the storage capacity; and

a table filling module, configured to write, by using a table filling rule corresponding to the table filling type, a first element stored in a target vector register among a plurality of candidate vector registers into the vector instruction lookup table;

wherein, when the table filling type is the first type,

the table filling module is configured to:

determine a first number of registers from the plurality of candidate vector registers as the target vector registers; and

write the first element stored in the target vector registers into the vector instruction lookup table;

and the table filling module is further configured to:

determine a source register number from the numbers of the plurality of candidate vector registers; and

determine the first number of consecutive registers starting from the source register number as the target vector registers.

The apparatus of claim 10, wherein, when the table filling type is the second type, the table filling module is configured to:

The apparatus of claim 11, wherein the table filling module is configured to:

The apparatus of claim 12, further comprising:

a first calculation module, configured to determine a plurality of offset numbers corresponding to the plurality of register groups, wherein the offset numbers are the integers from 0 to n in sequence, and n is the number of table fillings minus one; and

a second calculation module, configured to calculate the product of the offset number corresponding to each register group and a second target value as the starting fill position of that register group in the vector instruction lookup table, wherein the second target value is the ratio of the first target value to the number of bits per storage item;

wherein the table filling module is configured to: sequentially write, based on the starting fill positions, the first element stored in the target vector registers of each register group into the vector instruction lookup table.

The apparatus of claim 10, further comprising:

a table lookup module, configured to obtain, by using a preset table lookup rule, a second element to be looked up from the vector instruction lookup table, and to write the second element into a preset vector destination register.

The apparatus of claim 14, wherein the table lookup module is configured to:

determine a vector index register;

The apparatus of claim 15, wherein, when the element width of the vector destination register is greater than the element width of the vector index register, and the element width of the vector destination register is the same as the number of bits of the storage item, the table lookup module is configured to:

determine a plurality of vector destination registers; and

write the second element into the plurality of vector destination registers.

The apparatus of claim 13, wherein

the table filling rule corresponding to the first type has the form table_fill.v vsrc, and the table filling rule corresponding to the second type has the form table_fill.v vsrc,imm, where table_fill.v denotes the vector table filling instruction in the vector instruction set, vsrc denotes the source register number, and imm denotes the offset number.

The apparatus of claim 15, wherein

An electronic device, comprising:

at least one processor; and

a memory communicatively connected to the at least one processor; wherein,

the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to perform the method according to any one of claims 1 to 9.

A non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause a computer to perform the method according to any one of claims 1 to 9.

A computer program product comprising a computer program, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1 to 9.