CN112486872B - Data processing method and device - Google Patents

Publication number
CN112486872B
Authority
CN
China
Prior art keywords
matrix
data
matrix calculation
calculation result
memory
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011360968.7A
Other languages
Chinese (zh)
Other versions
CN112486872A (en)
Inventor
展庆波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Vivo Mobile Communication Co Ltd
Original Assignee
Vivo Mobile Communication Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Vivo Mobile Communication Co Ltd filed Critical Vivo Mobile Communication Co Ltd
Priority to CN202011360968.7A priority Critical patent/CN112486872B/en
Publication of CN112486872A publication Critical patent/CN112486872A/en
Application granted granted Critical
Publication of CN112486872B publication Critical patent/CN112486872B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14 Handling requests for interconnection or transfer
    • G06F13/16 Handling requests for interconnection or transfer for access to memory bus
    • G06F13/1668 Details of memory controller
    • G06F13/1673 Details of memory controller using buffers
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14 Handling requests for interconnection or transfer
    • G06F13/16 Handling requests for interconnection or transfer for access to memory bus
    • G06F13/1668 Details of memory controller
    • G06F13/1684 Details of memory controller using multiple buses
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003 Arrangements for executing specific machine instructions
    • G06F9/30007 Arrangements for executing specific machine instructions to perform operations on data operands

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Complex Calculations (AREA)
  • Advance Control (AREA)

Abstract

The present application discloses a data processing method and device, belonging to the field of communication technology. The method comprises: receiving a matrix calculation instruction for first matrix data and second matrix data written into a memory; in response to the matrix calculation instruction, calling a matrix calculation unit arranged in the memory and calculating the first matrix data and the second matrix data according to the calculation mode of the matrix calculation instruction to obtain a matrix calculation result; and transmitting the matrix calculation result according to the data transmission mode corresponding to the matrix calculation instruction. The present application can reduce the time wasted by moving large amounts of data between the processor and the memory, and improves processor utilization and data processing efficiency.

Description

Data processing method and device
Technical Field
The application belongs to the technical field of communication, and particularly relates to a data processing method and device.
Background
In computer systems, the gap between the data processing speed inside the processor and the speed of the memory interface keeps growing. When the processor performs calculations that require moving large amounts of data, transferring that data between the memory and the processor wastes a great deal of time, so the processor cannot be used efficiently. In particular, when the processor performs a large number of matrix operations, large amounts of data must be moved between the processor and the memory and then cached in the processor before being calculated.
Disclosure of Invention
Embodiments of the application aim to provide a data processing method and device that address the problems of existing data processing approaches: wasted data processing time, reduced processor utilization, and low data processing efficiency.
In order to solve the technical problems, the application is realized as follows:
In a first aspect, an embodiment of the present application provides a data processing method, including:
receiving a matrix calculation instruction for first matrix data and second matrix data written into a memory;
in response to the matrix calculation instruction, calling a matrix calculation unit arranged in the memory, and calculating the first matrix data and the second matrix data according to the calculation mode of the matrix calculation instruction to obtain a matrix calculation result; and
transmitting the matrix calculation result according to the data transmission mode corresponding to the matrix calculation instruction.
In a second aspect, an embodiment of the present application provides a data processing apparatus, including:
a matrix calculation instruction receiving module, configured to receive a matrix calculation instruction for first matrix data and second matrix data written into a memory;
a matrix calculation result acquisition module, configured to, in response to the matrix calculation instruction, call a matrix calculation unit arranged in the memory and calculate the first matrix data and the second matrix data according to the calculation mode of the matrix calculation instruction to obtain a matrix calculation result; and
a matrix calculation result transmission module, configured to transmit the matrix calculation result according to the data transmission mode corresponding to the matrix calculation instruction.
In a third aspect, an embodiment of the present application provides an electronic device, including a processor, a memory, and a program or instructions stored on the memory and executable on the processor, the program or instructions implementing the steps of the data processing method according to the first aspect when executed by the processor.
In a fourth aspect, embodiments of the present application provide a readable storage medium having stored thereon a program or instructions which when executed by a processor implement the steps of the data processing method according to the first aspect.
In a fifth aspect, an embodiment of the present application provides a chip, where the chip includes a processor and a communication interface, where the communication interface is coupled to the processor, and the processor is configured to execute a program or instructions to implement a data processing method according to the first aspect.
In the embodiments of the application, a matrix calculation instruction for first matrix data and second matrix data written into a memory is received; in response to the instruction, a matrix calculation unit arranged in the memory is called, and the first matrix data and the second matrix data are calculated according to the calculation mode of the instruction to obtain a matrix calculation result; the result is then transmitted according to the data transmission mode corresponding to the instruction. Because the matrix calculation unit is arranged in the memory in advance and performs the matrix data calculation there, a large number of matrix operations can be completed quickly inside the memory. This reduces the time wasted by moving large amounts of data between the processor and the memory, and improves processor utilization and data processing efficiency.
Drawings
FIG. 1 is a flowchart illustrating steps of a data processing method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a memory architecture according to an embodiment of the present application;
FIG. 3 is a schematic diagram of storing matrix data by continuous address rows according to an embodiment of the present application;
FIG. 4 is a schematic diagram of storing matrix data by continuous address columns according to an embodiment of the present application;
FIG. 5 is a schematic diagram of outputting matrix data by continuous address columns according to an embodiment of the present application;
FIG. 6 is a schematic diagram of outputting matrix data by continuous address rows according to an embodiment of the present application;
FIG. 7 is a schematic diagram of a data processing apparatus according to an embodiment of the present application;
FIG. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application;
FIG. 9 is a schematic structural diagram of another electronic device according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The terms first, second and the like in the description and in the claims, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged, as appropriate, such that embodiments of the present application may be implemented in sequences other than those illustrated or described herein, and that the objects identified by "first," "second," etc. are generally of a type, and are not limited to the number of objects, such as the first object may be one or more. Furthermore, in the description and claims, "and/or" means at least one of the connected objects, and the character "/", generally means that the associated object is an "or" relationship.
The data processing method provided by the embodiment of the application is described in detail below through specific embodiments and application scenes thereof with reference to the accompanying drawings.
Referring to fig. 1, a step flowchart of a data processing method provided by an embodiment of the present application is shown, and as shown in fig. 1, the data processing method may specifically include the following steps:
Step 101: a matrix calculation instruction for the first matrix data and the second matrix data written into the memory is received.
The embodiments of the application can be applied to scenarios in which the matrix calculation unit in the memory performs a matrix calculation and the matrix calculation result is transmitted to the processor.
The first matrix data and the second matrix data are matrix data stored in the memory in advance on which a matrix calculation is to be performed. For example, if the memory stores matrix data A, matrix data B, and matrix data C, and a dot product of matrix data A and matrix data B is required, then matrix data A may serve as the first matrix data and matrix data B as the second matrix data, or matrix data B may serve as the first matrix data and matrix data A as the second matrix data.
It will be appreciated that the above examples are only examples listed for better understanding of the technical solution of the embodiments of the present application, and are not to be construed as the only limitation of the present embodiments.
In this embodiment, matrix data is written into the memory by dedicated matrix write instructions, which ensure that a matrix read from nonvolatile storage is stored effectively and in a layout that is convenient for calculation by the MALU. The added instructions are WMC (write matrix column) and WMR (write matrix row), described as follows:
1. WMC (write matrix column)
This instruction writes matrix data read from nonvolatile storage, such as NAND, into the memory in the continuous address column (CACM) manner. The instruction is sent on the CA Bus of the memory, followed by a NOP and then the address to be written. Once the address has been sent, the DATA Bus starts to transmit the matrix data; meanwhile, the CA Bus waits for one NOP and then sends the Row size and Column size of the matrix. The memory determines the amount of data on the DATA Bus from the received Row and Column information.
All write data on the DATA Bus is parsed and stored in the continuous address column (CACM) manner described above.
2. WMR (write matrix row)
This instruction writes matrix data read from nonvolatile storage, such as NAND, into the memory in the continuous address row (CARM) manner.
The instruction is sent on the CA Bus of the memory, followed by a NOP and then the address to be written. Once the address has been sent, the DATA Bus starts to transmit the matrix data; meanwhile, the CA Bus waits for one NOP and then sends the Row size and Column size of the matrix. The memory determines the amount of data on the DATA Bus from the received Row and Column information.
All write data on the DATA Bus is parsed and stored in the continuous address row (CARM) manner described above.
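The difference between the two storage layouts can be illustrated with a minimal Python sketch. It is not the patented hardware: the flat list `memory`, the helper names `wmr` and `wmc`, and the assumption that the memory itself arranges the incoming matrix are all hypothetical, chosen only to show which element ends up at which address.

```python
from typing import List

Matrix = List[List[float]]


def _store(memory: List[float], address: int, words: List[float]) -> None:
    """Copy `words` into consecutive addresses starting at `address`."""
    if address < 0 or address + len(words) > len(memory):
        raise ValueError("matrix does not fit at the requested address")
    memory[address:address + len(words)] = words


def wmr(memory: List[float], address: int, matrix: Matrix) -> None:
    """Hypothetical model of WMR: each row occupies consecutive addresses (CARM)."""
    _store(memory, address, [value for row in matrix for value in row])


def wmc(memory: List[float], address: int, matrix: Matrix) -> None:
    """Hypothetical model of WMC: each column occupies consecutive addresses (CACM)."""
    rows, cols = len(matrix), len(matrix[0])
    _store(memory, address, [matrix[r][c] for c in range(cols) for r in range(rows)])


memory = [0.0] * 16
a = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
wmr(memory, 0, a)  # rows are contiguous
wmc(memory, 8, a)  # columns are contiguous
carm_view = memory[0:6]   # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
cacm_view = memory[8:14]  # [1.0, 4.0, 2.0, 5.0, 3.0, 6.0]
print(carm_view, cacm_view)
```

In both cases the number of words stored is Row size times Column size, which mirrors how the memory is said to determine the amount of data on the DATA Bus.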
The matrix calculation instruction is an instruction issued to perform a calculation on matrix data A and matrix data B. In this example, it may be a dot product calculation instruction, an addition calculation instruction, an inner product calculation instruction, or the like; the specific instruction depends on the service requirement and is not limited in this embodiment.
When the calculation of the matrix data is required, a matrix calculation instruction for the first matrix data and the second matrix data may be transmitted to the memory.
After receiving the matrix calculation instruction for the first matrix data and the second matrix data written into the memory, step 102 is performed.
Step 102: in response to the matrix calculation instruction, a matrix calculation unit arranged in the memory is called, and the first matrix data and the second matrix data are calculated according to the calculation mode of the matrix calculation instruction to obtain a matrix calculation result.
The matrix calculation unit is a unit preset in the memory that performs matrix data calculation. As shown in FIG. 2, to enable the memory to perform matrix operations, a new calculation unit must be added to the existing memory structure (the interface basically keeps the existing design; only one input/output (IO) interface is added). According to the instruction design of the present application, a matrix calculation unit (called the Matrix Arithmetic Logic Unit, MALU, in this embodiment, comprising a vector multiplier, a vector adder, and so on) is added to the memory. In FIG. 2, CA Bus is the command address bus and DATA Bus is the data bus.
The matrix calculation unit is dedicated to handling the newly added matrix-related calculation instructions. For any matrices $A_{m \times p}$ and $B_{p \times n}$, the matrix inner product can be expressed element by element by formula (1):
$$A_{m \times p}\,B_{p \times n} = \begin{pmatrix} A_{1,\sim}B_{\sim,1} & \cdots & A_{1,\sim}B_{\sim,n} \\ \vdots & \ddots & \vdots \\ A_{m,\sim}B_{\sim,1} & \cdots & A_{m,\sim}B_{\sim,n} \end{pmatrix} \tag{1}$$
In formula (1), $A_{m,\sim}$ denotes the m-th row vector of matrix A, and $B_{\sim,n}$ denotes the n-th column vector of matrix B.
In addition, dot multiplication of matrices, coefficient multiplication, and matrix addition and subtraction can be realized with common adders and inverters. The newly added vector multiplier, adder, and inverter together form the MALU, the calculation unit that supports matrix operations in the memory.
The matrix calculation result refers to a result obtained after the corresponding calculation of the first matrix data and the second matrix data.
After receiving a matrix calculation instruction for the first matrix data and the second matrix data written into the memory, the matrix calculation instruction can be responded, a matrix calculation unit arranged in the memory is called, and the first matrix data and the second matrix data are calculated according to the calculation mode of the matrix calculation instruction, so that a matrix calculation result is obtained. For example, when the calculation mode corresponding to the matrix calculation instruction is inner product calculation, inner product calculation may be performed on the first matrix data and the second matrix data to obtain an inner product calculation result, and the inner product calculation result is used as a calculation result of the two matrix data, that is, a matrix calculation result. When the calculation mode corresponding to the matrix calculation instruction is addition calculation, the first matrix data and the second matrix data can be subjected to addition calculation to obtain an addition calculation result, and the addition calculation result is used as a calculation result of the two matrix data, namely, a matrix calculation result.
It will be appreciated that the above examples are only examples listed for better understanding of the technical solution of the embodiments of the present application, and are not to be construed as the only limitation of the present embodiments.
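As an illustration of why the CARM and CACM layouts suit the inner product, the following Python sketch reads each row of A and each column of B as a contiguous slice and multiplies them as vectors. The function names, the flat-list memory model, and the argument order are assumptions made for this sketch, not details taken from the application.

```python
from typing import List


def vector_multiply(row: List[float], col: List[float]) -> float:
    """Stand-in for the vector multiplier: dot product of one row and one column."""
    return sum(x * y for x, y in zip(row, col))


def malu_inner_product(memory: List[float],
                       addr_a: int, m: int, p: int,
                       addr_b: int, p_b: int, n: int) -> List[List[float]]:
    """Multiply A (m x p, stored by rows) by B (p_b x n, stored by columns)."""
    if p != p_b:
        raise ValueError("columns of A must equal rows of B")
    result = []
    for i in range(m):
        # Row i of A is one contiguous slice because A uses the CARM layout.
        row = memory[addr_a + i * p: addr_a + (i + 1) * p]
        out_row = []
        for j in range(n):
            # Column j of B is one contiguous slice because B uses the CACM layout.
            col = memory[addr_b + j * p: addr_b + (j + 1) * p]
            out_row.append(vector_multiply(row, col))
        result.append(out_row)
    return result


# A = [[1, 2], [3, 4]] stored by rows at address 0;
# B = [[5, 6], [7, 8]] stored by columns at address 4, i.e. (5, 7) then (6, 8).
memory = [1.0, 2.0, 3.0, 4.0, 5.0, 7.0, 6.0, 8.0]
product = malu_inner_product(memory, 0, 2, 2, 4, 2, 2)
print(product)  # [[19.0, 22.0], [43.0, 50.0]]
```

Every operand fetch in this sketch is a single contiguous read, which is the property the storage requirement on A and B is meant to guarantee.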
After the matrix calculation unit arranged in the memory has been called and the first matrix data and the second matrix data have been calculated according to the calculation mode of the matrix calculation instruction to obtain the matrix calculation result, step 103 is performed.
Step 103: the matrix calculation result is transmitted according to the data transmission mode corresponding to the matrix calculation instruction.
The data transmission mode refers to a mode of performing matrix data transmission added in the matrix calculation instruction, and in this embodiment, the data transmission mode may include: a continuous address row transmission mode and a continuous address column transmission mode.
After the matrix calculation results corresponding to the first matrix data and the second matrix data are obtained, the matrix calculation results can be transmitted according to the data transmission mode corresponding to the matrix calculation instruction.
According to the embodiment of the application, the matrix calculation unit is arranged in the memory in advance to perform corresponding matrix data calculation, so that a large amount of matrix operations can be rapidly completed in the memory, the time waste caused by large amount of data carrying between the processor and the memory is reduced, and the utilization rate of the processor and the data processing efficiency are improved.
In this embodiment, a state identifier may be provided outside the memory and used to determine the running state of the matrix calculation unit, so as to avoid the situation in which a new calculation request arrives before the current operation has completed.
In a specific implementation of the present application, before step 102, the method may further include:
Step A1: the running state of the matrix calculation unit is determined according to the state identifier corresponding to the matrix calculation unit.
In this embodiment, the state identifier indicates the running state of the matrix calculation unit. In this example, to resolve conflicts between calculation requests, a signal source MatrixBusy is added outside the computable memory. When the MatrixBusy pin is in the busy state, matrix calculation instructions on the CA Bus are invalid. Only when the MatrixBusy pin is in the idle state can the corresponding matrix calculation instruction be responded to.
The implementation is not limited to this. Other ways of obtaining the state identifier of the matrix calculation unit may be provided according to the service requirement, and this embodiment does not limit them.
After receiving the matrix calculation instructions of the first matrix data and the second matrix data, the running state of the matrix calculation unit can be determined according to the state identifier corresponding to the matrix calculation unit.
After determining the operation state of the matrix calculation unit, step A2 is performed or step A3 is performed.
Step A2: if the running state is the idle state, step 102 is performed.
Step A3: if the running state is a non-idle state, the matrix calculation instruction is cached.
If the running state of the matrix calculation unit is determined to be the idle state, the step of calling the matrix calculation unit arranged in the memory and calculating the first matrix data and the second matrix data according to the calculation mode of the matrix calculation instruction to obtain the matrix calculation result is performed.
If the running state of the matrix calculation unit is determined to be a non-idle state, the matrix calculation instruction is cached, and the cached instruction is executed after the matrix calculation unit completes its current calculation task.
By adding a running state identifier for the matrix calculation unit in advance, the embodiments of the application avoid system abnormalities caused by the matrix calculation unit receiving more calculation tasks than it can handle while it is busy.
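Steps A1 to A3 can be sketched in Python as a busy flag plus a queue of cached instructions. The class name `ComputableMemory`, its methods, and the first-in-first-out order in which cached instructions are replayed are assumptions for illustration; the application only states that a cached instruction is executed once the current task completes.

```python
from collections import deque
from typing import Deque, List


class ComputableMemory:
    """Toy model of the MatrixBusy state identifier and the instruction cache."""

    def __init__(self) -> None:
        self.matrix_busy: bool = False       # models the MatrixBusy pin
        self.pending: Deque[str] = deque()   # cached matrix instructions
        self.executed: List[str] = []        # instructions handed to the MALU

    def receive(self, instruction: str) -> str:
        """Step A1: check the state; A2: run if idle; A3: cache if busy."""
        if self.matrix_busy:
            self.pending.append(instruction)
            return "cached"
        self._start(instruction)
        return "started"

    def _start(self, instruction: str) -> None:
        self.matrix_busy = True
        self.executed.append(instruction)

    def complete(self) -> None:
        """Called when the current calculation finishes."""
        self.matrix_busy = False
        if self.pending:
            # Assumption: cached instructions are replayed oldest first.
            self._start(self.pending.popleft())


mem = ComputableMemory()
first = mem.receive("MIPSC")   # unit is idle, so the instruction starts
second = mem.receive("MASR")   # unit is busy, so the instruction is cached
mem.complete()                 # MIPSC finishes, MASR starts
mem.complete()                 # MASR finishes, unit is idle again
print(first, second, mem.executed, mem.matrix_busy)
```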
In this embodiment, the matrix calculation result may be transmitted according to a data transmission manner, and specifically, the following detailed description may be made in connection with the following specific implementation manner.
In this embodiment, the added instructions perform an inner product operation on matrix A, stored in CARM form, and matrix B, stored in CACM form, both of which are already stored in the memory (matrices A and B must be stored in this manner, and this series of operation instructions assumes by default that they are). The added instructions are MIPRC (matrix inner product read column), MIPRR (matrix inner product read row), MIPSC (matrix inner product store column), and MIPSR (matrix inner product store row).
In another specific implementation of the present application, the step 103 may include:
Substep B1: if the data transmission mode is the continuous column transmission mode, the matrix calculation result is transmitted to a command address bus according to the continuous column transmission mode and transmitted to a processor through the command address bus.
In this embodiment, the continuous column transmission mode is a mode in which matrix data is transmitted column by column. Taking the inner product as the calculation mode, the matrix calculation instruction is the MIPRC instruction. It is sent by the processor to the computable memory; the computable memory performs a matrix inner product operation on two internal matrices and returns the calculation result to the computable memory interface in column format. The instruction is issued first, followed by the address of matrix A and its row and column sizes, then the address of matrix B and its row and column sizes. After a delay, the calculation result, processed by the MALU, is output to the DATA Bus column by column. When the inner product of A and B is calculated, the actual operation is performed in the MALU vector by vector, as shown in FIG. 5. Because the instruction assumes that A and B are stored in CARM and CACM form respectively, the MALU only needs to read row and column data at continuous addresses into its cache and perform the vector calculation in the vector multiplier, so the result for each position can be output quickly. Calculating in the order shown in FIG. 5 outputs the result in correct CACM order, $a_{11}, a_{21}, \ldots, a_{mn}$. The result is sent out continuously on the DATA Bus.
If the data transmission mode is the continuous column transmission mode, the matrix calculation result may be transmitted to the command address bus in the continuous column transmission mode and transmitted to the processor through the command address bus.
Substep B2: if the data transmission mode is the continuous row transmission mode, the matrix calculation result is transmitted to a command address bus according to the continuous row transmission mode and transmitted to a processor through the command address bus.
In this embodiment, the continuous row transmission mode is a mode in which matrix data is transmitted row by row. Taking the inner product as the calculation mode, the matrix calculation instruction is the MIPRR instruction. It is sent by the processor to the computable memory; the computable memory performs a matrix inner product operation on two internal matrices and returns the calculation result to the computable memory interface in row format. The instruction is issued first, followed by the address of matrix A and its row and column sizes, then the address of matrix B and its row and column sizes. After a delay, the calculation result, processed by the MALU, is output to the DATA Bus row by row. When the inner product of A and B is calculated, the actual operation is performed in the MALU vector by vector, as shown in FIG. 6. Because the instruction assumes that A and B are stored in CARM and CACM form respectively, the MALU only needs to read row and column data at continuous addresses into its cache and perform the vector calculation in the vector multiplier, so the result for each position can be output quickly. Calculating in the order shown in FIG. 6 outputs the result in correct CARM order, $a_{11}, a_{12}, \ldots, a_{mn}$. The result is sent out continuously on the DATA Bus.
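The only difference between the two read instructions is the order in which result elements leave the memory. The Python sketch below makes that order explicit with a generator; the function name `inner_product_stream` and the `order` argument are hypothetical, and the operands are passed as lists of row vectors and column vectors rather than as memory addresses to keep the example short.

```python
from typing import Iterator, List


def inner_product_stream(a_rows: List[List[float]],
                         b_cols: List[List[float]],
                         order: str) -> Iterator[float]:
    """Yield elements of A*B one at a time, column by column or row by row."""
    m, n = len(a_rows), len(b_cols)
    if order == "column":    # MIPRC-style output: a11, a21, ..., amn
        positions = [(i, j) for j in range(n) for i in range(m)]
    elif order == "row":     # MIPRR-style output: a11, a12, ..., amn
        positions = [(i, j) for i in range(m) for j in range(n)]
    else:
        raise ValueError("order must be 'column' or 'row'")
    for i, j in positions:
        # One vector multiplication per output position.
        yield sum(x * y for x, y in zip(a_rows[i], b_cols[j]))


a_rows = [[1.0, 2.0], [3.0, 4.0]]   # rows of A = [[1, 2], [3, 4]]
b_cols = [[5.0, 7.0], [6.0, 8.0]]   # columns of B = [[5, 6], [7, 8]]
column_out = list(inner_product_stream(a_rows, b_cols, "column"))
row_out = list(inner_product_stream(a_rows, b_cols, "row"))
print(column_out)  # [19.0, 43.0, 22.0, 50.0]
print(row_out)     # [19.0, 22.0, 43.0, 50.0]
```

Because each element is produced independently from one row and one column, the result can be streamed as soon as it is computed, without first assembling the whole matrix.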
In the present embodiment, the calculation of the matrix data may also include dot multiplication of matrices and addition of matrices, as described in detail below.
1. Addition calculation of matrix
Matrix addition instructions are added to the memory to support matrix addition of arbitrary size. Matrix addition requires that matrices A and B have the same size and the same storage mode.
Matrix addition means that the elements at corresponding positions of two matrices of the same size are added; the resulting matrix is the sum. Matrix addition requires only a common adder.
The instructions to be added are MARC (matrix addition read column), MARR (matrix addition read row), MASC (matrix addition store column), and MASR (matrix addition store row), described as follows:
a)MARC
The instruction is sent to the computable memory through the processor, the computable memory performs matrix addition operation on two internal matrices (both stored in a CACM mode, and the instruction defaults to A, B matrices for CACM storage), and the calculation result is returned to the computable memory interface according to the column information format.
The instruction is issued first, then followed by the address of matrix a, the column size of matrix a, followed by the address of matrix B (since matrix B is the same size as matrix a, no size information of matrix B is needed). After a delay, the calculation results are output to the DATA Bus in a continuous column mode through the processing of the MALU calculation units.
MALU reads in elements of the matrix A, B in sequence, and then simply adds the elements, and directly outputs the calculation results to the IO ports one by one.
b)MARR
The instruction is sent to the computable memory through the processor, the computable memory performs matrix addition operation on two internal matrices (both stored in a CARM mode, and the instruction defaults to A, B matrices for CARM storage), and the calculation result is returned to the computable memory interface according to the row information format.
The instruction is issued first, then followed by the address of matrix a, the column size of matrix a, followed by the address of matrix B (since matrix B is the same size as matrix a, no size information of matrix B is needed). After a delay, the calculation results are output to the DATA Bus in a continuous column mode through the processing of the MALU calculation units.
MALU reads in elements of the matrix A, B in sequence, and then simply adds the elements, and directly outputs the calculation results to the IO ports one by one.
c)MASC
The processor sends this instruction to the computable memory. The computable memory performs a matrix addition on two internal matrices (both stored in CACM mode; the instruction assumes by default that matrices A and B are stored in CACM mode) and writes the calculation result, in column format, into a continuous space starting at a designated memory address.
The instruction is issued first, followed by the address of matrix A and its row and column sizes, then the address of matrix B and its row and column sizes, and finally the address C at which the calculation result is to be stored.
Once the instruction is valid, MatrixBusy enters the busy state; MatrixBusy returns to the idle state after the calculation and storage are complete.
d)MASR
The processor sends this instruction to the computable memory. The computable memory performs a matrix addition on two internal matrices (both stored in CARM mode; the instruction assumes by default that matrices A and B are stored in CARM mode) and writes the calculation result, in row format, into a continuous space starting at a designated memory address.
The instruction is issued first, followed by the address of matrix A and its row and column sizes, then the address of matrix B and its row and column sizes, and finally the address C at which the calculation result is to be stored.
Once the instruction is valid, MatrixBusy enters the busy state; MatrixBusy returns to the idle state after the calculation and storage are complete.
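The store-type instructions and the MatrixBusy signal can be illustrated with a small Python model. Everything named here (`ComputableMemory`, `add_and_store`, `matrix_busy`) is hypothetical and chosen for the sketch; the patent text only specifies that the result is written to a continuous space at address C and that MatrixBusy is busy while the instruction executes.

```python
from typing import List


class ComputableMemory:
    """Toy model of a memory with an in-memory adder and a busy flag."""

    def __init__(self, size: int) -> None:
        self.cells: List[int] = [0] * size
        self.matrix_busy = False  # stands in for the MatrixBusy signal

    def add_and_store(self, addr_a: int, addr_b: int, addr_c: int,
                      rows: int, cols: int) -> None:
        """Add two same-layout matrices and store the sum at addr_c."""
        self.matrix_busy = True  # instruction accepted: busy
        try:
            for offset in range(rows * cols):
                self.cells[addr_c + offset] = (
                    self.cells[addr_a + offset] + self.cells[addr_b + offset]
                )
        finally:
            self.matrix_busy = False  # calculation and storage done: idle


mem = ComputableMemory(16)
mem.cells[0:4] = [1, 2, 3, 4]  # A = [[1, 2], [3, 4]], row-contiguous
mem.cells[4:8] = [5, 6, 7, 8]  # B = [[5, 6], [7, 8]], row-contiguous
mem.add_and_store(addr_a=0, addr_b=4, addr_c=8, rows=2, cols=2)
```

In this single-threaded model the busy interval is not observable from outside; in hardware the processor would read the flag while the operation is still in flight.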
In the embodiments of the application, combining different instructions enables different matrix calculations and different ways of transmitting the matrix calculation results, which improves the utilization efficiency of the processor.
In this embodiment, if the matrix calculation result does not need to be transmitted to the processor, it may instead be written into the memory according to a data writing mode, as described in the following specific implementation.
In another specific implementation of the present application, after step 102, the method may further include:
Step C1: writing the matrix calculation result into the memory according to the data writing mode corresponding to the matrix calculation instruction.
In the embodiment of the present application, the data writing mode (i.e., the data storage mode) refers to a mode for writing matrix data into the memory.
The matrix calculation instruction indicates whether the matrix calculation result is to be written into the memory. When it does, that is, when the instruction specifies a data writing mode, the matrix calculation result obtained from the first matrix data and the second matrix data is written into the memory according to that data writing mode.
In this embodiment, the data writing mode may be a continuous column writing mode or a continuous row writing mode; these two modes are described in detail in the following specific implementation.
In this embodiment, a matrix has two storage modes in the memory: storage by rows at continuous addresses and storage by columns at continuous addresses.
In another specific implementation of the present application, the step C1 may include:
Substep D1: when the data writing mode is the continuous column writing mode, writing the matrix calculation result into the memory according to the continuous column writing mode.
In this embodiment, the continuous column writing mode refers to writing matrix data into the memory column by column at continuous addresses.
When the data writing mode included in the matrix calculation instruction is the continuous column writing mode, the matrix calculation result obtained from the first matrix data and the second matrix data may be written into the memory according to that mode; for example, as shown in fig. 4, all matrix elements are stored at continuous addresses by traversing the matrix column by column. The data stored in the memory cells is no different from data stored in a conventional memory: read with a conventional memory read instruction, it is ordinary random-access data; operated on with the instructions of the present invention, it is treated as matrix data.
Substep D2: when the data writing mode is the continuous row writing mode, writing the matrix calculation result into the memory according to the continuous row writing mode.
In this embodiment, the continuous row writing mode refers to writing matrix data into the memory row by row at continuous addresses.
When the data writing mode included in the matrix calculation instruction is the continuous row writing mode, the matrix calculation result obtained from the first matrix data and the second matrix data may be written into the memory according to that mode; for example, as shown in fig. 4, all matrix elements are stored at continuous addresses by traversing the matrix row by row. The data stored in the memory cells is no different from data stored in a conventional memory: read with a conventional memory read instruction, it is ordinary random-access data; operated on with the instructions of the present invention, it is treated as matrix data.
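The two writing modes differ only in how an element's position (i, j) maps to an address. The Python sketch below shows that mapping under the usual column-contiguous and row-contiguous conventions; the function names and the flat-list memory are assumptions made for illustration, not part of the patent text.

```python
from typing import List


def element_address(base: int, i: int, j: int, rows: int, cols: int,
                    by_columns: bool) -> int:
    """Address of element (i, j) of a rows x cols matrix stored at base."""
    if by_columns:
        # Continuous column mode: a whole column occupies `rows` addresses.
        return base + j * rows + i
    # Continuous row mode: a whole row occupies `cols` addresses.
    return base + i * cols + j


def write_matrix(mem: List[int], base: int, matrix: List[List[int]],
                 by_columns: bool) -> None:
    """Write a matrix into mem using the selected continuous writing mode."""
    rows, cols = len(matrix), len(matrix[0])
    for i in range(rows):
        for j in range(cols):
            addr = element_address(base, i, j, rows, cols, by_columns)
            mem[addr] = matrix[i][j]


result = [[1, 2, 3],
          [4, 5, 6]]
mem_cols = [0] * 6
mem_rows = [0] * 6
write_matrix(mem_cols, 0, result, by_columns=True)
write_matrix(mem_rows, 0, result, by_columns=False)
```

Either way the stored words are plain memory contents; only the instruction that later reads them decides whether they are interpreted as a matrix and in which order.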
In this embodiment, to make full use of the matrix computing capability of the computable memory, the compiler layer needs to be modified so that matrix-related computations are converted into the above instructions; corresponding instruction encodings are also needed for the matrix computation portion, to ensure that the in-memory computing resources are fully used.
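As a rough illustration of what such a compiler-layer change might decide, the Python sketch below picks a matrix-addition mnemonic from the operand layout and from whether the result is stored or returned. The function name is hypothetical, the real instruction encoding is not given in the text, and the mnemonic `MARC` for the column-returning variant is assumed by analogy with MARR, MASC and MASR.

```python
def select_add_instruction(layout: str, store_result: bool) -> str:
    """Choose a matrix-addition mnemonic for a lowering pass.

    layout:       "column" (CACM operands) or "row" (CARM operands)
    store_result: True to write the sum back to memory,
                  False to return it to the processor.
    """
    if layout not in ("column", "row"):
        raise ValueError(f"unknown layout: {layout!r}")
    prefix = "MAS" if store_result else "MAR"   # store vs. return
    suffix = "C" if layout == "column" else "R"  # column vs. row format
    return prefix + suffix


chosen = select_add_instruction("row", store_result=True)
```

A real compiler pass would also have to emit the operand addresses and sizes in the order each instruction expects; that part is left out here because the text does not fix a binary format.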
In the data processing method provided by the embodiments of the application, a matrix calculation instruction for the first matrix data and the second matrix data written into the memory is received; in response to the instruction, the matrix calculation unit arranged in the memory is called to calculate the first matrix data and the second matrix data according to the calculation mode of the instruction, giving a matrix calculation result; and the result is transmitted according to the data transmission mode corresponding to the instruction. Because the matrix calculation unit is arranged in the memory in advance, a large number of matrix operations can be completed quickly inside the memory, the time wasted moving large amounts of data between the processor and the memory is reduced, and the utilization rate of the processor and the data processing efficiency are improved.
It should be noted that, in the data processing method provided in the embodiment of the present application, the execution body may be a data processing apparatus, or a control module in the data processing apparatus for executing the data processing method. In the embodiment of the present application, a data processing device is described by taking a data processing method performed by the data processing device as an example.
Referring to fig. 7, a schematic structural diagram of a data processing apparatus according to an embodiment of the present application is shown, and as shown in fig. 7, the data processing apparatus 700 may specifically include the following modules:
a matrix calculation instruction receiving module 710, configured to receive a matrix calculation instruction for the first matrix data and the second matrix data written into the memory;
a matrix calculation result obtaining module 720, configured to respond to the matrix calculation instruction, call a matrix calculation unit disposed in the memory, and calculate the first matrix data and the second matrix data according to a calculation mode of the matrix calculation instruction, so as to obtain a matrix calculation result;
and a matrix calculation result transmission module 730, configured to transmit the matrix calculation result according to a data transmission mode corresponding to the matrix calculation instruction.
Optionally, the device further comprises:
a running state determining module, configured to determine the running state of the matrix calculation unit according to the state identifier corresponding to the matrix calculation unit;
a calculation result execution module, configured to execute the matrix calculation result obtaining module when the running state is an idle state;
and a matrix calculation instruction caching module, configured to cache the matrix calculation instruction when the running state is a non-idle state.
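The idle/non-idle handling can be sketched as a small gate in front of the matrix calculation unit. This Python model is an assumption-laden illustration: the class and method names are invented, and replaying a cached instruction as soon as the unit becomes idle is one plausible policy that the text itself does not spell out.

```python
from collections import deque
from typing import Deque, List


class InstructionGate:
    """Runs an instruction if the unit is idle, otherwise caches it."""

    def __init__(self) -> None:
        self.busy = False                   # state identifier of the unit
        self.pending: Deque[str] = deque()  # cached instructions, FIFO
        self.executed: List[str] = []       # order in which work started

    def submit(self, instruction: str) -> bool:
        """Return True if started immediately, False if cached."""
        if self.busy:
            self.pending.append(instruction)
            return False
        self._start(instruction)
        return True

    def _start(self, instruction: str) -> None:
        self.busy = True
        self.executed.append(instruction)

    def complete(self) -> None:
        """Mark the running instruction done; start the next cached one."""
        self.busy = False
        if self.pending:
            self._start(self.pending.popleft())


gate = InstructionGate()
first_started = gate.submit("MASC #1")   # unit idle: runs at once
second_started = gate.submit("MASR #2")  # unit busy: cached
gate.complete()                          # first done, cached one starts
```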
Optionally, the matrix calculation result transmission module 730 includes:
a first calculation result transmission unit, configured to, when the data transmission mode is the continuous column transmission mode, transmit the matrix calculation result to a command address bus according to the continuous column transmission mode, and transmit the matrix calculation result to a processor through the command address bus;
and a second calculation result transmission unit, configured to, when the data transmission mode is the continuous row transmission mode, transmit the matrix calculation result to a command address bus according to the continuous row transmission mode, and transmit the matrix calculation result to a processor through the command address bus.
Optionally, the device further comprises:
a calculation result writing module, configured to write the matrix calculation result into the memory according to a data writing mode corresponding to the matrix calculation instruction.
Optionally, the calculation result writing module includes:
a first calculation result writing unit, configured to write, in the memory, the matrix calculation result according to the continuous column writing mode when the data writing mode is the continuous column writing mode;
and a second calculation result writing unit, configured to write the matrix calculation result into the memory according to the continuous row writing mode when the data writing mode is the continuous row writing mode.
In the data processing device provided by the embodiments of the application, a matrix calculation instruction for the first matrix data and the second matrix data written into the memory is received; in response to the instruction, the matrix calculation unit arranged in the memory is called to calculate the first matrix data and the second matrix data according to the calculation mode of the instruction, giving a matrix calculation result; and the result is transmitted according to the data transmission mode corresponding to the instruction. Because the matrix calculation unit is arranged in the memory in advance, a large number of matrix operations can be completed quickly inside the memory, the time wasted moving large amounts of data between the processor and the memory is reduced, and the utilization rate of the processor and the data processing efficiency are improved.
The data processing device in the embodiment of the application can be a device, or can be a component, an integrated circuit, or a chip in a terminal. The device may be a mobile electronic device or a non-mobile electronic device. By way of example, the mobile electronic device may be a mobile phone, a tablet computer, a notebook computer, a palm computer, a vehicle-mounted electronic device, a wearable device, an ultra-mobile personal computer (UMPC), a netbook or a Personal Digital Assistant (PDA), etc., and the non-mobile electronic device may be a server, a network attached storage (Network Attached Storage, NAS), a personal computer (personal computer, PC), a Television (TV), a teller machine, a self-service machine, etc., and the embodiments of the present application are not limited in particular.
The data processing device in the embodiment of the present application may be a device having an operating system. The operating system may be an Android operating system, an iOS operating system, or another possible operating system; the embodiment of the present application is not specifically limited in this respect.
The data processing device provided in the embodiment of the present application can implement each process implemented by the method embodiment of fig. 1, and in order to avoid repetition, a description is omitted here.
Optionally, as shown in fig. 8, an embodiment of the present application further provides an electronic device 800, which includes a processor 801, a memory 802, and a program or instruction stored in the memory 802 and capable of running on the processor 801. When executed by the processor 801, the program or instruction implements each process of the above data processing method embodiment and achieves the same technical effects; to avoid repetition, details are not repeated here.
Wherein the processor is a processor in the electronic device described in the above embodiment. The readable storage medium includes a computer readable storage medium such as a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a magnetic disk or an optical disk, and the like.
The embodiment of the application further provides a chip, which comprises a processor and a communication interface, wherein the communication interface is coupled with the processor, and the processor is used for running programs or instructions to realize the processes of the data processing method embodiment, and can achieve the same technical effects, so that repetition is avoided, and the description is omitted here.
It should be understood that the chips referred to in the embodiments of the present application may also be referred to as system-level chips, chip systems, or systems-on-chip, etc.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element. Furthermore, it should be noted that the scope of the methods and apparatus in the embodiments of the present application is not limited to performing the functions in the order shown or discussed, but may also include performing the functions in a substantially simultaneous manner or in an opposite order depending on the functions involved, e.g., the described methods may be performed in an order different from that described, and various steps may be added, omitted, or combined. Additionally, features described with reference to certain examples may be combined in other examples.
From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) comprising instructions for causing a terminal (which may be a mobile phone, a computer, a server, or a network device, etc.) to perform the method according to the embodiments of the present application.
The embodiments of the present application have been described above with reference to the accompanying drawings, but the present application is not limited to the above-described embodiments, which are merely illustrative and not restrictive, and many forms may be made by those having ordinary skill in the art without departing from the spirit of the present application and the scope of the claims, which are to be protected by the present application.

Claims (8)

1. A data processing method, comprising:
receiving a matrix calculation instruction for first matrix data and second matrix data written into a memory;
in response to the matrix calculation instruction, calling a matrix calculation unit set in the memory, and calculating the first matrix data and the second matrix data according to a calculation method of the matrix calculation instruction to obtain a matrix calculation result;
when a data transmission mode is a continuous column transmission mode, transmitting the matrix calculation result to a command address bus according to the continuous column transmission mode, and transmitting the matrix calculation result to a processor through the command address bus;
when the data transmission mode is a continuous row transmission mode, transmitting the matrix calculation result to a command address bus according to the continuous row transmission mode, and transmitting the matrix calculation result to a processor through the command address bus.

2. The method according to claim 1, characterized in that, before the calling of the matrix calculation unit set in the memory and the calculating of the first matrix data and the second matrix data according to the calculation method of the matrix calculation instruction to obtain the matrix calculation result, the method further comprises:
determining the operating state of the matrix calculation unit according to the state identifier corresponding to the matrix calculation unit;
when the operating state is an idle state, executing the step of calling the matrix calculation unit set in the memory and calculating the first matrix data and the second matrix data according to the calculation method of the matrix calculation instruction to obtain the matrix calculation result;
when the operating state is a non-idle state, caching the matrix calculation instruction.

3. The method according to claim 1, characterized in that, after the calling of the matrix calculation unit set in the memory and the calculating of the first matrix data and the second matrix data according to the calculation method of the matrix calculation instruction to obtain the matrix calculation result, the method further comprises:
writing the matrix calculation result into the memory according to the data writing mode corresponding to the matrix calculation instruction.

4. The method according to claim 3, characterized in that the writing of the matrix calculation result into the memory according to the data writing mode corresponding to the matrix calculation instruction comprises:
when the data writing mode is a continuous column writing mode, writing the matrix calculation result into the memory according to the continuous column writing mode;
when the data writing mode is a continuous row writing mode, writing the matrix calculation result into the memory according to the continuous row writing mode.

5. A data processing device, comprising:
a matrix calculation instruction receiving module, configured to receive a matrix calculation instruction for first matrix data and second matrix data written into a memory;
a matrix calculation result acquisition module, configured to, in response to the matrix calculation instruction, call a matrix calculation unit set in the memory and calculate the first matrix data and the second matrix data according to the calculation method of the matrix calculation instruction to obtain a matrix calculation result;
a first calculation result transmission unit, configured to, when a data transmission mode is a continuous column transmission mode, transmit the matrix calculation result to a command address bus according to the continuous column transmission mode, and transmit the matrix calculation result to a processor through the command address bus;
a second calculation result transmission unit, configured to, when the data transmission mode is a continuous row transmission mode, transmit the matrix calculation result to a command address bus according to the continuous row transmission mode, and transmit the matrix calculation result to a processor through the command address bus.

6. The device according to claim 5, further comprising:
an operating state determination module, configured to determine the operating state of the matrix calculation unit according to the state identifier corresponding to the matrix calculation unit;
a calculation result execution module, configured to execute the matrix calculation result acquisition module when the operating state is an idle state;
a matrix calculation instruction cache module, configured to cache the matrix calculation instruction when the operating state is a non-idle state.

7. The device according to claim 5, further comprising:
a calculation result writing module, configured to write the matrix calculation result into the memory according to the data writing mode corresponding to the matrix calculation instruction.

8. The device according to claim 7, characterized in that the calculation result writing module comprises:
a first calculation result writing unit, configured to write the matrix calculation result into the memory according to the continuous column writing mode when the data writing mode is a continuous column writing mode;
a second calculation result writing unit, configured to write the matrix calculation result into the memory according to the continuous row writing mode when the data writing mode is a continuous row writing mode.
CN202011360968.7A 2020-11-27 2020-11-27 Data processing method and device Active CN112486872B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011360968.7A CN112486872B (en) 2020-11-27 2020-11-27 Data processing method and device

Publications (2)

Publication Number Publication Date
CN112486872A CN112486872A (en) 2021-03-12
CN112486872B true CN112486872B (en) 2024-07-19

Family

ID=74936520

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011360968.7A Active CN112486872B (en) 2020-11-27 2020-11-27 Data processing method and device

Country Status (1)

Country Link
CN (1) CN112486872B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111143766A (en) * 2019-12-24 2020-05-12 上海寒武纪信息科技有限公司 Method and apparatus for processing two-dimensional complex matrix by artificial intelligence processor

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR2573888B1 (en) * 1984-11-23 1987-01-16 Sintra SYSTEM FOR THE SIMULTANEOUS TRANSMISSION OF DATA BLOCKS OR VECTORS BETWEEN A MEMORY AND ONE OR MORE DATA PROCESSING UNITS
SU1695319A1 (en) * 1989-09-25 1991-11-30 Физико-механический институт им.Г.В.Карпенко Matrix computing device
JP4448917B2 (en) * 1993-09-17 2010-04-14 株式会社ルネサステクノロジ Semiconductor integrated circuit device, data processing device, and microcomputer
CN108845828B (en) * 2018-05-29 2021-01-08 深圳市国微电子有限公司 Coprocessor, matrix operation acceleration method and system
CN109710213A (en) * 2018-12-25 2019-05-03 广东浪潮大数据研究有限公司 A kind of sparse matrix accelerates to calculate method, apparatus, equipment and its system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant