CN112486872B

CN112486872B - Data processing method and device

Info

Publication number: CN112486872B
Application number: CN202011360968.7A
Authority: CN
Inventors: 展庆波
Original assignee: Vivo Mobile Communication Co Ltd
Current assignee: Vivo Mobile Communication Co Ltd
Priority date: 2020-11-27
Filing date: 2020-11-27
Publication date: 2024-07-19
Anticipated expiration: 2040-11-27
Also published as: CN112486872A

Abstract

The present application discloses a data processing method and device, belonging to the field of communication technology. The method comprises: receiving a matrix calculation instruction for first matrix data and second matrix data that have been written into a memory; in response to the matrix calculation instruction, calling a matrix calculation unit arranged in the memory to calculate the first matrix data and the second matrix data according to the calculation mode of the instruction, obtaining a matrix calculation result; and transmitting the matrix calculation result according to the data transmission mode corresponding to the matrix calculation instruction. The present application can reduce the time wasted transferring large amounts of data between the processor and the memory, and improve processor utilization and data processing efficiency.

Description

Data processing method and device

Technical Field

This application belongs to the field of communication technology and relates in particular to a data processing method and device.

Background

In computer systems, the gap between the data processing speed inside the processor and the speed of the memory interface keeps widening. When the processor performs computations that require moving large amounts of data, transferring that data between the memory and the processor wastes a great deal of time, so the processor cannot be used efficiently. Matrix-heavy workloads are a typical case: large amounts of data must be carried from the memory to the processor, cached there, and only then calculated.

Disclosure of Invention

The embodiments of the application aim to provide a data processing method and device that solve the problems of existing data processing approaches, which waste processing time, reduce processor utilization, and lower data processing efficiency.

To solve the above technical problems, the application is implemented as follows:

In a first aspect, an embodiment of the present application provides a data processing method, including:

receiving a matrix calculation instruction for first matrix data and second matrix data that have been written into a memory;

in response to the matrix calculation instruction, calling a matrix calculation unit arranged in the memory, and calculating the first matrix data and the second matrix data according to the calculation mode of the matrix calculation instruction to obtain a matrix calculation result; and

transmitting the matrix calculation result according to the data transmission mode corresponding to the matrix calculation instruction.
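The three steps of the first aspect can be illustrated with a small software model. This is a minimal sketch, not the claimed hardware: the class and field names (MatrixInstruction, ComputableMemory, the "row"/"column" mode strings) are hypothetical, matrices are modeled as Python lists, and only the inner product and addition calculation modes are shown.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Matrix = List[List[int]]


@dataclass
class MatrixInstruction:
    # Hypothetical fields: what to compute and how to send the result out.
    op: str        # calculation mode, e.g. "inner_product" or "add"
    first: str     # name of the first matrix already written into memory
    second: str    # name of the second matrix already written into memory
    transfer: str  # data transmission mode: "row" or "column"


def inner_product(a: Matrix, b: Matrix) -> Matrix:
    # Entry (i, j) is the inner product of row i of a and column j of b.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add(a: Matrix, b: Matrix) -> Matrix:
    # Element-wise sum of two same-sized matrices.
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


CALC_MODES: Dict[str, Callable[[Matrix, Matrix], Matrix]] = {
    "inner_product": inner_product,
    "add": add,
}


class ComputableMemory:
    def __init__(self) -> None:
        self.matrices: Dict[str, Matrix] = {}

    def handle(self, instr: MatrixInstruction) -> List[int]:
        # Step 101: the instruction refers to matrices already in memory.
        a, b = self.matrices[instr.first], self.matrices[instr.second]
        # Step 102: compute inside the memory according to the calculation mode.
        result = CALC_MODES[instr.op](a, b)
        # Step 103: serialize the result in the requested transmission mode.
        if instr.transfer == "row":
            return [x for row in result for x in row]
        return [x for col in zip(*result) for x in col]
```

In this model only the serialized result leaves `handle`; the operands never cross the interface, which is the point of placing the calculation unit inside the memory.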

In a second aspect, an embodiment of the present application provides a data processing apparatus, including:

a matrix calculation instruction receiving module, configured to receive a matrix calculation instruction for first matrix data and second matrix data that have been written into a memory;

a matrix calculation result acquisition module, configured to call, in response to the matrix calculation instruction, a matrix calculation unit arranged in the memory, and to calculate the first matrix data and the second matrix data according to the calculation mode of the matrix calculation instruction to obtain a matrix calculation result; and

a matrix calculation result transmission module, configured to transmit the matrix calculation result according to the data transmission mode corresponding to the matrix calculation instruction.

In a third aspect, an embodiment of the present application provides an electronic device, including a processor, a memory, and a program or instructions stored on the memory and executable on the processor, the program or instructions implementing the steps of the data processing method according to the first aspect when executed by the processor.

In a fourth aspect, embodiments of the present application provide a readable storage medium having stored thereon a program or instructions which when executed by a processor implement the steps of the data processing method according to the first aspect.

In a fifth aspect, an embodiment of the present application provides a chip, where the chip includes a processor and a communication interface, where the communication interface is coupled to the processor, and the processor is configured to execute a program or instructions to implement a data processing method according to the first aspect.

In the embodiments of the application, a matrix calculation instruction for first matrix data and second matrix data written into a memory is received; in response, a matrix calculation unit arranged in the memory is called to calculate the two matrices according to the calculation mode of the instruction, yielding a matrix calculation result, which is then transmitted according to the data transmission mode corresponding to the instruction. Because the matrix calculation unit is arranged in the memory in advance, large numbers of matrix operations can be completed quickly inside the memory. This reduces the time wasted carrying large amounts of data between the processor and the memory, and improves processor utilization and data processing efficiency.

Drawings

FIG. 1 is a flowchart illustrating steps of a data processing method according to an embodiment of the present application;

FIG. 2 is a schematic diagram of a memory architecture according to an embodiment of the present application;

FIG. 3 is a schematic diagram of matrix data stored in continuous address rows according to an embodiment of the present application;

FIG. 4 is a schematic diagram of matrix data stored in continuous address columns according to an embodiment of the present application;

FIG. 5 is a schematic diagram of matrix data output in continuous address columns according to an embodiment of the present application;

FIG. 6 is a schematic diagram of matrix data output in continuous address rows according to an embodiment of the present application;

FIG. 7 is a schematic diagram of a data processing apparatus according to an embodiment of the present application;

FIG. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application;

FIG. 9 is a schematic structural diagram of another electronic device according to an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the application. All other embodiments obtained by those skilled in the art based on the embodiments of the application without inventive effort fall within the scope of the application.

The terms "first", "second", and the like in the description and claims are used to distinguish between similar objects, not to describe a particular sequence or chronological order. It should be understood that terms so used are interchangeable where appropriate, so that the embodiments of the present application can be implemented in orders other than those illustrated or described here. Objects distinguished by "first", "second", etc. are generally of the same class, and their number is not limited; for example, there may be one or more first objects. In addition, "and/or" in the description and claims means at least one of the connected objects, and the character "/" generally indicates an "or" relationship between the associated objects.

The data processing method provided by the embodiments of the application is described in detail below through specific embodiments and their application scenarios, with reference to the accompanying drawings.

Referring to FIG. 1, which shows a flowchart of the steps of a data processing method provided by an embodiment of the present application, the data processing method may specifically include the following steps:

Step 101: Receive a matrix calculation instruction for the first matrix data and the second matrix data written into the memory.

The embodiments of the application can be applied in scenarios where a matrix calculation unit inside the memory performs matrix calculations and the matrix calculation result is transmitted to the processor.

The first matrix data and the second matrix data are matrix data stored in the memory in advance on which a matrix calculation is to be performed. For example, if the memory stores matrix data A, matrix data B, and matrix data C, and a dot product calculation is to be performed on matrix data A and matrix data B, then A may serve as the first matrix data and B as the second matrix data, or B may serve as the first matrix data and A as the second matrix data.

It will be appreciated that the above examples are only examples listed for better understanding of the technical solution of the embodiments of the present application, and are not to be construed as the only limitation of the present embodiments.

In this embodiment, matrix data may be written into the memory using matrix write instructions added to the memory. These instructions ensure that a matrix read out of non-volatile memory is stored efficiently and in a layout that is convenient for the MALU to calculate on. The added instructions are WMC (write matrix column) and WMR (write matrix row), described as follows:

1. WMC (write matrix column)

This instruction writes matrix data read from non-volatile memory, such as NAND flash, into the memory in the continuous address column (CACM) manner. The instruction is sent on the memory's CA Bus and followed by a NOP, after which the address to be written is sent. Once the address has been sent, the DATA Bus starts transferring the matrix data; meanwhile, the CA Bus waits for one NOP and then sends the row size and column size of the matrix. From the received row and column sizes, the memory determines how much data to expect on the DATA Bus.

All write data on the DATA Bus is parsed and stored in the continuous address column (CACM) manner.

2. WMR (write matrix row)

This instruction writes matrix data read from non-volatile storage, such as NAND flash, into the memory in the continuous address row (CARM) manner.

The bus sequence is the same as for WMC: the instruction is sent on the memory's CA Bus, followed by a NOP and then the address to be written; the DATA Bus then transfers the matrix data while the CA Bus waits for one NOP and sends the row size and column size of the matrix, from which the memory determines how much data to expect on the DATA Bus.

All write data on the DATA Bus is parsed and stored in the continuous address row (CARM) manner.
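The command sequence shared by WMC and WMR can be modeled as follows. This is a sketch under stated assumptions: the CA Bus is represented as a list of tokens in the order described above (opcode, NOP, start address, NOP, row size, column size), memory as a dictionary from address to word, and the incoming DATA Bus stream is assumed to arrive in row-major order; the function name parse_write_matrix is hypothetical.

```python
from typing import Dict, Sequence, Union

Token = Union[str, int]


def parse_write_matrix(ca_bus: Sequence[Token], data_bus: Sequence[int],
                       memory: Dict[int, int]) -> int:
    """Decode a hypothetical WMC/WMR command and store the matrix.

    Assumed CA Bus order: opcode, NOP, start address, NOP, row size, column size.
    Assumed DATA Bus order: matrix elements in row-major order.
    Returns the number of data words consumed.
    """
    opcode, nop1, base, nop2, rows, cols = ca_bus
    if opcode not in ("WMC", "WMR") or nop1 != "NOP" or nop2 != "NOP":
        raise ValueError("malformed write-matrix command")
    count = rows * cols  # amount of data to expect on the DATA Bus
    if len(data_bus) < count:
        raise ValueError("DATA Bus ended early")
    for k, value in enumerate(data_bus[:count]):
        i, j = divmod(k, cols)   # row and column of the k-th incoming element
        if opcode == "WMR":      # CARM: consecutive addresses walk along a row
            offset = i * cols + j
        else:                    # CACM: consecutive addresses walk down a column
            offset = j * rows + i
        memory[base + offset] = value
    return count
```

The row and column sizes arriving after the address are what let the memory know where the data burst ends; without them it could not tell how many DATA Bus words belong to this matrix.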

The matrix calculation instruction is an instruction issued to calculate matrix data A and matrix data B. In this example, it may be a dot product calculation instruction, an addition calculation instruction, an inner product calculation instruction, or the like, as determined by service requirements; this embodiment does not limit it.

When matrix data needs to be calculated, a matrix calculation instruction for the first matrix data and the second matrix data may be sent to the memory.

After the matrix calculation instruction for the first matrix data and the second matrix data written into the memory is received, step 102 is performed.

Step 102: In response to the matrix calculation instruction, call the matrix calculation unit arranged in the memory, and calculate the first matrix data and the second matrix data according to the calculation mode of the matrix calculation instruction to obtain a matrix calculation result.

The matrix calculation unit is a unit arranged in the memory in advance to perform matrix data calculations. As shown in FIG. 2, to enable the memory to perform matrix operations, a new calculation unit is added to the existing memory structure; the interface largely keeps the existing design, with only one additional input/output (IO) interface. Following the instruction design of the present application, the unit added to the memory is a matrix calculation unit, referred to in this embodiment as the Matrix Arithmetic Logic Unit (MALU), which includes a vector multiplier, a vector adder, and so on. In FIG. 2, the CA Bus (command/address bus) and the DATA Bus are the buses through which the memory communicates.

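The matrix calculation unit is dedicated to handling the newly added matrix-related calculation instructions. For any matrices A (m×p) and B (p×n), the inner-product multiplication of A and B can be expressed by formula (1):

$$
A_{m\times p}B_{p\times n}=\begin{bmatrix}
A_{1,\sim}B_{\sim,1} & A_{1,\sim}B_{\sim,2} & \cdots & A_{1,\sim}B_{\sim,n}\\
A_{2,\sim}B_{\sim,1} & A_{2,\sim}B_{\sim,2} & \cdots & A_{2,\sim}B_{\sim,n}\\
\vdots & \vdots & \ddots & \vdots\\
A_{m,\sim}B_{\sim,1} & A_{m,\sim}B_{\sim,2} & \cdots & A_{m,\sim}B_{\sim,n}
\end{bmatrix}\tag{1}
$$

In formula (1), A_{i,∼} (i = 1, …, m) denotes the i-th row vector of matrix A, and B_{∼,j} (j = 1, …, n) denotes the j-th column vector of matrix B, so each entry of the product is the inner product of a p-element row vector and a p-element column vector.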
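Each entry of formula (1) is the inner product of a row of A and a column of B, which is the workload a vector multiplier followed by a vector adder handles. The sketch below models that split in software; the function names are hypothetical, and the pairwise adder-tree reduction is an assumed adder organization, not one given in the description.

```python
from typing import List, Sequence


def vector_multiply(u: Sequence[int], v: Sequence[int]) -> List[int]:
    # Models the vector multiplier: p lane-wise products in one step.
    if len(u) != len(v):
        raise ValueError("row and column vectors must have the same length p")
    return [x * y for x, y in zip(u, v)]


def adder_tree(values: Sequence[int]) -> int:
    # Models a vector adder reducing the products pairwise, level by level.
    level = list(values)
    if not level:
        return 0
    while len(level) > 1:
        pairs = [level[k] + level[k + 1] for k in range(0, len(level) - 1, 2)]
        if len(level) % 2:  # an odd element passes through to the next level
            pairs.append(level[-1])
        level = pairs
    return level[0]


def matmul_by_inner_products(a: List[List[int]], b: List[List[int]]) -> List[List[int]]:
    # Entry (i, j) = A_{i,~} . B_{~,j}, as in formula (1).
    columns = [list(col) for col in zip(*b)]
    return [[adder_tree(vector_multiply(row, col)) for col in columns] for row in a]
```

For A = [[1, 2, 3], [4, 5, 6]] and B = [[7, 8], [9, 10], [11, 12]], the model returns [[58, 64], [139, 154]]: each entry costs one vector multiply and one reduction of three products.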

In addition, matrix dot multiplication, multiplication by a coefficient (scalar), and matrix addition and subtraction can be implemented with ordinary adders and inverters. The newly added vector multiplier, adder, and inverter together make up the MALU, the calculation unit that supports matrix operations in the memory.
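The remark that subtraction needs only adders and inverters follows from two's-complement arithmetic: x - y equals x + (~y) + 1, i.e. an adder fed with the inverted operand and a carry-in of 1. A minimal sketch, assuming 16-bit two's-complement words; the word width and all function names are assumptions for illustration, not taken from the description.

```python
from typing import List

WIDTH = 16                 # assumed word width in bits
MASK = (1 << WIDTH) - 1


def add_words(x: int, y: int, carry_in: int = 0) -> int:
    # Ordinary adder: keep the low WIDTH bits, discard the carry out.
    return (x + y + carry_in) & MASK


def invert(x: int) -> int:
    # Inverter: flip every bit of the word.
    return ~x & MASK


def sub_words(x: int, y: int) -> int:
    # Two's complement: x - y == x + (~y) + 1, i.e. the adder with carry-in 1.
    return add_words(x, invert(y), carry_in=1)


def to_signed(x: int) -> int:
    # Interpret a WIDTH-bit word as a signed value (for display only).
    return x - (1 << WIDTH) if x & (1 << (WIDTH - 1)) else x


def matrix_sub(a: List[List[int]], b: List[List[int]]) -> List[List[int]]:
    # Element-wise subtraction of two same-sized matrices of WIDTH-bit words.
    return [[sub_words(x, y) for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]
```

For example, `to_signed(sub_words(5, 7))` is -2, obtained without any dedicated subtractor.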

The matrix calculation result refers to a result obtained after the corresponding calculation of the first matrix data and the second matrix data.

After the matrix calculation instruction for the first matrix data and the second matrix data written into the memory is received, the memory responds to it by calling the matrix calculation unit arranged in the memory, which calculates the first matrix data and the second matrix data according to the calculation mode of the instruction to obtain the matrix calculation result. For example, when the calculation mode of the matrix calculation instruction is inner product calculation, an inner product is computed on the first matrix data and the second matrix data, and the inner product result is taken as the matrix calculation result. When the calculation mode is addition, the first matrix data and the second matrix data are added, and the sum is taken as the matrix calculation result.

After the matrix calculation unit arranged in the memory has been called and the matrix calculation result obtained, step 103 is performed.

Step 103: Transmit the matrix calculation result according to the data transmission mode corresponding to the matrix calculation instruction.

The data transmission mode is the matrix data transmission mode carried in the matrix calculation instruction. In this embodiment, the data transmission mode may be a continuous address row transmission mode or a continuous address column transmission mode.

After the matrix calculation results corresponding to the first matrix data and the second matrix data are obtained, the matrix calculation results can be transmitted according to the data transmission mode corresponding to the matrix calculation instruction.

In the embodiments of the application, because the matrix calculation unit is arranged in the memory in advance to perform the corresponding matrix data calculations, large numbers of matrix operations can be completed quickly inside the memory, which reduces the time wasted carrying large amounts of data between the processor and the memory and improves processor utilization and data processing efficiency.
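The saving can be estimated by counting the matrix elements that cross the processor-memory interface for C = AB. This is an idealized sketch, assuming every element crosses at most once (perfect caching) and ignoring command/address traffic; the function name bus_elements is hypothetical. It compares a conventional path with an in-memory path that either sends C to the processor or keeps it in the memory.

```python
from typing import Dict


def bus_elements(m: int, p: int, n: int, store_result: bool) -> Dict[str, int]:
    """Idealized count of matrix elements crossing the processor-memory
    interface for C = A (m x p) times B (p x n)."""
    inputs = m * p + p * n  # A and B read into the processor
    result = m * n          # size of C
    if store_result:
        # Conventional: read A and B, write C back. In memory: nothing crosses.
        return {"conventional": inputs + result, "in_memory": 0}
    # Conventional: read A and B. In memory: only C is sent to the processor.
    return {"conventional": inputs, "in_memory": result}
```

Under these assumptions, square 1024 × 1024 operands move half as many elements when only the result is read out, and none when the result stays in memory. For a very small inner dimension (p = 1, an outer product) the read-out path moves more than the conventional one, so the benefit depends on matrix shape.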

In this embodiment, a state identifier may be provided outside the memory to indicate the running state of the matrix calculation unit, so that a new calculation request is not issued while the current operation is still in progress.

In a specific implementation of the present application, before step 102, the method may further include:

Step A1: Determine the running state of the matrix calculation unit according to the state identifier corresponding to the matrix calculation unit.

In this embodiment, the state identifier indicates the running state of the matrix calculation unit. In this example, to resolve conflicting calculation requests, a MatrixBusy signal pin is added outside the computable memory. While the MatrixBusy pin is in the busy state, matrix calculation instructions on the CA Bus are invalid; a matrix calculation instruction can be responded to only when the MatrixBusy pin is in the idle state.

This is not limiting: in a specific implementation, the state identifier of the matrix calculation unit may be obtained in other ways, depending on service requirements, and this embodiment does not limit this.

After the matrix calculation instruction for the first matrix data and the second matrix data is received, the running state of the matrix calculation unit can be determined from its corresponding state identifier.

After the running state of the matrix calculation unit is determined, step A2 or step A3 is performed.

Step A2: If the running state is the idle state, perform step 102.

Step A3: If the running state is a non-idle state, cache the matrix calculation instruction.

When the running state of the matrix calculation unit is determined to be idle, the step of calling the matrix calculation unit arranged in the memory and calculating the first matrix data and the second matrix data according to the calculation mode of the matrix calculation instruction to obtain the matrix calculation result is performed.

When the running state of the matrix calculation unit is determined to be non-idle, the matrix calculation instruction is cached, and the cached instruction is executed after the matrix calculation unit completes its current calculation task.

By adding a running-state identifier for the matrix calculation unit in advance, the embodiments of the application prevent the matrix calculation unit from being handed more calculation tasks than it can process while busy, which would otherwise cause system anomalies.
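Steps A1 to A3 amount to a busy flag plus a first-in, first-out instruction cache. The following is a minimal sketch of that behavior, assuming instructions are opaque strings and that a hypothetical `complete()` call signals the end of the current task; the class name and methods are illustrative, not the patent's interface.

```python
from collections import deque
from typing import Callable, Deque


class MatrixUnitScheduler:
    """Hypothetical model of steps A1-A3: a MatrixBusy flag plus an
    instruction cache that is drained when the unit becomes idle."""

    def __init__(self, execute: Callable[[str], None]) -> None:
        self.matrix_busy = False
        self.pending: Deque[str] = deque()
        self.execute = execute

    def submit(self, instruction: str) -> bool:
        # Step A1: check the state identifier.
        if self.matrix_busy:
            # Step A3: not idle, so cache the instruction.
            self.pending.append(instruction)
            return False
        # Step A2: idle, so proceed to step 102 immediately.
        self._start(instruction)
        return True

    def _start(self, instruction: str) -> None:
        self.matrix_busy = True
        self.execute(instruction)

    def complete(self) -> None:
        # Current task finished: go idle, then start the next cached instruction.
        self.matrix_busy = False
        if self.pending:
            self._start(self.pending.popleft())
```

The cache sits on the requesting side of the interface; at the pin level, an instruction arriving while MatrixBusy is busy is simply invalid, which is why it must be held and replayed later.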

The transmission of the matrix calculation result according to the data transmission mode is described in detail below with reference to specific implementations.

In this embodiment, the added instructions perform an inner product operation on matrix A, stored in CARM form, and matrix B, stored in CACM form, both of which are already in the memory. (Matrices A and B must be stored in these forms; this series of operation instructions assumes by default that they are.) For this purpose, the added instructions are MIPRC (matrix inner product read column), MIPRR (matrix inner product read row), MIPSC (matrix inner product store column), and MIPSR (matrix inner product store row).

In another specific implementation of the present application, step 103 may include:

Substep B1: If the data transmission mode is the continuous column transmission mode, transmit the matrix calculation result to the command address bus in the continuous column mode, and transmit it to the processor through the command address bus.

In this embodiment, the continuous column transmission mode transfers matrix data in continuous column order. Taking inner product calculation as the calculation mode, the matrix calculation instruction is the MIPRC instruction. The processor sends it to the computable memory, which performs the matrix inner product on the two internal matrices and returns the calculation result to the computable memory interface in column format. The instruction is issued first, followed by the address of matrix A and its row and column sizes, then the address of matrix B and its row and column sizes. After a delay, the MALU processes the data and outputs the calculation results to the DATA Bus in continuous column order. The MALU computes the inner product of A and B as vector operations, as shown in FIG. 5: because the instruction assumes that A is stored in CARM form and B in CACM form, the MALU only needs to read a row and a column from continuous addresses into its cache and perform the vector calculation in the vector multiplier, so the result for each position can be output quickly. Computing the positions in the order shown in FIG. 5 yields the correct CACM-ordered result of C = AB, c₁₁, c₂₁, …, cₘₙ, which is sent continuously onto the DATA Bus.

That is, when the data transmission mode is the continuous column transmission mode, the matrix calculation result may be transmitted to the command address bus in the continuous column transmission mode and on to the processor through the command address bus.

Substep B2: If the data transmission mode is the continuous row transmission mode, transmit the matrix calculation result to the command address bus in the continuous row mode, and transmit it to the processor through the command address bus.

In this embodiment, the continuous row transmission mode transfers matrix data in continuous row order. Taking inner product calculation as the calculation mode, the matrix calculation instruction is the MIPRR instruction. The processor sends it to the computable memory, which performs the matrix inner product on the two internal matrices and returns the calculation result to the computable memory interface in row format. The instruction is issued first, followed by the address of matrix A and its row and column sizes, then the address of matrix B and its row and column sizes. After a delay, the MALU processes the data and outputs the calculation results to the DATA Bus in continuous row order. The MALU computes the inner product of A and B as vector operations, as shown in FIG. 6: because the instruction assumes that A is stored in CARM form and B in CACM form, the MALU only needs to read a row and a column from continuous addresses into its cache and perform the vector calculation in the vector multiplier, so the result for each position can be output quickly. Computing the positions in the order shown in FIG. 6 yields the correct CARM-ordered result of C = AB, c₁₁, c₁₂, …, cₘₙ, which is sent continuously onto the DATA Bus.
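The benefit of the fixed CARM/CACM operand layouts is that every inner product reads two contiguous address ranges. The sketch below models the MIPRC and MIPRR output streams under that assumption; the function name `mipr`, the flat-list memory model, and the "column"/"row" order strings are hypothetical, and a real MALU would pipeline these reads rather than recompute slices.

```python
from typing import Iterator, List


def mipr(a_carm: List[int], b_cacm: List[int], m: int, p: int, n: int,
         order: str) -> Iterator[int]:
    """Sketch of the MIPRC ("column") / MIPRR ("row") output streams.

    a_carm: A (m x p) stored in continuous address rows (row-major).
    b_cacm: B (p x n) stored in continuous address columns (column-major).
    Yields the elements of C = AB in the requested output order.
    """
    if len(a_carm) != m * p or len(b_cacm) != p * n:
        raise ValueError("buffer sizes do not match the declared shapes")

    def element(i: int, j: int) -> int:
        row = a_carm[i * p:(i + 1) * p]  # one contiguous read: row i of A
        col = b_cacm[j * p:(j + 1) * p]  # one contiguous read: column j of B
        return sum(x * y for x, y in zip(row, col))

    if order == "column":    # MIPRC: c11, c21, ..., cmn
        for j in range(n):
            for i in range(m):
                yield element(i, j)
    elif order == "row":     # MIPRR: c11, c12, ..., cmn
        for i in range(m):
            for j in range(n):
                yield element(i, j)
    else:
        raise ValueError("order must be 'row' or 'column'")
```

With A = [[1, 2, 3], [4, 5, 6]] stored as [1, 2, 3, 4, 5, 6] and B = [[7, 8], [9, 10], [11, 12]] stored as [7, 9, 11, 8, 10, 12], the column stream is 58, 139, 64, 154 and the row stream is 58, 64, 139, 154. Only the loop order differs between the two instructions.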

Of course, in this embodiment, the calculation of matrix data may also include dot multiplication of matrices and addition of matrices, described in detail as follows.

1. Addition calculation of matrices

Matrix addition instructions are added to the memory to support matrix addition of arbitrary size. Matrix addition requires matrices A and B to have the same size and the same storage mode.

Matrix addition adds the elements at corresponding positions of two matrices of the same size; the resulting matrix is the sum. Matrix addition requires only an ordinary adder.

For this purpose, the added instructions are MARC (matrix addition read column), MARR (matrix addition read row), MASC (matrix addition store column), and MASR (matrix addition store row), described as follows:

a) MARC

The processor sends this instruction to the computable memory, which performs matrix addition on two internal matrices (both stored in CACM form; the instruction assumes by default that A and B are stored in CACM form) and returns the calculation result to the computable memory interface in column format.

The instruction is issued first, followed by the address of matrix A and the size of matrix A, then the address of matrix B (because matrix B has the same size as matrix A, no size information for B is needed). After a delay, the MALU processes the data and outputs the calculation results to the DATA Bus in continuous column order.

The MALU reads the elements of matrices A and B in sequence, adds each pair, and outputs the results to the IO port one by one.
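Requiring A and B to share both size and storage mode means the k-th word of A and the k-th word of B always hold the same (i, j) element, so MARC reduces to adding the two regions word by word in address order. A minimal sketch, assuming memory is modeled as a flat list of words and using the hypothetical function name `marc`:

```python
from typing import Iterator, Sequence


def marc(memory: Sequence[int], addr_a: int, addr_b: int,
         rows: int, cols: int) -> Iterator[int]:
    # Only A's size is sent; B must have the same size and the same CACM layout.
    count = rows * cols
    for k in range(count):
        # Same layout: word k of A and word k of B are the same (i, j) element,
        # so each sum can be streamed out as soon as it is computed.
        yield memory[addr_a + k] + memory[addr_b + k]
```

Because the output follows input address order, the stream is itself in continuous column order; MARR would be the same loop over two CARM regions.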

b) MARR

The processor sends this instruction to the computable memory, which performs matrix addition on two internal matrices (both stored in CARM form; the instruction assumes by default that A and B are stored in CARM form) and returns the calculation result to the computable memory interface in row format.

c) MASC

The processor sends this instruction to the computable memory, which performs matrix addition on two internal matrices (both stored in CACM form; the instruction assumes by default that A and B are stored in CACM form) and writes the calculation result, in column format, into a contiguous space starting at a specified memory address.

The instruction is issued first, followed by the address of matrix A and its row and column sizes, then the address of matrix B and its row and column sizes, and finally the address C at which the calculation result is to be stored.

Once the instruction is accepted as valid, MatrixBusy enters the busy state; MatrixBusy returns to the idle state after the calculation and storage are complete.
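The MatrixBusy behavior of MASC can be sketched as a stepwise model: the flag goes busy when the instruction is accepted and returns to idle only after the last result word has been stored at address C. This is a sketch under stated assumptions: memory is a flat list of words, the class and method names are hypothetical, and rejecting a second instruction with an exception stands in for the "instruction is invalid" behavior of the pin.

```python
from typing import Iterator, List


class ComputableMemoryModel:
    def __init__(self, size: int) -> None:
        self.cells: List[int] = [0] * size
        self.matrix_busy = False  # models the MatrixBusy pin

    def masc(self, addr_a: int, addr_b: int, addr_c: int,
             rows: int, cols: int) -> Iterator[None]:
        """Each next() stores one result word, so MatrixBusy can be observed mid-run."""
        if self.matrix_busy:
            raise RuntimeError("MatrixBusy is busy; instruction not accepted")
        self.matrix_busy = True   # instruction accepted: enter the busy state
        for k in range(rows * cols):
            # A, B and C are all CACM, so word k of each is the same (i, j) element.
            self.cells[addr_c + k] = self.cells[addr_a + k] + self.cells[addr_b + k]
            yield
        self.matrix_busy = False  # calculation and storage complete: back to idle
```

Stepping the generator once leaves MatrixBusy set, and a second MASC issued at that point is rejected; draining it stores the full sum at C and clears the flag.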

d) MASR

The processor sends this instruction to the computable memory, which performs matrix addition on two internal matrices (both stored in CARM form; the instruction assumes by default that A and B are stored in CARM form) and writes the calculation result, in row format, into a contiguous space starting at a specified memory address.

In the embodiments of the application, different instructions are combined to implement different matrix data calculations and to transmit matrix calculation results in different modes, which improves the utilization efficiency of the processor.

In this embodiment, if the matrix calculation result does not need to be transmitted to the processor, it may instead be written into the memory according to a data writing mode, as described in detail below with reference to specific implementations.

In another specific implementation of the present application, after step 102, the method may further include:

Step C1: Write the matrix calculation result into the memory according to the data writing mode corresponding to the matrix calculation instruction.

In the embodiment of the present application, the data writing mode (i.e., the data storage mode) refers to a mode for writing matrix data into the memory.

After the matrix calculation result of the first matrix data and the second matrix data is obtained, it can be written into the memory according to the data writing mode of the matrix calculation instruction. That is, the matrix calculation instruction indicates whether the result is to be written into the memory; when it does, i.e. when the instruction carries a data writing mode, the matrix calculation result is written into the memory in that mode once it has been obtained.

In this embodiment, the data writing mode may be a continuous column writing mode or a continuous row writing mode; both are described in detail below with reference to specific implementations.

In this embodiment, matrices are stored in the memory in one of two modes: storage by continuous address rows and storage by continuous address columns.

In another specific implementation of the present application, step C1 may include:

Substep D1: If the data writing mode is the continuous column writing mode, write the matrix calculation result into the memory in the continuous column mode.

In this embodiment, the continuous column writing mode writes matrix data into the memory in continuous columns.

When the data writing mode carried in the matrix calculation instruction is the continuous column writing mode, the matrix calculation result of the first and second matrix data, once obtained, may be written into the memory in continuous columns; for example, as shown in FIG. 4, all matrix elements are stored at continuous addresses in column-traversal order. The data stored in the memory cells is no different from data stored in a conventional memory: a conventional read instruction reads it as ordinary data, while the instructions of the invention operate on it as matrix data.

Substep D2: and writing the matrix calculation result into the memory according to the continuous line writing mode when the data writing mode is the continuous line writing mode.

In this embodiment, the continuous row writing mode refers to writing matrix data into the memory row by row at consecutive addresses.

When the data writing mode contained in the matrix calculation instruction is the continuous row writing mode, the matrix calculation result of the first matrix data and the second matrix data, once obtained, may be written into the memory in that mode. For example, as shown in fig. 4, all matrix elements are stored at consecutive addresses in row-traversal order. The data stored in the memory cells is no different from data stored in a conventional memory: a conventional read instruction reads it as ordinary, arbitrary data, whereas the instructions of the present invention operate on the same data as matrix data.
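The continuous row writing mode can be sketched in the same way, again assuming a word-addressed memory modelled as a Python list and hypothetical function names. `conventional_read` stands in for an ordinary read instruction: it returns a single stored word and needs no knowledge of the matrix layout.

```python
from typing import List


def write_row_major(memory: List[int], base: int, matrix: List[List[int]]) -> None:
    """Store `matrix` from address `base`, traversing it row by row."""
    cols = len(matrix[0])
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            memory[base + i * cols + j] = value


def conventional_read(memory: List[int], address: int) -> int:
    """An ordinary read: return one stored word, with no notion of matrix shape."""
    return memory[address]
```

After writing `[[1, 2, 3], [4, 5, 6]]` at address 1, an ordinary read of address 4 simply returns the word 4, the first element of the second row.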

In this embodiment, to make full use of the matrix computing capability of the computable memory of the present invention, the compiler layer needs to be modified so that matrix-related computations are converted into the instructions described above, and the matrix computation portions must be encoded as the corresponding instructions so that the in-memory computing resources are fully utilized.
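The compiler change is not specified further in the text. The following is only a toy lowering pass under stated assumptions: a hypothetical intermediate representation (`IrOp`), a hypothetical instruction record (`MatrixCalcInstr`), and an assumed table of offloadable opcodes. Two-operand matrix operations are rewritten into matrix calculation instructions, and everything else is left for the processor.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class IrOp:
    """Hypothetical compiler IR node: opcode, destination and source operand names."""
    opcode: str
    dst: str
    srcs: Tuple[str, ...]


@dataclass
class MatrixCalcInstr:
    """Hypothetical encoding of a matrix calculation instruction."""
    calc_mode: str   # e.g. "mul" or "add"
    first: str       # first matrix data
    second: str      # second matrix data
    dst: str         # where the result is meant to go


# Assumed mapping from IR opcodes that can be offloaded to in-memory calculation modes.
OFFLOADABLE = {"matmul": "mul", "matadd": "add"}


def lower_matrix_ops(ops: List[IrOp]) -> List[Union[IrOp, MatrixCalcInstr]]:
    lowered: List[Union[IrOp, MatrixCalcInstr]] = []
    for op in ops:
        if op.opcode in OFFLOADABLE and len(op.srcs) == 2:
            lowered.append(MatrixCalcInstr(OFFLOADABLE[op.opcode], op.srcs[0], op.srcs[1], op.dst))
        else:
            lowered.append(op)  # non-matrix work stays on the processor
    return lowered
```

A real compiler would also have to place operands in memory in a supported layout and choose the writing and transmission modes; the sketch omits those decisions.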

In the data processing method provided by the embodiment of the present application, a matrix calculation instruction for first matrix data and second matrix data written into a memory is received; in response to the instruction, a matrix calculation unit arranged in the memory is called to calculate the first matrix data and the second matrix data according to the calculation mode of the instruction, yielding a matrix calculation result; and the result is transmitted according to the data transmission mode corresponding to the instruction. Because the matrix calculation unit is arranged in the memory in advance to perform the matrix data calculation, large numbers of matrix operations can be completed quickly inside the memory. This reduces the time wasted moving large amounts of data between the processor and the memory, and improves both processor utilization and data processing efficiency.

It should be noted that the execution body of the data processing method provided in the embodiments of the present application may be a data processing device, or a control module in the data processing device for executing the method. In the embodiments of the present application, the data processing device is described by taking, as an example, the case in which the data processing device performs the data processing method.

Referring to fig. 7, which shows a schematic structural diagram of a data processing device according to an embodiment of the present application, the data processing device 700 may specifically include the following modules:

a matrix calculation instruction receiving module 710, configured to receive a matrix calculation instruction for first matrix data and second matrix data written into the memory;

a matrix calculation result obtaining module 720, configured to, in response to the matrix calculation instruction, call a matrix calculation unit arranged in the memory and calculate the first matrix data and the second matrix data according to the calculation mode of the matrix calculation instruction, to obtain a matrix calculation result; and

a matrix calculation result transmission module 730, configured to transmit the matrix calculation result according to the data transmission mode corresponding to the matrix calculation instruction.
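The cooperation of modules 710, 720 and 730 can be modelled in software as below. This is a sketch under assumptions, not the device's implementation: the instruction is a hypothetical tuple `(calc_mode, first_name, second_name, transmission_mode)`, the memory is a dictionary of named matrices, plain Python matrix arithmetic stands in for the in-memory matrix calculation unit, and `bus_send` is a hypothetical callback standing in for the path back to the processor.

```python
from typing import Callable, Dict, List, Optional, Tuple

Matrix = List[List[int]]
# Hypothetical instruction layout: (calc_mode, first_name, second_name, transmission_mode)
Instruction = Tuple[str, str, str, str]


def _matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product, standing in for the in-memory matrix calculation unit."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def _matadd(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise sum of two equally sized matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


CALC_MODES: Dict[str, Callable[[Matrix, Matrix], Matrix]] = {"mul": _matmul, "add": _matadd}


class DataProcessingDevice:
    """Software model of modules 710, 720 and 730; all names are hypothetical."""

    def __init__(self, memory: Dict[str, Matrix], bus_send: Callable[[Matrix, str], None]):
        self.memory = memory        # matrix data already written into the memory
        self.bus_send = bus_send    # stands in for the path back to the processor
        self.instruction: Optional[Instruction] = None

    def receive(self, instruction: Instruction) -> None:
        """Module 710: accept a matrix calculation instruction."""
        self.instruction = instruction

    def compute(self) -> Matrix:
        """Module 720: apply the instruction's calculation mode to both operands."""
        if self.instruction is None:
            raise RuntimeError("no matrix calculation instruction received")
        calc_mode, first, second, _ = self.instruction
        return CALC_MODES[calc_mode](self.memory[first], self.memory[second])

    def transmit(self, result: Matrix) -> None:
        """Module 730: hand the result over together with its transmission mode."""
        if self.instruction is None:
            raise RuntimeError("no matrix calculation instruction received")
        self.bus_send(result, self.instruction[3])
```

For example, with `A = [[1, 2], [3, 4]]` and `B = [[5, 6], [7, 8]]` in memory, receiving `("mul", "A", "B", "row")` and calling `compute()` yields `[[19, 22], [43, 50]]`, which `transmit()` then passes on with the mode `"row"`.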

Optionally, the device further comprises:

a running state determining module, configured to determine the running state of the matrix calculation unit according to the state identifier corresponding to the matrix calculation unit;

a calculation result execution module, configured to invoke the matrix calculation result obtaining module when the running state is an idle state; and

a matrix calculation instruction caching module, configured to cache the matrix calculation instruction when the running state is a non-idle state.
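The state-dependent dispatch performed by these three modules can be sketched as a small scheduler. Assumptions: a boolean `busy` flag models the state identifier, instructions are hypothetical `(calc_mode, first, second)` tuples, and the cached instructions are drained first-in, first-out when the unit finishes. The text only says that non-idle-state instructions are cached, so the FIFO draining policy is an assumption of this sketch.

```python
from collections import deque
from typing import Deque, List, Tuple

Instruction = Tuple[str, str, str]  # hypothetical (calc_mode, first, second)


class MatrixUnitScheduler:
    """Run an instruction when the unit is idle; otherwise cache it."""

    def __init__(self) -> None:
        self.busy = False                            # models the state identifier
        self.cache: Deque[Instruction] = deque()     # cached instructions
        self.started: List[Instruction] = []         # instructions handed to the unit

    def submit(self, instr: Instruction) -> None:
        if self.busy:
            self.cache.append(instr)                 # non-idle state: cache it
        else:
            self._start(instr)                       # idle state: execute now

    def _start(self, instr: Instruction) -> None:
        self.busy = True
        self.started.append(instr)

    def on_complete(self) -> None:
        """Mark the unit idle, then start the oldest cached instruction, if any."""
        self.busy = False
        if self.cache:
            self._start(self.cache.popleft())
```

Submitting two instructions back to back starts the first and caches the second; the second starts only after `on_complete()` reports that the unit has become idle again.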

Optionally, the matrix calculation result transmission module 730 includes:

a first calculation result transmission unit, configured to, when the data transmission mode is the continuous column transmission mode, transmit the matrix calculation result to a command address bus in the continuous column transmission mode, and transmit the matrix calculation result to a processor through the command address bus; and

a second calculation result transmission unit, configured to, when the data transmission mode is the continuous row transmission mode, transmit the matrix calculation result to the command address bus in the continuous row transmission mode, and transmit the matrix calculation result to the processor through the command address bus.
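The two transmission units differ only in the order in which result elements are placed on the command address bus. A minimal sketch of that ordering, assuming the bus can be modelled as a stream of elements; bus width, burst framing and timing are not described in the text and are deliberately left out. The function name `bus_stream` and the mode strings are hypothetical.

```python
from typing import Iterator, List


def bus_stream(result: List[List[int]], mode: str) -> Iterator[int]:
    """Yield result elements in the order they would be sent for the given mode."""
    rows, cols = len(result), len(result[0])
    if mode == "row":            # continuous row transmission mode
        for i in range(rows):
            for j in range(cols):
                yield result[i][j]
    elif mode == "column":       # continuous column transmission mode
        for j in range(cols):
            for i in range(rows):
                yield result[i][j]
    else:
        raise ValueError(f"unknown transmission mode: {mode}")
```

For the result `[[1, 2, 3], [4, 5, 6]]`, the row mode sends 1, 2, 3, 4, 5, 6 and the column mode sends 1, 4, 2, 5, 3, 6. Because this is a generator, an unknown mode is reported when the first element is requested.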

Optionally, the device further comprises:

a calculation result writing module, configured to write the matrix calculation result into the memory according to the data writing mode corresponding to the matrix calculation instruction.

Optionally, the calculation result writing module includes:

a first calculation result writing unit, configured to write the matrix calculation result into the memory in the continuous column writing mode when the data writing mode is the continuous column writing mode; and

a second calculation result writing unit, configured to write the matrix calculation result into the memory in the continuous row writing mode when the data writing mode is the continuous row writing mode.

In the data processing device provided by the embodiment of the present application, a matrix calculation instruction for first matrix data and second matrix data written into a memory is received; in response to the instruction, a matrix calculation unit arranged in the memory is called to calculate the first matrix data and the second matrix data according to the calculation mode of the instruction, yielding a matrix calculation result; and the result is transmitted according to the data transmission mode corresponding to the instruction. Because the matrix calculation unit is arranged in the memory in advance to perform the matrix data calculation, large numbers of matrix operations can be completed quickly inside the memory. This reduces the time wasted moving large amounts of data between the processor and the memory, and improves both processor utilization and data processing efficiency.

The data processing device in the embodiments of the present application may be a standalone device, or a component, integrated circuit, or chip in a terminal. The device may be a mobile electronic device or a non-mobile electronic device. For example, the mobile electronic device may be a mobile phone, tablet computer, notebook computer, palmtop computer, vehicle-mounted electronic device, wearable device, ultra-mobile personal computer (UMPC), netbook, or personal digital assistant (PDA); the non-mobile electronic device may be a server, network attached storage (NAS), personal computer (PC), television (TV), teller machine, self-service kiosk, or the like. The embodiments of the present application impose no specific limitation in this respect.

The data processing device in the embodiments of the present application may be a device having an operating system. The operating system may be Android, iOS, or another possible operating system; the embodiments of the present application impose no specific limitation in this respect.

The data processing device provided in the embodiment of the present application can implement each process of the method embodiment of fig. 1; to avoid repetition, the details are not described here again.

Optionally, as shown in fig. 8, an embodiment of the present application further provides an electronic device 800, including a processor 801, a memory 802, and a program or instruction stored in the memory 802 and executable on the processor 801. When executed by the processor 801, the program or instruction implements each process of the above data processing method embodiment and achieves the same technical effects; to avoid repetition, the details are not described here again.

The processor is the processor in the electronic device described in the above embodiment. The readable storage medium includes a computer-readable storage medium such as a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

An embodiment of the present application further provides a chip, including a processor and a communication interface coupled to the processor. The processor is configured to run a program or instruction to implement each process of the above data processing method embodiment and achieves the same technical effects; to avoid repetition, the details are not described here again.

It should be understood that the chip referred to in the embodiments of the present application may also be called a system-level chip, a system chip, a chip system, or a system-on-chip chip.

It should be noted that, in this document, the terms "comprises", "comprising", or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element. Furthermore, the scope of the methods and apparatus in the embodiments of the present application is not limited to performing the functions in the order shown or discussed; depending on the functions involved, the functions may also be performed substantially simultaneously or in the reverse order. For example, the described methods may be performed in an order different from that described, and various steps may be added, omitted, or combined. Additionally, features described with reference to certain examples may be combined in other examples.

From the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments may be implemented by software plus a necessary general-purpose hardware platform, or by hardware alone, although in many cases the former is the preferred implementation. Based on such an understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, or optical disc) and comprising instructions that cause a terminal (which may be a mobile phone, a computer, a server, a network device, or the like) to perform the method according to the embodiments of the present application.

The embodiments of the present application have been described above with reference to the accompanying drawings, but the present application is not limited to these embodiments, which are merely illustrative and not restrictive. Inspired by the present application, those of ordinary skill in the art may devise many other forms without departing from the spirit of the present application and the scope of the claims, and all such forms fall within the protection of the present application.

Claims

1. A data processing method, comprising:

receiving a matrix calculation instruction for first matrix data and second matrix data written into a memory; in response to the matrix calculation instruction, calling a matrix calculation unit arranged in the memory, and calculating the first matrix data and the second matrix data according to a calculation mode of the matrix calculation instruction to obtain a matrix calculation result; and transmitting the matrix calculation result according to a data transmission mode corresponding to the matrix calculation instruction, comprising:

when the data transmission mode is a continuous column transmission mode, transmitting the matrix calculation result to a command address bus in the continuous column transmission mode, and transmitting the matrix calculation result to a processor through the command address bus; and

when the data transmission mode is a continuous row transmission mode, transmitting the matrix calculation result to the command address bus in the continuous row transmission mode, and transmitting the matrix calculation result to the processor through the command address bus.

2. The method according to claim 1, wherein, before the calling of the matrix calculation unit arranged in the memory and the calculating of the first matrix data and the second matrix data according to the calculation mode of the matrix calculation instruction to obtain the matrix calculation result, the method further comprises:

determining the running state of the matrix calculation unit according to the state identifier corresponding to the matrix calculation unit;

when the running state is an idle state, executing the step of calling the matrix calculation unit arranged in the memory and calculating the first matrix data and the second matrix data according to the calculation mode of the matrix calculation instruction to obtain the matrix calculation result; and when the running state is a non-idle state, caching the matrix calculation instruction.

3. The method according to claim 1, wherein, after the calling of the matrix calculation unit arranged in the memory and the calculating of the first matrix data and the second matrix data according to the calculation mode of the matrix calculation instruction to obtain the matrix calculation result, the method further comprises:

writing the matrix calculation result into the memory according to a data writing mode corresponding to the matrix calculation instruction.

4. The method according to claim 3, wherein the writing of the matrix calculation result into the memory according to the data writing mode corresponding to the matrix calculation instruction comprises:

when the data writing mode is a continuous column writing mode, writing the matrix calculation result into the memory in the continuous column writing mode; and

when the data writing mode is a continuous row writing mode, writing the matrix calculation result into the memory in the continuous row writing mode.

5. A data processing device, comprising:

a matrix calculation instruction receiving module, configured to receive a matrix calculation instruction for first matrix data and second matrix data written into a memory;

a matrix calculation result obtaining module, configured to, in response to the matrix calculation instruction, call a matrix calculation unit arranged in the memory and calculate the first matrix data and the second matrix data according to a calculation mode of the matrix calculation instruction to obtain a matrix calculation result; and

a matrix calculation result transmission module, configured to transmit the matrix calculation result according to a data transmission mode corresponding to the matrix calculation instruction, and comprising:

a first calculation result transmission unit, configured to, when the data transmission mode is a continuous column transmission mode, transmit the matrix calculation result to a command address bus in the continuous column transmission mode, and transmit the matrix calculation result to a processor through the command address bus; and

a second calculation result transmission unit, configured to, when the data transmission mode is a continuous row transmission mode, transmit the matrix calculation result to the command address bus in the continuous row transmission mode, and transmit the matrix calculation result to the processor through the command address bus.

6. The device according to claim 5, further comprising:

a running state determining module, configured to determine the running state of the matrix calculation unit according to the state identifier corresponding to the matrix calculation unit;

a calculation result execution module, configured to invoke the matrix calculation result obtaining module when the running state is an idle state; and

a matrix calculation instruction caching module, configured to cache the matrix calculation instruction when the running state is a non-idle state.

7. The device according to claim 5, further comprising:

a calculation result writing module, configured to write the matrix calculation result into the memory according to a data writing mode corresponding to the matrix calculation instruction.

8. The device according to claim 7, wherein the calculation result writing module comprises:

a first calculation result writing unit, configured to write the matrix calculation result into the memory in a continuous column writing mode when the data writing mode is the continuous column writing mode; and

a second calculation result writing unit, configured to write the matrix calculation result into the memory in a continuous row writing mode when the data writing mode is the continuous row writing mode.