CN112602094A - Data processing apparatus, data processing method, and accelerator - Google Patents


Info

Publication number
CN112602094A
Authority
CN
China
Prior art keywords
instruction
control instruction
module
control
data
Legal status
Pending
Application number
CN202080004332.0A
Other languages
Chinese (zh)
Inventor
Han Feng (韩峰)
Li Peng (李鹏)
Current Assignee
SZ DJI Technology Co Ltd
Original Assignee
SZ DJI Technology Co Ltd
Application filed by SZ DJI Technology Co Ltd
Publication of CN112602094A
Pending (current legal status)

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Stored Programmes (AREA)

Abstract

Embodiments of the present application provide a data processing apparatus, a data processing method, and an accelerator. The apparatus comprises a control module, a data loading module, and a processing module. The data loading module responds to a control instruction of the control module by loading data to be processed for processing by the processing module; the processing module responds to the control instruction of the control module by processing the data to be processed; and the control module controls the data loading module and the processing module to execute different control instructions at the same time. Because the two modules execute different control instructions at the same time, this embodiment helps improve the utilization of processing resources and avoids wasting processing resources on waiting.

Description

Data processing apparatus, data processing method, and accelerator
Technical Field
The present application relates to the field of computer data processing, and in particular, to a data processing apparatus, a data processing method, and an accelerator.
Background
As technology advances, the algorithms and program products implemented in various products are becoming increasingly sophisticated. Their processing is therefore complex and tedious and consumes substantial computing or processing resources, so making full use of these resources has become a pressing technical problem.
As an example, a convolutional neural network (CNN) is a complex, nonlinear hypothesis model whose parameters are obtained through training and which is capable of fitting data. CNN algorithms can be applied to machine vision, natural language processing, and similar scenarios. When a CNN algorithm is implemented in an embedded system, neural network processing consumes a large amount of resources, so computing resources and real-time performance must be carefully considered. There is therefore a need to improve the computational resource utilization of neural network processing.
Disclosure of Invention
In view of the above, it is an object of the embodiments of the present application to provide a data processing apparatus, a data processing method, and an accelerator.
According to a first aspect of the embodiments of the present application, a data processing apparatus is provided, which includes a control module, a data loading module, and a processing module;
the data loading module is used for responding to a control instruction of the control module and loading data to be processed for processing by the processing module;
the processing module responds to the control instruction of the control module and processes the data to be processed;
the control module controls the data loading module and the processing module to execute different control instructions at the same time.
According to a second aspect of the embodiments of the present application, there is provided a data processing method applied to a data processing apparatus, where the data processing apparatus includes a data loading module and a processing module; the method comprises the following steps:
in response to a control instruction, loading, by the data loading module, data to be used by the processing module for data processing; and
in response to the control instruction, performing data processing by the processing module; wherein the data loading module and the processing module execute different control instructions at the same time.
According to a third aspect of the embodiments of the present application, there is provided an accelerator including the apparatus according to any implementation of the first aspect.
The embodiment of the application has the following beneficial effects:
in this embodiment, the control module controls the data loading module and the processing module to execute different control instructions at the same time, and when the processing module processes to-be-processed data corresponding to a current control instruction, the data loading module may load to-be-processed data corresponding to a next control instruction, so that the utilization rate of processing resources is improved, and waste of the processing resources caused by a waiting process is avoided.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings used in describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application; a person skilled in the art can derive other drawings from them without inventive effort.
Fig. 1 is a schematic structural diagram of a first data processing apparatus according to an exemplary embodiment of the present application.
Fig. 2 is a schematic structural diagram of a second data processing apparatus according to an exemplary embodiment of the present application.
FIG. 3 is a flow chart illustrating the execution of control instructions according to an exemplary embodiment of the present application.
Fig. 4 is a schematic structural diagram of a third data processing apparatus according to an exemplary embodiment of the present application.
Fig. 5 is a schematic diagram illustrating a fourth data processing apparatus according to an exemplary embodiment of the present application.
FIG. 6 is a diagram illustrating execution of a first type of control instruction according to an exemplary embodiment.
FIG. 7 is a diagram illustrating execution of a second type of control instruction according to an exemplary embodiment.
Fig. 8 is a schematic structural diagram of a fifth data processing apparatus according to an exemplary embodiment of the present application.
FIG. 9 is a flow chart illustrating the execution of control instructions according to an exemplary embodiment of the present application.
FIG. 10 is a diagram illustrating execution of a third control instruction according to an exemplary embodiment.
Fig. 11 is a flow chart illustrating a data processing method according to an exemplary embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
To address the problems in the related art, and referring to Fig. 1, an embodiment of the present application provides a data processing apparatus; Fig. 1 is a schematic structural diagram of a first data processing apparatus according to an exemplary embodiment of the present application. The apparatus comprises a control module 11, a data loading module 12, and a processing module 13. In another embodiment, there may be more than one data loading module. For example, if the data processing apparatus is used for convolution operations, it may include two data loading modules: a feature map loading module and a weight loading module.
The data loading module 12, in response to the control instruction of the control module 11, loads data to be processed for processing by the processing module 13.
The processing module 13, in response to the control instruction of the control module 11, performs processing on data to be processed.
The control module 11 controls the data loading module 12 and the processing module 13 to execute different control instructions at the same time. In another embodiment, the control module 11 directs the data loading module 12 to process the next control instruction in advance, without waiting for the whole data processing apparatus to finish processing the previous control instruction. For example, the control module 11 does not need to wait for the processing module 13 to finish processing control instruction x0; it can direct the data loading module 12 to perform the operation corresponding to control instruction x1 in advance, where x0 and x1 are control instructions executed in sequence.
In this embodiment, the control module 11 receives control instructions from an external module and sends them to the data loading module 12 and the processing module 13. The data loading module 12 responds to a control instruction by loading the data to be processed for the processing module 13, and the processing module 13 responds to the control instruction by processing that data. To further improve overall utilization of processing resources, the control module 11 may control the data loading module 12 and the processing module 13 to execute different control instructions at the same time. After the data loading module 12 finishes loading the data to be processed for the current control instruction, it does not need to wait for the processing module 13 to finish processing that data; under the control of the control module 11, it can directly load the data corresponding to the next control instruction. In other words, while the processing module 13 processes the data for the current control instruction, the data loading module 12 loads the data for the next one. This improves the utilization of processing resources and avoids the waste caused by waiting.
In an embodiment, in response to the data loading module 12 completing execution of the ith control instruction, the control module 11 sends the (i+1)th control instruction to the data loading module 12; and in response to the processing module 13 completing execution of the ith control instruction, it sends the (i+1)th control instruction to the processing module 13, where i is an integer.
It should be noted that the data loading module 12 finishing execution of the ith control instruction means that it has finished loading the data to be processed corresponding to the ith control instruction, and the processing module 13 finishing execution of the ith control instruction means that it has finished processing that data.
The data loading module 12 responds to the ith control instruction by loading the data to be processed corresponding to it. The processing module 13 responds to the ith control instruction by processing that data to obtain a processing result. While the processing module 13 is processing the data for the ith control instruction, if the data loading module 12 has already finished loading that data, the data loading module 12 can directly receive the (i+1)th control instruction from the control module 11 and, in response, load the data for the (i+1)th control instruction, without waiting for the processing module 13 to finish executing the ith control instruction. This embodiment further reduces the time the data loading module 12 spends waiting for the next control instruction, and so avoids wasting processing resources.
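The overlap between loading the (i+1)th instruction's data and processing the ith can be illustrated with a small timing sketch. It assumes abstract time units, a known load and processing duration per instruction, and enough buffering that the data loading module can always start its next load; the function names and durations are hypothetical and the sketch is not the patented circuit.

```python
def two_stage_schedule(load_times, proc_times):
    """Finish times when loading instruction i+1 may overlap processing of i.

    Illustrative assumptions: the data loading module starts instruction i
    as soon as it has finished instruction i-1, and the processing module
    starts instruction i once it is free AND the data for i has been loaded.
    """
    load_end, proc_end = [], []
    loader_free = 0      # time at which the data loading module is free
    processor_free = 0   # time at which the processing module is free
    for lt, pt in zip(load_times, proc_times):
        loader_free += lt                          # load i right after load i-1
        load_end.append(loader_free)
        start = max(processor_free, loader_free)   # needs module and data
        processor_free = start + pt
        proc_end.append(processor_free)
    return load_end, proc_end


def serial_total(load_times, proc_times):
    """Baseline: each instruction is loaded only after the previous one has
    been fully processed, so loading and processing never overlap."""
    return sum(load_times) + sum(proc_times)


if __name__ == "__main__":
    loads, procs = [2, 2, 2], [3, 3, 3]
    load_end, proc_end = two_stage_schedule(loads, procs)
    print("load finishes:", load_end)        # [2, 4, 6]
    print("process finishes:", proc_end)     # [5, 8, 11]
    print("serial total:", serial_total(loads, procs))  # 15
```

With these illustrative numbers the overlapped schedule finishes at time 11 rather than 15, because every load after the first runs while the previous instruction is being processed.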
In one embodiment, the processing module 13 comprises a systolic array. In response to the ith control instruction, the processing module 13 writes the corresponding data to be processed into the systolic array and operates on it through the systolic array to obtain the processing result. This embodiment processes the data in a hardware structure, which helps improve processing efficiency.
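The source does not specify the dataflow of the systolic array. As one common possibility, the sketch below simulates an output-stationary array in software: row i of A enters from the left delayed by i cycles, column j of B enters from the top delayed by j cycles, and each processing element (PE) multiplies the pair it holds, accumulates the product, and passes the values on to its right and lower neighbours. The function name and the output-stationary choice are assumptions made for illustration.

```python
def systolic_matmul(A, B):
    """Cycle-by-cycle simulation of an n x m output-stationary systolic array
    computing C = A @ B for an n x k matrix A and a k x m matrix B."""
    n, k, m = len(A), len(B), len(B[0])
    if any(len(row) != k for row in A):
        raise ValueError("inner dimensions of A and B must match")
    acc = [[0] * m for _ in range(n)]     # one accumulator per PE
    a_reg = [[0] * m for _ in range(n)]   # A value held by each PE (moves right)
    b_reg = [[0] * m for _ in range(n)]   # B value held by each PE (moves down)
    for t in range(n + m + k - 2):        # cycles needed to drain the array
        new_a = [[0] * m for _ in range(n)]
        new_b = [[0] * m for _ in range(n)]
        for i in range(n):
            for j in range(m):
                if j == 0:
                    s = t - i   # row i of A is skewed by i cycles
                    new_a[i][j] = A[i][s] if 0 <= s < k else 0
                else:
                    new_a[i][j] = a_reg[i][j - 1]   # from left neighbour
                if i == 0:
                    s = t - j   # column j of B is skewed by j cycles
                    new_b[i][j] = B[s][j] if 0 <= s < k else 0
                else:
                    new_b[i][j] = b_reg[i - 1][j]   # from upper neighbour
        a_reg, b_reg = new_a, new_b
        # Every PE multiplies the pair it now holds and accumulates it.
        for i in range(n):
            for j in range(m):
                acc[i][j] += a_reg[i][j] * b_reg[i][j]
    return acc
```

At cycle t, PE (i, j) holds A[i][s] and B[s][j] with s = t - i - j, so after n + m + k - 2 cycles each accumulator contains the full inner product for its output element.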
Fig. 2 is a schematic structural diagram of a second data processing apparatus according to an exemplary embodiment of the present application. The apparatus comprises a control module 11, a data loading module 12, a processing module 13, and a data write-back module 14.
The data loading module 12, in response to the control instruction of the control module 11, loads data to be processed for processing by the processing module 13.
The processing module 13, in response to the control instruction of the control module 11, performs processing on data to be processed.
The data write-back module 14, in response to the control instruction of the control module 11, writes the processing result of the data to be processed into an external storage module.
The control module 11 controls the data loading module 12, the processing module 13, and the data write-back module 14 to execute different control instructions at the same time.
In this embodiment, the control module 11 receives control instructions from an external module and sends them to the data loading module 12, the processing module 13, and the data write-back module 14. The data loading module 12 responds to a control instruction by loading the data to be processed for the processing module 13; the processing module 13 responds to the control instruction by processing that data to obtain a processing result; and the data write-back module 14 responds to the control instruction by writing the processing result into an external storage module.
To further improve overall utilization of processing resources, the control module 11 may control the data loading module 12, the processing module 13, and the data write-back module 14 to execute different control instructions at the same time. After the data loading module 12 finishes loading the data to be processed for the current control instruction, it can directly load the data for the next control instruction under the control of the control module 11, without waiting for the processing module 13 to finish processing the current data. After the processing module 13 finishes processing the data for the current control instruction, it can directly process the data for the next control instruction under the control of the control module 11, without waiting for the data write-back module 14 to write the current processing result into the external storage module. In other words, while the data write-back module 14 writes the processing result of the previous control instruction into the external storage module, the processing module 13 can process the data for the current control instruction and the data loading module 12 can load the data for the next control instruction. This improves the utilization of processing resources and avoids the waste caused by waiting.
In an embodiment, in response to the data loading module 12 completing execution of the ith control instruction, the control module 11 sends the (i+1)th control instruction to the data loading module 12; in response to the processing module 13 completing execution of the ith control instruction, it sends the (i+1)th control instruction to the processing module 13; and in response to the data write-back module 14 completing execution of the ith control instruction, it sends the (i+1)th control instruction to the data write-back module 14, where i is an integer.
It should be noted that the data loading module 12 finishing execution of the ith control instruction means that it has finished loading the data to be processed corresponding to the ith control instruction; the processing module 13 finishing execution of the ith control instruction means that it has finished processing that data; and the data write-back module 14 finishing execution of the ith control instruction means that it has finished writing the processing result corresponding to the ith control instruction into the external storage module.
The data loading module 12 is configured to respond to the ith control instruction, and load to-be-processed data corresponding to the ith control instruction. The processing module 13 responds to the ith control instruction, and processes the data to be processed corresponding to the ith control instruction to obtain a processing result. The data write-back module 14 responds to the ith control instruction, and writes a processing result corresponding to the ith control instruction into an external storage module.
While the processing module 13 responds to the ith control instruction by processing the corresponding data to be processed, if the data loading module 12 has already loaded that data, the data loading module 12 can directly receive the (i+1)th control instruction from the control module 11 and load the data to be processed corresponding to the (i+1)th control instruction, without waiting for the processing module 13 to finish executing the ith control instruction. This embodiment further reduces the waiting time of the data loading module 12 and avoids wasting processing resources.
Further, while the data write-back module 14 responds to the ith control instruction by writing the corresponding processing result into the external storage module, if the processing module 13 has already finished processing the data for the ith control instruction, the processing module 13 can directly receive the (i+1)th control instruction from the control module 11 and process the data to be processed corresponding to it, without waiting for the data write-back module 14 to finish executing the ith control instruction. This embodiment further reduces the waiting time of the processing module 13 and avoids wasting processing resources.
As an example, refer to Fig. 3, a timing diagram of the data loading module 12, the processing module 13, and the data write-back module 14 processing control instructions. After the data loading module 12 finishes loading the data to be processed for control instruction 1, it can directly obtain control instruction 2 and load the corresponding data, without waiting for the processing module 13 to finish processing the data for control instruction 1 or for the data write-back module 14 to finish writing the processing result of control instruction 1. This reduces the time the data loading module 12 spends waiting for the next control instruction. Likewise, after the processing module 13 finishes processing the data for control instruction 1 and obtains the processing result, it can directly obtain control instruction 2 and process the corresponding data, without waiting for the data write-back module 14 to write that result into the external storage module, which reduces the time the processing module 13 spends waiting for the next control instruction. In both cases, the waste of processing resources caused by waiting is avoided.
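The three-module timing described for Fig. 3 can be approximated with a small scheduling sketch that works for any number of stages. It assumes that each instruction spends a known, fixed time in each module, that a module starts instruction i once it has finished instruction i-1 and the previous module has finished instruction i, and that buffering between modules is unlimited; the function name, stage labels, and durations are hypothetical.

```python
STAGES = ("load", "process", "write_back")  # hypothetical stage labels


def pipeline_schedule(durations):
    """durations[i][s] is the time instruction i spends in stage s.

    Returns (start, end) matrices of the same shape. A stage starts
    instruction i when it has finished instruction i-1 (module free) and
    the previous stage has finished instruction i (input ready).
    """
    n = len(durations)
    n_stages = len(durations[0]) if n else 0
    start = [[0] * n_stages for _ in range(n)]
    end = [[0] * n_stages for _ in range(n)]
    for i in range(n):
        for s in range(n_stages):
            module_free = end[i - 1][s] if i > 0 else 0
            input_ready = end[i][s - 1] if s > 0 else 0
            start[i][s] = max(module_free, input_ready)
            end[i][s] = start[i][s] + durations[i][s]
    return start, end


if __name__ == "__main__":
    start, end = pipeline_schedule([[2, 3, 1]] * 3)
    for i, (st, en) in enumerate(zip(start, end), 1):
        spans = ", ".join(f"{name} {a}-{b}" for name, a, b in zip(STAGES, st, en))
        print(f"instruction {i}: {spans}")
```

With durations of 2, 3, and 1 time units per stage, three instructions finish at time 12, compared with 18 if each instruction had to pass through all three modules before the next one started. The data loading module starts instruction 2 at time 2, before instruction 1 has been processed or written back.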
It can be understood that the embodiments of the present application impose no limitation on the specific type of the external storage module, which may be chosen according to the actual application scenario. For example, it may be implemented by any type or combination of volatile or non-volatile memory devices, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, or magnetic or optical disks.
In an exemplary embodiment, the data processing apparatus may be applied to processing based on a convolutional neural network, a machine learning algorithm widely used for computer vision tasks such as object recognition, object detection, and semantic segmentation of images. A neural network typically includes an input layer, one or more hidden layers, and an output layer; operations in the hidden layers include, but are not limited to, convolution, pooling, and activation. Hidden layers in a convolutional neural network are generally named by the type of operation they perform: a hidden layer performing convolution is a convolutional layer, one performing pooling is a pooling layer, and one performing activation is an activation layer. The data processing apparatus provided by the embodiments of the present application can perform the convolution operation of a convolutional layer, the pooling operation of a pooling layer, and the activation operation of an activation layer, thereby accelerating the computation of the deep neural network in hardware, reducing its computation time, and improving computational efficiency.
Objects processed by the convolutional neural network include, but are not limited to, images, audio, and text. Different types of hidden layers correspond to different operation parameters: for example, the operation parameter of a convolutional layer is the convolution kernel, that of an activation layer is the activation function, and that of a pooling layer is the pooling parameters.
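To make the idea of per-layer operation parameters concrete, the sketch below passes the activation function as the parameter of an activation layer and the window size and stride as the pooling parameters of a max-pooling layer. ReLU, max pooling, and the function names are illustrative choices; the source does not say which activation or pooling types the apparatus supports.

```python
def activate(fmap, fn):
    """Activation layer: the operation parameter is the activation function fn."""
    return [[fn(x) for x in row] for row in fmap]


def relu(x):
    # One common activation function, used here only as an example.
    return max(0, x)


def max_pool(fmap, size=2, stride=2):
    """Max-pooling layer: the operation parameters are window size and stride.

    Windows that would extend past the edge of the feature map are skipped.
    """
    h, w = len(fmap), len(fmap[0])
    return [
        [
            max(fmap[r + u][c + v] for u in range(size) for v in range(size))
            for c in range(0, w - size + 1, stride)
        ]
        for r in range(0, h - size + 1, stride)
    ]


if __name__ == "__main__":
    fm = [[-1, 2, -3, 4], [5, -6, 7, -8], [-9, 10, -11, 12], [13, -14, 15, -16]]
    print(max_pool(activate(fm, relu)))  # [[5, 7], [13, 15]]
```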
The following description takes as an example a convolutional neural network applied to image processing, in which the data processing apparatus performs the convolution operations of the convolutional layers. A convolutional layer computes vector inner products between the convolution kernel and the image to be processed to obtain a feature map. In this scenario, the control module 11 sends a convolution operation control instruction to each of the data loading module 12, the processing module 13, and the data write-back module 14. In response, the data loading module 12 loads the image to be processed and the convolution kernel; the processing module 13 convolves the image to be processed with the convolution kernel to obtain a feature map; and the data write-back module 14 writes the feature map into an external storage module. The control module 11 controls the data loading module 12, the processing module 13, and the data write-back module 14 to execute different convolution operation control instructions at the same time; executing these instructions in parallel effectively improves computational efficiency as well as the utilization of computing resources.
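The convolution described above, an inner product between the kernel and each window of the image, can be sketched as follows. As CNN frameworks commonly do, the sketch computes cross-correlation (the kernel is not flipped); a single channel, stride 1, and no padding are simplifying assumptions, and the function name is hypothetical.

```python
def conv2d_valid(image, kernel):
    """Single-channel 2D convolution (cross-correlation), stride 1, no padding.

    Each output element is the inner product of the kernel with the image
    window whose top-left corner is at (r, c).
    """
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    feature_map = []
    for r in range(ih - kh + 1):
        row = []
        for c in range(iw - kw + 1):
            row.append(sum(image[r + u][c + v] * kernel[u][v]
                           for u in range(kh) for v in range(kw)))
        feature_map.append(row)
    return feature_map


if __name__ == "__main__":
    image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    kernel = [[1, 0], [0, 1]]
    print(conv2d_valid(image, kernel))  # [[6, 8], [12, 14]]
```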
Further, to improve data loading efficiency, the data loading module 12 may include an object loading unit and a parameter loading unit: the object loading unit loads the image to be processed, the parameter loading unit loads the convolution kernel, and the two units load at the same time, which helps improve loading efficiency.
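As a software analogy only (in hardware the two units would typically use separate data paths), the sketch below starts the object loading unit and the parameter loading unit together and waits for both results. The in-memory dictionary standing in for the external storage module and all function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the external storage module.
EXTERNAL_MEMORY = {
    "image": [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    "kernel": [[1, 0], [0, 1]],
}


def object_loading_unit(key="image"):
    """Loads the object to be processed (here, the image)."""
    return [row[:] for row in EXTERNAL_MEMORY[key]]


def parameter_loading_unit(key="kernel"):
    """Loads the operation parameter (here, the convolution kernel)."""
    return [row[:] for row in EXTERNAL_MEMORY[key]]


def load_concurrently():
    """Start both loading units at the same time and wait for both results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        image_future = pool.submit(object_loading_unit)
        kernel_future = pool.submit(parameter_loading_unit)
        return image_future.result(), kernel_future.result()
```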
In one embodiment, the control module 11 includes at least one instruction slot for caching control instructions. In one embodiment, an instruction slot includes an instruction cache flag and a set of control state signals. The instruction cache flag indicates whether the instruction cached in the slot is valid; for example, it may indicate which control instruction the slot caches and whether that instruction is valid. The set of control state signals represents the working state of the corresponding module or of other instruction slots related to this slot. In one embodiment, the working state refers to whether the corresponding module has completed its processing or operation, or whether the corresponding instruction slot has completed its instruction caching operation. Providing at least one instruction slot coordinates the execution of the instructions so that the submodules of the data processing apparatus can process operations corresponding to different instructions at the same time, which improves the working efficiency of the data processing apparatus.
In response to the data loading module 12 completing execution of the ith control instruction, the control module 11 sends the (i+1)th control instruction cached in the instruction slot to the data loading module 12; in response to the processing module 13 completing execution of the ith control instruction, it sends the (i+1)th control instruction cached in the instruction slot to the processing module 13; and in response to the data write-back module 14 completing execution of the ith control instruction, it sends the (i+1)th control instruction cached in the instruction slot to the data write-back module 14.
Further, the instruction slot may record a state of the control instruction cached in the instruction slot, and the control module 11 may control an execution process of the control instruction according to the state of the control instruction recorded in at least one instruction slot. In an example, taking that the data loading module 12, the processing module 13, and the data writing back module 14 respectively correspond to an instruction slot, and taking an example that one of the instruction slots corresponds to the data loading module 12, after the control module 11 sends the ith control instruction in the instruction slot to the data loading module 12, correspondingly, the state of the control instruction recorded in the instruction slot represents that the ith control instruction is invalid, and at this time, the control module 11 may cache the (i + 1) th control instruction in the instruction slot. Accordingly, after the i +1 th control instruction caches the value of the instruction slot, the state of the control instruction recorded in the instruction slot is changed to indicate that the i +1 th control instruction is valid, which indicates that the i +1 th control instruction has not been sent to the data load module 12. In this embodiment, through the states of the control instructions recorded in the instruction slots, it is ensured that the control module 11 can sequentially cache different control instructions, so as to ensure that different control instructions are sequentially sent to the data loading module 12, the processing module 13, and the data write-back module 14.
In one implementation, the state of the control instruction recorded in the instruction slot may be represented by a control instruction state signal. This signal indicates whether the control instruction cached in the corresponding instruction slot is valid. For example, when the control instruction state signal of an instruction slot indicates that the ith control instruction cached in it is invalid, the control module 11 may cache the (i + 1)th control instruction in the slot. It then sets the state signal to the value representing valid, indicating that the (i + 1)th control instruction cached in the slot is valid.
The data loading module 12, the processing module 13, and the data write-back module 14 each correspond to an instruction slot. Whether the control instruction cached in an instruction slot is valid depends on whether the module corresponding to that slot has completed the operation of the slot's control instruction.
In one implementation, a control instruction completion signal may indicate whether the module corresponding to the at least one instruction slot has completed the operation of the corresponding control instruction. Here, "completed the operation" means that the module has finished receiving the control instruction from the instruction slot.
That is, the control module 11 includes at least one instruction slot for caching control instructions, and each instruction slot includes a control instruction state signal and a control instruction completion signal. When the completion signal indicates that the corresponding module has completed the operation of the corresponding control instruction, the state signal indicates that the control instruction cached in the slot is invalid. Otherwise, the state signal indicates that it is valid. In this embodiment, the state signal and the completion signal ensure that the control module 11 caches different control instructions in order. As a result, different control instructions are sent to the data loading module 12, the processing module 13, and the data write-back module 14 in order.
Further, the data loading module 12, the processing module 13, and the data write-back module 14 each correspond to an instruction slot. A control instruction is passed between the instruction slots in the order data loading module 12, processing module 13, data write-back module 14. In one example, the control module 11 includes a second instruction slot 111 corresponding to the data loading module 12, a third instruction slot 112 corresponding to the processing module 13, and a fourth instruction slot 113 corresponding to the data write-back module 14. When the ith control instruction in the second instruction slot 111 is invalid, the (i + 1)th control instruction is cached in the second instruction slot 111. When the ith control instruction in the third instruction slot 112 is invalid, the (i + 1)th control instruction cached in the second instruction slot 111 is cached in the third instruction slot 112. When the ith control instruction in the fourth instruction slot 113 is invalid, the (i + 1)th control instruction cached in the third instruction slot 112 is cached in the fourth instruction slot 113. In this way, the (i + 1)th control instruction is cached in the second instruction slot 111, the third instruction slot 112, and the fourth instruction slot 113 in sequence.
Whether the control instruction cached in an instruction slot is valid may depend on two conditions. The first is whether the module corresponding to the slot has completed the operation of the slot's control instruction. The second is whether the instruction slot itself has completed the operation of that control instruction, that is, whether the control instruction in the slot has been cached to another instruction slot.
In one implementation, two control instruction completion signals may be provided. One indicates whether the module corresponding to the instruction slot has completed the operation of the slot's control instruction. The other indicates whether the instruction slot has completed the operation of that control instruction.
In this case, the control module 11 includes at least one instruction slot for caching control instructions, and each instruction slot includes a control instruction state signal and at least two control instruction completion signals. Suppose one completion signal indicates that the corresponding module has completed the operation of the corresponding control instruction, and the other indicates that the instruction slot has completed the operation of that control instruction. In that case, the state signal indicates that the control instruction cached in the slot is invalid. Otherwise, it indicates that the instruction is valid. In this embodiment, the state signal and the two completion signals ensure that the control module 11 caches different control instructions in order. As a result, different control instructions are sent to the data loading module 12, the processing module 13, and the data write-back module 14 in order.
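The validity rule above reduces to a small piece of combinational logic. The Python sketch below models it under the assumption that each signal is a plain boolean. The function and parameter names are hypothetical. Leaving `next_slot_done` at its default of `True` models the single-completion-signal variant, or a slot with no successor.

```python
def slot_valid(occupied: bool, module_done: bool, next_slot_done: bool = True) -> bool:
    """Control instruction state signal of one instruction slot (hypothetical model).

    occupied       -- an instruction has been cached in the slot
    module_done    -- completion signal 1: the slot's module has received the instruction
    next_slot_done -- completion signal 2: the instruction has been cached in the next slot.
                      Defaults to True, which models the single-completion-signal variant
                      (or a last slot with no successor).
    """
    return occupied and not (module_done and next_slot_done)


# Truth table for an occupied slot: only the case where both signals report completion is invalid.
for module_done in (False, True):
    for next_slot_done in (False, True):
        print(module_done, next_slot_done, "->", slot_valid(True, module_done, next_slot_done))
```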
In one example, referring to fig. 4, the control module 11 includes a second instruction slot 111 corresponding to the data loading module 12, a third instruction slot 112 corresponding to the processing module 13, and a fourth instruction slot 113 corresponding to the data write-back module 14. In one embodiment, each of the second instruction slot 111, the third instruction slot 112, and the fourth instruction slot 113 is configured to cache its (i + 1)th control instruction after the control module 11 has finished sending the ith control instruction. These events may happen at different times for different slots. In another embodiment, an instruction slot may cache the instruction control word obtained by decoding a control instruction, rather than the control instruction itself.

Specifically, the second instruction slot 111 caches the (i + 1)th control instruction to be sent to the data loading module 12, after the control module 11 has finished sending the ith control instruction to that module. The third instruction slot 112 caches the (i + 1)th control instruction to be sent to the processing module 13, after the control module 11 has finished sending the ith control instruction to that module. The fourth instruction slot 113 caches the (i + 1)th control instruction to be sent to the data write-back module 14, after the control module 11 has finished sending the ith control instruction to that module. Note that the (i + 1)th control instruction may be cached in the respective instruction slots at different points in time.
Here, "finished sending the ith control instruction" has a specific meaning. The control module 11 sends the ith control instruction to the data loading module 12 (for the second instruction slot 111), the processing module 13 (for the third instruction slot 112), or the data write-back module 14 (for the fourth instruction slot 113), and that module has then finished receiving it.
For the second instruction slot 111, the control module 11 caches the (i + 1)th control instruction in the second instruction slot 111 in response to the ith control instruction in that slot becoming invalid. It sends the (i + 1)th control instruction in the second instruction slot 111 to the data loading module 12 in response to the data loading module 12 completing execution of the ith control instruction.
The second instruction slot 111 includes a second valid signal, a data loading module completion signal, and a third instruction slot completion signal:

- The second valid signal indicates whether the control instruction cached in the second instruction slot 111 is valid.
- The data loading module completion signal indicates whether the data loading module 12 has finished receiving the control instruction, i.e., whether the control module 11 has sent the control instruction to the data loading module 12.
- The third instruction slot completion signal indicates whether the third instruction slot 112 has finished caching the control instruction, i.e., whether the control module 11 has cached the instruction held in the second instruction slot 111 into the third instruction slot 112.

Suppose the ith control instruction has been both sent to the data loading module 12 and cached in the third instruction slot 112. The data loading module completion signal and the third instruction slot completion signal are then set to the value representing completion, and the second valid signal is set to the value representing invalid. At this point, the control module 11 may cache the (i + 1)th control instruction in the second instruction slot 111. At the moment the (i + 1)th control instruction is cached there, it has been neither sent to the data loading module 12 nor cached in the third instruction slot 112. The second valid signal is therefore set to the value representing valid, and the data loading module completion signal and the third instruction slot completion signal are set to the value representing incomplete.
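The signal transitions of the second instruction slot 111 can be modelled as a tiny state machine. The Python sketch below rests on a few assumptions. The class and method names are hypothetical. The signals are booleans. A slot without a successor, such as the fourth instruction slot 113, is modelled by treating its missing next-slot signal as already complete.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InstructionSlot:
    """Hypothetical model of one instruction slot and its signals (e.g. second slot 111)."""
    has_next_slot: bool = True     # False for a slot with no successor, e.g. the fourth slot 113
    instr: Optional[str] = None
    valid: bool = False            # valid signal (control instruction state signal)
    module_done: bool = False      # e.g. data loading module completion signal
    next_slot_done: bool = False   # e.g. third instruction slot completion signal

    def cache(self, instr: str) -> bool:
        """Cache a new instruction; refused while the current one is still valid."""
        if self.valid:
            return False
        self.instr, self.valid = instr, True
        self.module_done = False
        # A slot without a successor has nothing to hand on, so that signal starts complete.
        self.next_slot_done = not self.has_next_slot
        return True

    def mark_sent_to_module(self) -> None:
        self.module_done = True
        self._update()

    def mark_copied_to_next_slot(self) -> None:
        self.next_slot_done = True
        self._update()

    def _update(self) -> None:
        # The slot is released only when every completion signal reports completion.
        if self.module_done and self.next_slot_done:
            self.valid = False


second = InstructionSlot()
second.cache("i")
print(second.cache("i+1"))         # False: refused, i is still valid
second.mark_sent_to_module()       # the data loading module has received i
print(second.valid)                # True: the third slot has not cached i yet
second.mark_copied_to_next_slot()  # the third slot has cached i
print(second.cache("i+1"))         # True: the slot accepts the next instruction
```

The same class covers the third instruction slot 112 unchanged, and the fourth instruction slot 113 with `has_next_slot=False`.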
For the third instruction slot 112, the control module 11 caches the (i + 1)th control instruction from the second instruction slot 111 into the third instruction slot 112 in response to the ith control instruction in the third instruction slot 112 becoming invalid. It sends the (i + 1)th control instruction in the third instruction slot 112 to the processing module 13 in response to the processing module 13 completing execution of the ith control instruction.
The third instruction slot 112 includes a third valid signal, a processing module completion signal, and a fourth instruction slot completion signal:

- The third valid signal indicates whether the control instruction cached in the third instruction slot 112 is valid.
- The processing module completion signal indicates whether the processing module 13 has finished receiving the control instruction, i.e., whether the control module 11 has sent the control instruction to the processing module 13.
- The fourth instruction slot completion signal indicates whether the fourth instruction slot 113 has finished caching the control instruction, i.e., whether the control module 11 has cached the instruction held in the third instruction slot 112 into the fourth instruction slot 113.

Suppose the ith control instruction has been both sent to the processing module 13 and cached in the fourth instruction slot 113. The processing module completion signal and the fourth instruction slot completion signal are then set to the value representing completion, and the third valid signal is set to the value representing invalid. At this point, the control module 11 may cache the (i + 1)th control instruction in the third instruction slot 112. At the moment the (i + 1)th control instruction is cached there, it has been neither sent to the processing module 13 nor cached in the fourth instruction slot 113. The third valid signal is therefore set to the value representing valid, and the processing module completion signal and the fourth instruction slot completion signal are set to the value representing incomplete.
For the fourth instruction slot 113, the control module 11 caches the (i + 1)th control instruction from the third instruction slot 112 into the fourth instruction slot 113 in response to the ith control instruction in the fourth instruction slot 113 becoming invalid. It sends the (i + 1)th control instruction in the fourth instruction slot 113 to the data write-back module 14 in response to the data write-back module 14 completing execution of the ith control instruction.
The fourth instruction slot 113 includes a fourth valid signal and a data write-back module completion signal:

- The fourth valid signal indicates whether the control instruction cached in the fourth instruction slot 113 is valid.
- The data write-back module completion signal indicates whether the data write-back module 14 has finished receiving the control instruction, i.e., whether the control module 11 has sent the control instruction to the data write-back module 14.

Once the ith control instruction has been sent to the data write-back module 14, the data write-back module completion signal is set to the value representing completion, and the fourth valid signal is set to the value representing invalid. At this point, the control module 11 may cache the (i + 1)th control instruction in the fourth instruction slot 113. At the moment the (i + 1)th control instruction is cached there, it has not yet been sent to the data write-back module 14. The fourth valid signal is therefore set to the value representing valid, and the data write-back module completion signal is set to the value representing incomplete.
In this embodiment, the control module 11 contains the second instruction slot 111 corresponding to the data loading module 12, the third instruction slot 112 corresponding to the processing module 13, and the fourth instruction slot 113 corresponding to the data write-back module 14. It controls each of these modules through that module's own slot. As a result, the data loading module 12, the processing module 13, and the data write-back module 14 can execute different control instructions at the same time, which reduces waiting time and helps improve the utilization of processing resources.
In another example, refer to fig. 5, which is a schematic structural diagram of a fourth data processing apparatus according to an exemplary embodiment of the present application. The embodiment shown in fig. 5 builds on the embodiment shown in fig. 4: the control module 11 further includes a first instruction slot 114 for caching decoded control instructions. A decoded control instruction may be represented by its corresponding instruction control word.
For the first instruction slot 114, the control module 11 decodes each control instruction. In response to the decoded ith control instruction in the first instruction slot 114 becoming invalid, it caches the decoded (i + 1)th control instruction in the first instruction slot 114. In response to the decoded ith control instruction in the second instruction slot 111 becoming invalid, it caches the decoded (i + 1)th control instruction from the first instruction slot 114 into the second instruction slot 111.
The first instruction slot 114 includes a first valid signal and a second instruction slot completion signal:

- The first valid signal indicates whether the control instruction cached in the first instruction slot 114 is valid.
- The second instruction slot completion signal indicates whether the second instruction slot 111 has finished caching the control instruction, i.e., whether the control module 11 has cached the instruction held in the first instruction slot 114 into the second instruction slot 111.

Once the decoded ith control instruction has been cached in the second instruction slot 111, the second instruction slot completion signal is set to the value representing completion, and the first valid signal is set to the value representing invalid. At this point, the control module 11 may cache the decoded (i + 1)th control instruction in the first instruction slot 114. At the moment the decoded (i + 1)th control instruction is cached there, it has not yet been cached in the second instruction slot 111. The first valid signal is therefore set to the value representing valid, and the second instruction slot completion signal is set to the value representing incomplete.
Accordingly, the control instructions buffered in the second instruction slot 111, the third instruction slot 112 and the fourth instruction slot 113 are decoded control instructions. For the execution process of the decoded control instruction cached in the second instruction slot 111, the third instruction slot 112, and the fourth instruction slot 113, reference may be made to the description in the embodiment shown in fig. 4, and details are not repeated here.
In this embodiment, the control module 11 contains the first instruction slot 114 together with the second instruction slot 111, the third instruction slot 112, and the fourth instruction slot 113. The last three correspond to the data loading module 12, the processing module 13, and the data write-back module 14, respectively. The control module 11 controls each of these three modules through that module's own slot. As a result, the three modules can execute different control instructions at the same time, which reduces waiting time and helps improve the utilization of processing resources.
In an exemplary embodiment, the data processing apparatus provided in the embodiments of the present application may be applied to a convolutional neural network to process a target object. The target object includes, but is not limited to, an image, audio, video, or text. The apparatus may perform the convolution operation of a convolutional layer, the pooling operation of a pooling layer, or the activation operation of an activation layer. This accelerates deep neural network computation in hardware, reduces its run time, and improves computational efficiency.
The following description takes as an example a convolutional neural network applied to image processing, in which the data processing apparatus performs the convolution operations of the convolutional layers. The control module 11 receives a convolution operation control instruction and distributes it to the data loading module 12, the processing module 13, and the data write-back module 14. Referring to fig. 6, the control module 11 divides the distribution of a control instruction into four stages: a decoding stage, a loading stage, an execution stage, and a storage stage. This embodiment controls the execution of different control instructions through these four stages.
The decoding stage corresponds to the first instruction slot 114, and the control module 11 caches the decoded control instruction in the first instruction slot 114.
The loading stage corresponds to the second instruction slot 111, and the control module 11 caches the decoded control instruction in the first instruction slot 114 to the second instruction slot 111, and then sends the decoded control instruction cached in the second instruction slot 111 to the data loading module 12.
The execution stage corresponds to the third instruction slot 112, and the control module 11 caches the decoded control instruction in the second instruction slot 111 to the third instruction slot 112, and then sends the decoded control instruction cached in the third instruction slot 112 to the processing module 13.
The storage stage corresponds to the fourth instruction slot 113. The control module 11 caches the decoded control instruction held in the third instruction slot 112 into the fourth instruction slot 113, and then sends the decoded control instruction cached in the fourth instruction slot 113 to the data write-back module 14.
Different control instructions are cached in the instruction slots at different moments. Their execution is controlled through the control instruction state signals and control instruction completion signals in the slots. For ease of understanding, refer to fig. 7, which shows which control instruction is cached in each instruction slot in different time periods:
At time period T0: the control module 11 decodes the control instruction a and caches the decoded control instruction a in the first instruction slot 114, which corresponds to the decoding stage. At this time, in the first instruction slot 114, the first valid signal indicates that the decoded control instruction a is valid. The second instruction slot completion signal indicates that the second instruction slot 111 has not finished caching the decoded control instruction a.
At time period T1: the control module 11 caches the decoded control instruction a in the first instruction slot 114 to the second instruction slot 111 corresponding to the loading stage, and then sends the decoded control instruction a in the second instruction slot 111 to the data loading module 12; at this time, in the second instruction slot 111, the second valid signal indicates that the decoded control instruction a is valid, the data loading module completion signal indicates that the data loading module 12 completes receiving the decoded control instruction a, and the third instruction slot completion signal indicates that the third instruction slot 112 does not complete buffering the decoded control instruction a.
Meanwhile, the decoded control instruction a in the first instruction slot 114 has been cached to the second instruction slot 111. In the first instruction slot 114, the first valid signal therefore indicates that a is invalid, and the second instruction slot completion signal indicates that the second instruction slot 111 has finished caching a. The control module 11 may then decode the control instruction b and cache the decoded control instruction b in the first instruction slot 114. Accordingly, the first valid signal indicates that b is valid, and the second instruction slot completion signal indicates that the second instruction slot 111 has not finished caching b.
At time period T2: the control module 11 caches the decoded control instruction a in the second instruction slot 111 to a third instruction slot 112 corresponding to the execution stage, and then sends the decoded control instruction a in the third instruction slot 112 to the processing module 13; at this time, in the third instruction slot, the third valid signal indicates that the decoded control instruction a is valid, the processing module completion signal indicates that the processing module 13 completes receiving the decoded control instruction a, and the fourth instruction slot completion signal indicates that the fourth instruction slot 113 does not complete buffering the decoded control instruction a.
Meanwhile, the decoded control instruction a in the second instruction slot 111 has been cached to the third instruction slot 112. In the second instruction slot 111, the second valid signal therefore indicates that a is invalid. The data loading module completion signal indicates that the data loading module 12 has finished receiving a, and the third instruction slot completion signal indicates that the third instruction slot 112 has finished caching a. The control module 11 may then cache the decoded control instruction b from the first instruction slot 114 into the second instruction slot 111. In the case shown, the data loading module 12 is still executing the decoded control instruction a, so b cannot yet be issued. In the second instruction slot 111, the second valid signal indicates that b is valid. The data loading module completion signal indicates that the data loading module 12 has not finished receiving b, and the third instruction slot completion signal indicates that the third instruction slot 112 has not finished caching b.
Meanwhile, the decoded control instruction b in the first instruction slot 114 has been cached to the second instruction slot 111. In the first instruction slot 114, the first valid signal therefore indicates that b is invalid, and the second instruction slot completion signal indicates that the second instruction slot 111 has finished caching b. The control module 11 may then decode the control instruction c and cache the decoded control instruction c in the first instruction slot 114. Accordingly, the first valid signal indicates that c is valid, and the second instruction slot completion signal indicates that the second instruction slot 111 has not finished caching c.
At time period T3: the control module 11 caches the decoded control instruction a from the third instruction slot 112 into the fourth instruction slot 113, which corresponds to the storage stage. It then sends the decoded control instruction a in the fourth instruction slot 113 to the data write-back module 14.
Meanwhile, the data loading module 12 is still executing the decoded control instruction a. The decoded control instruction b therefore cannot be issued to the data loading module 12 and remains cached in the second instruction slot 111. In that slot, the second valid signal indicates that b is valid. The data loading module completion signal indicates that the data loading module 12 has not finished receiving b, and the third instruction slot completion signal indicates that the third instruction slot 112 has not finished caching b.
At time period T4: after the data loading module 12 finishes executing the decoded control instruction a, the control module 11 sends the decoded control instruction b in the second instruction slot 111 to the data loading module 12. At this time, in the second instruction slot 111, the second valid signal indicates that b is valid. The data loading module completion signal indicates that the data loading module 12 has finished receiving b, and the third instruction slot completion signal indicates that the third instruction slot 112 has not finished caching b.
At time period T5: the control module 11 caches the decoded control instruction b from the second instruction slot 111 into the third instruction slot 112. After the processing module 13 finishes executing the decoded control instruction a, the control module 11 sends b to the processing module 13. At this time, in the third instruction slot 112, the third valid signal indicates that b is valid. The processing module completion signal indicates that the processing module 13 has finished receiving b, and the fourth instruction slot completion signal indicates that the fourth instruction slot 113 has not finished caching b.
Meanwhile, the decoded control instruction b in the second instruction slot 111 has been cached to the third instruction slot 112. In the second instruction slot 111, the second valid signal therefore indicates that b is invalid. The data loading module completion signal indicates that the data loading module 12 has finished receiving b, and the third instruction slot completion signal indicates that the third instruction slot 112 has finished caching b. The control module 11 may then cache the decoded control instruction c from the first instruction slot 114 into the second instruction slot 111. In the case shown, the data loading module 12 is still executing the decoded control instruction b, so c cannot yet be issued to it. In the second instruction slot 111, the second valid signal indicates that c is valid. The data loading module completion signal indicates that the data loading module 12 has not finished receiving c, and the third instruction slot completion signal indicates that the third instruction slot 112 has not finished caching c.
Meanwhile, the decoded control instruction c in the first instruction slot 114 has been cached to the second instruction slot 111. In the first instruction slot 114, the first valid signal therefore indicates that c is invalid, and the second instruction slot completion signal indicates that the second instruction slot 111 has finished caching c. The control module 11 may then decode the control instruction d and cache the decoded control instruction d in the first instruction slot 114. Accordingly, the first valid signal indicates that d is valid, and the second instruction slot completion signal indicates that the second instruction slot 111 has not finished caching d.
At time period T6: the control module 11 caches the decoded control instruction b in the third instruction slot 112 to a fourth instruction slot 113 corresponding to the storage stage, and then sends the decoded control instruction b cached in the fourth instruction slot 113 to the data write-back module 14 after the data write-back module 14 finishes executing the decoded control instruction a.
Meanwhile, because the loading module 12 is still executing the decoded control instruction b, the decoded control instruction c cannot be issued and still remains cached in the second instruction slot, the second valid signal in the second instruction slot 111 indicates that the decoded control instruction c is valid, the data loading module completion signal indicates that the data loading module 12 does not complete receiving the decoded control instruction c, and the third instruction slot completion signal indicates that the third instruction slot 112 does not complete caching the decoded control instruction c.
In an exemplary embodiment, the data to be processed includes an object to be processed and an operation parameter. The object to be processed includes, but is not limited to, an image, audio, or text; the operation parameter includes, but is not limited to, a convolution kernel, a pooling parameter, or an activation function.
The processing module 13 includes a systolic array. In response to the ith control instruction, the processing module 13 writes the object to be processed and the operation parameter corresponding to the ith control instruction into the systolic array and operates on them through the systolic array to obtain the processing result.
In one example, consider the case in which the data processing apparatus performs a convolution operation on an image: the object to be processed is an image, the operation parameter is a convolution kernel, and the processing module 13 writes the image and the convolution kernel into the systolic array and performs multiply-accumulate operations on them through the systolic array to obtain a convolved image.
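The application does not state how data flows through the systolic array. The sketch below is a minimal cycle-level model that assumes an output-stationary dataflow: each processing element (PE) keeps one accumulator, operands from the object enter from the left and operands from the parameter enter from the top, and each row or column is skewed by one cycle. The function name `systolic_matmul` and the list-of-lists representation are illustrative assumptions, not part of the described apparatus.

```python
def systolic_matmul(a, b):
    """Cycle-level sketch of an output-stationary systolic array computing a @ b.

    PE (r, c) owns accumulator acc[r][c]. Row r of `a` enters from the left
    delayed by r cycles; column c of `b` enters from the top delayed by c cycles.
    Every cycle, operands shift one PE right (a) or down (b), then each PE that
    holds both operands multiplies them and adds the product to its accumulator.
    """
    n, k, m = len(a), len(a[0]), len(b[0])
    assert len(b) == k, "inner dimensions must match"
    acc = [[0] * m for _ in range(n)]
    a_reg = [[None] * m for _ in range(n)]  # operand latched from the left
    b_reg = [[None] * m for _ in range(n)]  # operand latched from above
    for t in range(n + m + k - 2):
        # Shift operands (iterate from the far edge so each value moves once).
        for r in range(n):
            for c in range(m - 1, 0, -1):
                a_reg[r][c] = a_reg[r][c - 1]
        for c in range(m):
            for r in range(n - 1, 0, -1):
                b_reg[r][c] = b_reg[r - 1][c]
        # Inject skewed inputs at the left and top edges.
        for r in range(n):
            s = t - r
            a_reg[r][0] = a[r][s] if 0 <= s < k else None
        for c in range(m):
            s = t - c
            b_reg[0][c] = b[s][c] if 0 <= s < k else None
        # Multiply-accumulate in every PE that currently holds both operands.
        for r in range(n):
            for c in range(m):
                if a_reg[r][c] is not None and b_reg[r][c] is not None:
                    acc[r][c] += a_reg[r][c] * b_reg[r][c]
    return acc


if __name__ == "__main__":
    print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

Because of the skew, PE (r, c) sees `a[r][s]` and `b[s][c]` together at cycle r + c + s, so after the last cycle each accumulator holds one entry of the product matrix.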
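A convolution can be handed to a matrix-multiply engine such as a systolic array by first rearranging image windows into rows (a technique commonly called im2col), after which each output pixel is one chain of multiply-accumulate operations. The sketch below assumes a single-channel image, stride 1, no padding, and the cross-correlation form used by most CNN frameworks; the helper names are hypothetical and are not taken from the application.

```python
def im2col(image, kh, kw):
    """Flatten every kh x kw window of `image` (stride 1, no padding) into a row."""
    ih, iw = len(image), len(image[0])
    return [
        [image[y + dy][x + dx] for dy in range(kh) for dx in range(kw)]
        for y in range(ih - kh + 1)
        for x in range(iw - kw + 1)
    ]


def conv2d_via_mac(image, kernel):
    """CNN-style 2D convolution expressed as one dot product per output pixel."""
    kh, kw = len(kernel), len(kernel[0])
    ow = len(image[0]) - kw + 1  # output width
    flat_k = [w for row in kernel for w in row]
    # Each sum below is a sequence of multiply-accumulate steps.
    outs = [sum(p * w for p, w in zip(patch, flat_k)) for patch in im2col(image, kh, kw)]
    # Reshape the flat list of outputs back into rows of the convolved image.
    return [outs[i:i + ow] for i in range(0, len(outs), ow)]


if __name__ == "__main__":
    img = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    print(conv2d_via_mac(img, [[1, 1], [1, 1]]))
```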
In an embodiment, considering that in a convolutional neural network scenario the data to be loaded includes at least two parts, namely the object to be processed and the operation parameter, and in order to further improve data loading efficiency, refer to fig. 8, which is a schematic diagram of a fifth data processing apparatus according to an exemplary embodiment of the present application. The data loading module 12 includes an object loading unit 121 and a parameter loading unit 122. The control module 11 sends the (i + 1)th control instruction to the object loading unit 121 in response to the object loading unit 121 finishing executing the ith control instruction, and sends the (i + 1)th control instruction to the parameter loading unit 122 in response to the parameter loading unit 122 finishing executing the ith control instruction. In this embodiment, the object loading unit 121 and the parameter loading unit 122 load the object to be processed and the operation parameter respectively, and can do so simultaneously, which improves loading efficiency.
It should be noted that the object loading unit 121 finishing executing the ith control instruction means that it has finished loading the object to be processed corresponding to the ith control instruction; likewise, the parameter loading unit 122 finishing executing the ith control instruction means that it has finished loading the operation parameter corresponding to the ith control instruction.
The object loading unit 121, in response to the ith control instruction, loads an object to be processed corresponding to the ith control instruction; the parameter loading unit 122 loads an operating parameter corresponding to the ith control instruction in response to the ith control instruction.
When the processing module 13 is processing, in response to the ith control instruction, the data to be processed corresponding to the ith control instruction, and the object loading unit 121 has already loaded the object to be processed corresponding to the ith control instruction, the object loading unit 121 can directly receive the (i + 1)th control instruction from the control module 11 and load the corresponding object to be processed, without waiting for the processing module 13 to finish executing the ith control instruction. Similarly, if the parameter loading unit 122 has already loaded the operation parameter corresponding to the ith control instruction, it can directly receive the (i + 1)th control instruction from the control module 11 and load the corresponding operation parameter, without waiting for the processing module 13 to finish executing the ith control instruction. This embodiment further reduces the time the object loading unit 121 and the parameter loading unit 122 spend waiting for the (i + 1)th control instruction and avoids wasting processing resources through excessive waiting.
In one embodiment, the control module 11 includes a second instruction slot 111 corresponding to the data loading module 12, a third instruction slot 112 corresponding to the processing module 13, and a fourth instruction slot 113 corresponding to the data write-back module 14. When the data loading module 12 includes the object loading unit 121 and the parameter loading unit 122, the control module 11, for the second instruction slot 111, caches the (i + 1)th control instruction into the second instruction slot 111 in response to the ith control instruction in the second instruction slot 111 becoming invalid, and sends the (i + 1)th control instruction in the second instruction slot 111 to the data loading module 12 in response to both the object loading unit 121 and the parameter loading unit 122 finishing executing the ith control instruction.
The second instruction slot 111 includes a second valid signal, an object loading unit completion signal, a parameter loading unit completion signal, and a third instruction slot completion signal. The second valid signal indicates whether the control instruction cached in the second instruction slot 111 is valid. The object loading unit completion signal indicates whether the object loading unit 121 has finished receiving the control instruction (i.e., whether the control module 11 has sent the control instruction to the object loading unit 121). The parameter loading unit completion signal indicates whether the parameter loading unit 122 has finished receiving the control instruction (i.e., whether the control module 11 has sent the control instruction to the parameter loading unit 122). The third instruction slot completion signal indicates whether the third instruction slot 112 has finished caching the control instruction (i.e., whether the control module 11 has cached the instruction held in the second instruction slot 111 into the third instruction slot 112).
When the ith control instruction has been sent to the object loading unit 121 and the parameter loading unit 122 and has been cached into the third instruction slot 112, the object loading unit completion signal, the parameter loading unit completion signal, and the third instruction slot completion signal are set to values representing completion, and the second valid signal is set to a value representing invalid; the control module 11 can then cache the (i + 1)th control instruction into the second instruction slot 111. When the (i + 1)th control instruction has just been cached into the second instruction slot 111, it has not yet been sent to the object loading unit 121, the parameter loading unit 122, or the third instruction slot 112, so the second valid signal is set to a value representing valid, and the object loading unit completion signal, the parameter loading unit completion signal, and the third instruction slot completion signal are set to values representing not completed.
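To illustrate why letting the two loading units advance independently helps, the hypothetical timing model below compares two loading units working in parallel with a single loader that fetches the object and then the parameter. It assumes the per-instruction cycle counts are known, that processing of instruction i starts once both parts of its data are loaded and the processing module 13 is free, and it deliberately ignores the back-pressure from the instruction slots, so the loaders may run arbitrarily far ahead. The function and parameter names are illustrative only.

```python
def schedule(obj_load, param_load, proc, split_loaders=True):
    """Return the cycle at which processing of each control instruction ends.

    obj_load[i], param_load[i], proc[i]: hypothetical cycle counts for the
    object load, parameter load, and processing of control instruction i.
    """
    obj_t = param_t = proc_t = 0
    proc_done = []
    for i in range(len(proc)):
        if split_loaders:
            # Each loading unit starts instruction i as soon as it has finished
            # instruction i - 1, independently of the other unit.
            obj_t += obj_load[i]
            param_t += param_load[i]
        else:
            # A single loader fetches the object and then the parameter.
            obj_t = param_t = obj_t + obj_load[i] + param_load[i]
        ready = max(obj_t, param_t)
        # Processing needs both parts of its data and a free processing module.
        proc_t = max(proc_t, ready) + proc[i]
        proc_done.append(proc_t)
    return proc_done


if __name__ == "__main__":
    print(schedule([3, 3, 3], [2, 2, 2], [4, 4, 4]))
    print(schedule([3, 3, 3], [2, 2, 2], [4, 4, 4], split_loaders=False))
```

With `obj_load=[3, 3, 3]`, `param_load=[2, 2, 2]` and `proc=[4, 4, 4]`, the split configuration finishes processing at cycles 7, 11 and 15, versus 9, 14 and 19 for the single loader.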
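The handshake on the second instruction slot 111 can be modeled as a small state record. The sketch below is a software approximation with hypothetical class and method names; it captures only the two rules stated above: the slot becomes invalid once all three completion signals are set, and caching a new instruction marks it valid and clears the completion signals.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LoadSlot:
    """Hypothetical model of the second instruction slot with split loading units."""
    instr: Optional[str] = None
    valid: bool = False       # second valid signal
    obj_done: bool = False    # object loading unit completion signal
    param_done: bool = False  # parameter loading unit completion signal
    slot3_done: bool = False  # third instruction slot completion signal

    def try_fill(self, instr: str) -> bool:
        """Cache a new instruction only if the current one is already invalid."""
        if self.valid:
            return False
        self.instr, self.valid = instr, True
        self.obj_done = self.param_done = self.slot3_done = False
        return True

    def mark(self, obj: bool = False, param: bool = False, slot3: bool = False) -> None:
        """Record that the instruction reached a loading unit and/or the third slot."""
        self.obj_done |= obj
        self.param_done |= param
        self.slot3_done |= slot3
        # Invalid once sent to both loading units and cached in the third slot.
        if self.obj_done and self.param_done and self.slot3_done:
            self.valid = False
```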
For the execution process of the third instruction slot 112 and the fourth instruction slot 113, reference may be made to the embodiment shown in fig. 3, which is not described herein in detail in this embodiment of the application.
In an embodiment, to further improve data processing efficiency, the target data to be processed may be divided into at least two parts, with the data to be processed being one part of the target data to be processed. The control module 11 controls the processing of the target data through at least two control instructions, each indicating one part of it. Because the target data is divided into at least two parts, the data loading module 12 loads only one part of it for each control instruction, which improves loading efficiency; the processing module 13 therefore does not need to wait for the data loading module 12 to load the entire target data to be processed and can start processing the loaded part sooner, further improving processing efficiency.
In other words, the target data to be processed is divided into at least two parts and processed under the control of at least two control instructions, each indicating one part. Correspondingly, the control instructions include a result write-back control instruction and a result non-write-back control instruction.
The result non-write-back control instruction instructs the processing module 13 not to send the processing result to the data write-back module 14 after obtaining it, but to cache it. The processing result corresponding to a result non-write-back control instruction is not the final processing result that the data write-back module 14 ultimately writes into the external storage module, but only a part of it. After receiving a result non-write-back control instruction, the processing module 13 processes the corresponding data to be processed to obtain a processing result, caches the result, and, once caching is complete, generates an end signal and sends it to the control module 11. The control module 11 receives this end signal, which indicates that execution of the result non-write-back control instruction in the data processing apparatus is complete.
The result write-back control instruction instructs the processing module 13 to send all processing results related to the target data to be processed to the data write-back module 14. After receiving a result write-back control instruction, the processing module 13 processes the corresponding data to be processed to obtain a processing result, integrates all processing results related to the target data to be processed, and sends the integrated processing result to the data write-back module 14, which writes it into the external storage module. When the write operation is complete, the data write-back module 14 generates an end signal and sends it to the control module 11; this end signal indicates that execution of the result write-back control instruction in the data processing apparatus is complete.
In this embodiment, because the target data to be processed is divided into at least two parts, each indicated by a control instruction, the data loading module 12 only needs to load one part of the target data when loading the data corresponding to a control instruction. This improves loading efficiency and shortens the time the processing module 13 waits for the data loading module 12, so the processing module 13 can process the loaded data sooner, improving processing efficiency.
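The two instruction types differ only in what happens to the result and in which module produces the end signal. The sketch below models this with a hypothetical `ProcessingModuleSketch` class; `compute` stands in for the real operation, and "integration" is assumed to mean collecting the partial results in order, since the application does not define the integration step.

```python
class ProcessingModuleSketch:
    """Hypothetical model: partial results of one piece of target data are
    cached until a result write-back control instruction arrives."""

    def __init__(self, compute):
        self.compute = compute  # stand-in for the real operation (e.g. convolution)
        self.partials = []      # cached results of non-write-back instructions

    def execute(self, kind, data):
        result = self.compute(data)
        if kind == "no_writeback":
            # Cache the partial result; the end signal comes from this module.
            self.partials.append(result)
            return ("end_from_processing", None)
        if kind == "writeback":
            # Integrate all results for the target data and hand them on to the
            # data write-back module, which will emit the end signal after writing.
            merged = self.partials + [result]
            self.partials = []
            return ("to_writeback_module", merged)
        raise ValueError(f"unknown instruction kind: {kind}")
```

Mirroring the target data C example, a non-write-back instruction for part C1 followed by a write-back instruction for part C2 yields one hand-off carrying (C1, C2).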
Further, the control module 11 may return an end signal corresponding to the control instruction to the external control module to notify the external control module that the control instruction has been executed, so that the external control module performs a next processing step based on a final processing result written in the external storage module.
The two types of control instruction handle the obtained processing result differently and send their end signals at different points: the end signal of a result write-back control instruction is sent after the data write-back module 14 completes the instruction, whereas the end signal of a result non-write-back control instruction is sent after the processing module 13 completes the instruction. Consider an exemplary scenario in which the data write-back module 14 is writing the final processing result A corresponding to target data A to be processed into the external storage module. Because the data write-back module 14 has not yet finished, end signal A has not yet been generated and sent to the control module 11. Meanwhile, the processing module 13 may already have processed a part B1 of target data B to be processed and sent an end signal B1 to the control module 11. If the control module 11 returned end signal B1 to the external control module at this point, the external control module would proceed to the next step based on end signal B1 while ignoring end signal A, which it has not yet received; it might then skip the processing step based on final processing result A, causing an error in the processing flow.
Therefore, to ensure the correctness of the processing flow, the control module 11 returns the end signals corresponding to the control instructions to the external control module in the order in which the control instructions were received, on a first-in first-out basis. If the control module 11 determines that a currently received end signal is not yet due to be sent under this order, it buffers that end signal until its turn comes and then returns it to the external control module. In this embodiment, the end signals are order-preserved: the end signal of a control instruction received earlier from the external control module is returned to the external control module earlier, which ensures the correct and ordered execution of the data processing flow.
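The order-preserving rule behaves like a reorder buffer keyed by receive order. A minimal sketch, assuming each control instruction carries a unique identifier (the identifiers and the class name are hypothetical):

```python
from collections import deque


class EndSignalReorder:
    """Return end signals in the order their control instructions were received."""

    def __init__(self):
        self.pending = deque()  # instruction ids, in receive order
        self.finished = set()   # end signals received but not yet returned

    def receive_instruction(self, instr_id):
        self.pending.append(instr_id)

    def on_end_signal(self, instr_id):
        """Buffer an end signal; return the end signals that may be released now."""
        self.finished.add(instr_id)
        released = []
        # Release from the head only, so an earlier instruction is never overtaken.
        while self.pending and self.pending[0] in self.finished:
            head = self.pending.popleft()
            self.finished.remove(head)
            released.append(head)
        return released
```

In the scenario above, the end signal for B1 arrives first but is held until the end signal for A arrives, after which both are released in the order A, B1.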
In an example, please refer to fig. 9, for example, the target data C to be processed is divided into two parts, including data C1 to be processed and data C2 to be processed, the control instruction includes a result write-back control instruction and a result non-write-back control instruction, the result non-write-back control instruction corresponds to the data C1 to be processed, and the result write-back control instruction corresponds to the data C2 to be processed.
In the embodiment shown in fig. 9, the data loading module 12 loads the data C1 to be processed based on the result non-write-back control instruction, and the processing module 13 processes the data C1 based on the result non-write-back control instruction to obtain a processing result C1, which it caches. The data loading module 12 loads the data C2 to be processed based on the result write-back control instruction, and the processing module 13 processes the data C2 based on the result write-back control instruction to obtain a processing result C2 and integrates processing result C1 with processing result C2 to obtain the final processing result (C1, C2) corresponding to the target data C to be processed; the data write-back module 14 then writes the final processing result (C1, C2) into the external storage module.
In an exemplary embodiment, the data to be processed includes an object to be processed and an operation parameter, where the object to be processed includes, but is not limited to, an image, audio, or text, and the operation parameter includes, but is not limited to, a convolution kernel, a pooling parameter, or an activation function. To further improve data processing efficiency, the target object to be processed may be divided into at least two parts and the target operation parameter may be divided into at least two parts; the object to be processed is then one part of the target object to be processed, and the operation parameter is one part of the target operation parameter.
Correspondingly, the control instruction includes a result write-back control instruction and a result non-write-back control instruction, the result non-write-back control instruction is used for instructing the processing module 13 not to send the processing result to the data write-back module 14 after obtaining the processing result, but to cache the processing result, and the processing result corresponding to the result non-write-back control instruction is not the final processing result finally written into the external storage module by the data write-back module 14, but is a part of the final processing result; the result write-back control instruction is configured to instruct the processing module 13 to send all processing results related to the target object to be processed and the target operating parameter to the data write-back module 14, the processing module 13 integrates all processing results related to the target object to be processed and the target operating parameter according to the result write-back control instruction and then sends the integrated processing results to the data write-back module 14, and the data write-back module 14 writes the integrated processing results into an external storage module.
In this embodiment, the target object to be processed and the target operation parameter are each divided into at least two parts, and each part is indicated by a control instruction. When loading the object to be processed and the operation parameter corresponding to a control instruction, the data loading module 12 therefore only needs to load one part of the target object and of the target operation parameter. This improves loading efficiency and shortens the time the processing module 13 waits for the data loading module 12, so the processing module 13 can process the loaded data sooner and processing efficiency improves.
It should be noted that, in the embodiments shown in fig. 4 and fig. 5, the control module 11 includes a second instruction slot 111 corresponding to the data loading module 12, a third instruction slot 112 corresponding to the processing module 13, and a fourth instruction slot 113 corresponding to the data write-back module 14, all of which are used to cache control instructions. When a control instruction is a result non-write-back control instruction, it does not need to be cached in the fourth instruction slot 113 or sent to the data write-back module 14; that is, the data write-back module 14 does not need to execute result non-write-back control instructions.
In an exemplary embodiment, the data processing apparatus provided in the embodiments of the present application may be applied to a convolutional neural network to process a target object (including, but not limited to, an image, audio, video, or text). It may perform the convolution operation of a convolutional layer, the pooling operation of a pooling layer, or the activation operation of an activation layer, thereby accelerating the computation of a deep neural network in hardware, reducing its computation time, and improving computational efficiency.
The following description takes as an example a convolutional neural network applied to image processing, where the data processing apparatus performs the convolution operations of the convolutional layers: the control module 11 receives convolution operation control instructions and distributes them to the data loading module 12, the processing module 13, and the data write-back module 14. Referring to fig. 10, the control module 11 divides the distribution of a control instruction into four stages: a decode stage, a load stage, an execute stage, and a store stage. The decode stage corresponds to the first instruction slot 114, the load stage to the second instruction slot 111, the execute stage to the third instruction slot 112, and the store stage to the fourth instruction slot 113. A result non-write-back control instruction does not need to be cached in the fourth instruction slot 113 or sent to the data write-back module 14; that is, the data write-back module 14 does not execute it. A result write-back control instruction is cached in the fourth instruction slot 113, sent to the data write-back module 14, and executed by the data write-back module 14.
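An idealized view of the four stages shows how the modules work on different control instructions in the same cycle and how a result non-write-back control instruction skips the store stage. The sketch assumes every stage takes exactly one cycle and nothing stalls, which is a simplification: the real apparatus stalls according to the instruction-slot handshakes described earlier. Stage names follow the text; the function itself is illustrative.

```python
STAGES = ("decode", "load", "execute", "store")


def pipeline_table(instrs):
    """Map each cycle to {stage: instruction name} for an ideal, stall-free
    four-stage pipeline. `instrs` is a list of (name, kind) pairs, where kind
    is "writeback" or "no_writeback"."""
    table = {}
    for issue_cycle, (name, kind) in enumerate(instrs):
        # Result non-write-back instructions never enter the store stage
        # (fourth instruction slot / data write-back module).
        stages = STAGES if kind == "writeback" else STAGES[:3]
        for offset, stage in enumerate(stages):
            table.setdefault(issue_cycle + offset, {})[stage] = name
    return table


if __name__ == "__main__":
    table = pipeline_table([("C1", "no_writeback"), ("C2", "writeback")])
    for cycle in sorted(table):
        print(cycle, table[cycle])
```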
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement the method without creative effort.
Correspondingly, please refer to fig. 11, an embodiment of the present application further provides a data processing method, which is applied to a data processing apparatus, where the data processing apparatus includes a data loading module and a processing module; the method comprises the following steps:
In step S101, in response to a control instruction, data is loaded by the data loading module for processing by the processing module.
In step S102, in response to the control instruction, performing data processing by the processing module; the data loading module and the processing module execute different control instructions at the same time.
In an embodiment, the method further comprises:
in response to the control instruction, writing a processing result of the data to be processed into an external storage module through a data write-back module; the data loading module, the processing module, and the data write-back module execute different control instructions at the same time.
In an embodiment, the method further comprises:
in response to the data loading module finishing execution of the ith control instruction, sending the (i + 1)th control instruction to the data loading module; and in response to the processing module finishing execution of the ith control instruction, sending the (i + 1)th control instruction to the processing module; wherein i is an integer.
The step S101 includes: in response to the ith control instruction, loading the data to be processed corresponding to the ith control instruction.
The step S102 includes: in response to the ith control instruction, processing the data to be processed corresponding to the ith control instruction to obtain a processing result.
In an embodiment, the apparatus further comprises a data write-back module.
The method further comprises the following steps:
in response to the data write-back module finishing execution of the ith control instruction, sending the (i + 1)th control instruction to the data write-back module; and
in response to the ith control instruction, writing, through the data write-back module, the processing result corresponding to the ith control instruction into an external storage module.
In one embodiment, the method further comprises:
when the processing module responds to the ith control instruction to process the data to be processed corresponding to the ith control instruction, the data loading module receives the (i + 1) th control instruction without waiting for the completion of the execution of the ith control instruction by the processing module.
In one embodiment, the method further comprises:
when the data write-back module responds to the ith control instruction and writes the processing result corresponding to the ith control instruction into an external storage module, the processing module receives the (i + 1) th control instruction without waiting for the completion of the execution of the ith control instruction by the data write-back module.
In one embodiment, the method further comprises:
caching the control instruction through at least one instruction slot, and controlling the execution of the control instruction according to the state of the control instruction recorded in the at least one instruction slot.
In one embodiment, the method further comprises:
caching the control instruction through at least one instruction slot; the at least one instruction slot comprises at least one control instruction state signal, wherein the control instruction state signal is used for indicating whether the control instruction cached in the corresponding instruction slot is valid or not.
In one embodiment, the method further comprises:
caching the control instruction through at least one instruction slot; the at least one instruction slot comprises at least one control instruction completion signal; the at least one control instruction completion signal is used to indicate whether a module corresponding to the at least one instruction slot completes an operation of the control instruction corresponding to the at least one instruction slot, or the at least one control instruction completion signal is used to indicate whether the at least one instruction slot completes an operation of the corresponding control instruction.
In one embodiment, the method further comprises:
caching the control instruction through at least one instruction slot; the instruction slot comprises a control instruction state signal and a control instruction completion signal; when the control instruction completion signal indicates that the corresponding module has completed the operation of the corresponding control instruction, the control instruction state signal indicates that the control instruction cached in the corresponding instruction slot is invalid.
In one embodiment, the method further comprises:
caching the control instruction through at least one instruction slot; the instruction slot comprises a control instruction state signal and a control instruction completion signal; when the control instruction completion signal indicates that the at least one instruction slot completes the operation of the corresponding control instruction, the control instruction state signal indicates that the control instruction cached in the corresponding instruction slot is invalid.
In one embodiment, the method further comprises:
caching the control instruction through at least one instruction slot; and the (i + 1) th control instruction sent to the data loading module, the processing module and the data writing-back module is acquired from the instruction slot.
In one embodiment, the method further comprises:
caching the control instruction corresponding to the data loading module through a second instruction slot, caching the control instruction corresponding to the processing module through a third instruction slot, and caching the control instruction corresponding to the data write-back module through a fourth instruction slot.
In one embodiment, the method further comprises:
in response to the ith control instruction in the second instruction slot being invalid, caching the (i + 1) th control instruction into the second instruction slot.
The step of sending the (i + 1) th control instruction to the data loading module in response to the data loading module finishing executing the ith control instruction comprises:
sending the (i + 1)th control instruction in the second instruction slot to the data loading module in response to the data loading module finishing execution of the ith control instruction.
In one embodiment, the method further comprises:
in response to the ith control instruction in the third instruction slot being invalid, caching the (i + 1) th control instruction in the second instruction slot into the third instruction slot.
The step of sending the (i + 1) th control instruction to the processing module in response to the processing module finishing executing the ith control instruction comprises:
sending the (i + 1)th control instruction in the third instruction slot to the processing module in response to the processing module finishing execution of the ith control instruction.
In one embodiment, the method further comprises:
in response to the invalidation of the ith control instruction in the fourth instruction slot, caching the (i + 1) th control instruction in the third instruction slot into the fourth instruction slot.
The step of sending the (i + 1)th control instruction to the data write-back module in response to the data write-back module finishing execution of the ith control instruction comprises:
sending the (i + 1)th control instruction in the fourth instruction slot to the data write-back module in response to the data write-back module finishing execution of the ith control instruction.
In one embodiment, the method further comprises:
decoding the ith control instruction; in response to the decoded ith control instruction in the first instruction slot being invalid, caching the decoded (i + 1)th control instruction into the first instruction slot; and
in response to the decoded ith control instruction in the second instruction slot being invalid, caching the decoded (i + 1)th control instruction in the first instruction slot into the second instruction slot.
In one embodiment, the control instructions include a result write-back control instruction and a result non-write-back control instruction.
The data to be processed is one part of target data to be processed; the target data to be processed is divided into at least two parts.
The result non-write-back control instruction is used for indicating the processing module to cache the processing result;
the result write-back control instruction is used for instructing the processing module to send all processing results related to the target data to be processed to the data write-back module.
In one embodiment, step S102 includes:
processing the corresponding data to be processed according to the result non-write-back control instruction to obtain a processing result, and caching the processing result; and
processing the corresponding data to be processed according to the result write-back control instruction to obtain a processing result, integrating all processing results related to the target data to be processed, and sending the integrated result to the data write-back module.
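Here is a minimal sketch of how the two instruction types might interact. It makes two assumptions: the target data is a vector split into parts, and "integrating" the processing results means summing partial dot products (for example, partial sums over groups of input channels). The application does not specify the integration operation, and the class names and data layout below are hypothetical. The return value names the module that would report completion under the end-signal embodiment.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ControlInstr:
    part: List[int]     # one part of the target data (hypothetical layout)
    weights: List[int]  # operating parameters for that part
    write_back: bool    # True: result write-back; False: result non-write-back


class ProcessingModule:
    """Sketch: partial results are cached until a write-back instruction arrives."""

    def __init__(self) -> None:
        self.cache: List[int] = []
        self.written: List[int] = []  # stands in for the data write-back module

    def execute(self, ins: ControlInstr) -> str:
        """Process one part; return the module expected to send the end signal."""
        self.cache.append(sum(x * w for x, w in zip(ins.part, ins.weights)))
        if not ins.write_back:
            return "processing module"  # ends once the partial result is cached
        # "Integration" is assumed here to be summing the cached partial results.
        self.written.append(sum(self.cache))
        self.cache.clear()
        return "data write-back module"  # ends once the result has been written


pm = ProcessingModule()
pm.execute(ControlInstr([1, 2], [1, 1], write_back=False))  # caches 1*1 + 2*1 = 3
pm.execute(ControlInstr([3, 4], [2, 2], write_back=True))   # 3 + (3*2 + 4*2) = 17
print(pm.written)  # [17], the dot product of [1, 2, 3, 4] and [1, 1, 2, 2]
```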
In one embodiment, the method further comprises:
if the control instruction is the result non-write-back control instruction, receiving an end signal sent by the processing module after the processing result is cached; and
if the control instruction is the result write-back control instruction, receiving an end signal sent by the data write-back module after the processing result has been written; wherein the end signal indicates that the result non-write-back control instruction or the result write-back control instruction has finished executing in the data processing apparatus.
In one embodiment, the method further comprises:
returning the end signals corresponding to the control instructions to an external control module in the order in which the control instructions were received, on a first-in, first-out basis.
In one embodiment, the method further comprises:
if it is determined, according to the receiving order of the control instructions and the first-in, first-out principle, that the currently received end signal is not the one to be sent next, caching the currently received end signal.
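The ordering rule in these embodiments works much like a reorder buffer. End signals can arrive out of order: a result non-write-back instruction may finish at the processing module while an earlier result write-back instruction is still being written. They are nevertheless returned to the external control module in receive order. The Python sketch below assumes that each received control instruction is tagged with a sequence number; that tagging scheme and all names are hypothetical.

```python
from typing import Dict, List


class EndSignalOrderer:
    """Return end signals in instruction-receive order (first in, first out).

    Hypothetical sketch: each received control instruction gets a sequence
    number; an end signal that arrives before those of earlier instructions is
    cached until it becomes the next one due.
    """

    def __init__(self) -> None:
        self.next_seq = 0   # number assigned to the next received instruction
        self.next_out = 0   # number whose end signal must be returned next
        self.cached: Dict[int, str] = {}  # sequence number -> sending module

    def receive_instruction(self) -> int:
        seq = self.next_seq
        self.next_seq += 1
        return seq

    def on_end_signal(self, seq: int, source: str) -> List[int]:
        """Record an end signal and return the sequence numbers released, in order."""
        self.cached[seq] = source
        released: List[int] = []
        while self.next_out in self.cached:
            del self.cached[self.next_out]
            released.append(self.next_out)
            self.next_out += 1
        return released


r = EndSignalOrderer()
first, second = r.receive_instruction(), r.receive_instruction()
# If the second instruction is a result non-write-back instruction, its end
# signal (from the processing module) can arrive before the first one's.
print(r.on_end_signal(second, "processing module"))      # [] (cached)
print(r.on_end_signal(first, "data write-back module"))  # [0, 1]
```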
In one embodiment, the data to be processed includes an object to be processed and operating parameters.
In one embodiment, the object to be processed includes any one of: images, audio, or text; the operating parameters include any one of: convolution kernels, pooling parameters, or activation functions.
In one embodiment, the data loading module includes an object loading unit and a parameter loading unit.
Step S101 includes:
in response to the ith control instruction, loading, through the object loading unit, the object to be processed corresponding to the ith control instruction; and
in response to the ith control instruction, loading, through the parameter loading unit, the operating parameters corresponding to the ith control instruction.
In one embodiment, step S102 includes:
in response to the ith control instruction, writing the data to be processed corresponding to the ith control instruction into a systolic array, and operating on the data to be processed through the systolic array to obtain a processing result.
In one embodiment, step S102 includes:
in response to the ith control instruction, writing the object to be processed and the operating parameters corresponding to the ith control instruction into the systolic array respectively, and operating on the object to be processed with the operating parameters through the systolic array to obtain a processing result.
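The embodiments do not say which dataflow the systolic array uses. As one illustration, the Python sketch below models an output-stationary array cycle by cycle for a matrix product. The dataflow, the function name `systolic_matmul`, and the mapping of the object to be processed and the operating parameters onto the two input matrices are all assumptions, not details taken from this application.

```python
from typing import List

Matrix = List[List[int]]


def systolic_matmul(A: Matrix, B: Matrix) -> Matrix:
    """Cycle-by-cycle model of an output-stationary systolic array computing A @ B.

    PE (i, j) keeps C[i][j]; A values flow left to right and B values flow top
    to bottom. Row i of A and column j of B enter skewed by i and j cycles, so
    A[i][k] and B[k][j] meet in PE (i, j) at cycle i + j + k.
    Assumes len(A[0]) == len(B).
    """
    n, m, p = len(A), len(B), len(B[0])
    acc = [[0] * p for _ in range(n)]
    a_reg = [[0] * p for _ in range(n)]  # value each PE forwards to its right
    b_reg = [[0] * p for _ in range(n)]  # value each PE forwards downwards
    for t in range(n + m + p - 2):       # last meeting is at cycle n + m + p - 3
        new_a = [[0] * p for _ in range(n)]
        new_b = [[0] * p for _ in range(n)]
        for i in range(n):
            for j in range(p):
                if j == 0:               # left edge: skewed feed of row i of A
                    k = t - i
                    a_in = A[i][k] if 0 <= k < m else 0
                else:
                    a_in = a_reg[i][j - 1]
                if i == 0:               # top edge: skewed feed of column j of B
                    k = t - j
                    b_in = B[k][j] if 0 <= k < m else 0
                else:
                    b_in = b_reg[i - 1][j]
                acc[i][j] += a_in * b_in
                new_a[i][j], new_b[i][j] = a_in, b_in
        a_reg, b_reg = new_a, new_b      # all PEs update their registers together
    return acc


# Hypothetical mapping: rows of A = flattened input patches (object to be
# processed), columns of B = flattened convolution kernels (operating parameters).
print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

In this dataflow each processing element exchanges data only with its neighbours. Weight-stationary or input-stationary arrangements would be equally valid readings of the text.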
For specific implementations of the method embodiments, refer to the description of the apparatus embodiments; details are not repeated here.
Correspondingly, an embodiment of this application further provides an accelerator comprising the apparatus according to any one of the foregoing embodiments.
The accelerator may be applied to various neural networks, such as convolutional neural networks.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. The terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The method and apparatus provided by the embodiments of the present invention are described in detail above, and the principle and the embodiments of the present invention are explained in detail herein by using specific examples, and the description of the embodiments is only used to help understanding the method and the core idea of the present invention; meanwhile, for those skilled in the art, according to the idea of the present invention, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present invention.

Claims (57)

1. A data processing apparatus, comprising a control module, a data loading module and a processing module; wherein the data loading module, in response to a control instruction of the control module, loads data to be processed for processing by the processing module; the processing module, in response to a control instruction of the control module, processes the data to be processed; and the control module controls the data loading module and the processing module to execute different control instructions at the same time.
2. The apparatus according to claim 1, further comprising a data write-back module, wherein the data write-back module, in response to a control instruction of the control module, writes the processing result of the data to be processed into an external storage module; and the control module controls the data loading module, the processing module and the data write-back module to execute different control instructions at the same time.
3. The apparatus according to claim 1, wherein the control module is specifically configured to: in response to the data loading module completing execution of the ith control instruction, send the (i+1)th control instruction to the data loading module; and in response to the processing module completing execution of the ith control instruction, send the (i+1)th control instruction to the processing module, where i is an integer; the data loading module is specifically configured to: in response to the ith control instruction, load the data to be processed corresponding to the ith control instruction; and the processing module is specifically configured to: in response to the ith control instruction, process the data to be processed corresponding to the ith control instruction to obtain a processing result.
4. The apparatus according to claim 3, further comprising a data write-back module; wherein the control module is further configured to: in response to the data write-back module completing execution of the ith control instruction, send the (i+1)th control instruction to the data write-back module; and the data write-back module is configured to: in response to the ith control instruction, write the processing result corresponding to the ith control instruction into an external storage module.
5. The apparatus according to claim 3, wherein while the processing module is processing, in response to the ith control instruction, the data to be processed corresponding to the ith control instruction, the data loading module receives the (i+1)th control instruction without waiting for the processing module to finish executing the ith control instruction.
6. The apparatus according to claim 4, wherein while the data write-back module is writing, in response to the ith control instruction, the processing result corresponding to the ith control instruction into the external storage module, the processing module receives the (i+1)th control instruction without waiting for the data write-back module to finish executing the ith control instruction.
7. The apparatus according to claim 1, wherein the control module comprises at least one instruction slot for caching the control instructions, and the control module controls the execution progress of the control instructions according to the states of the control instructions recorded in the at least one instruction slot.
8. The apparatus according to claim 1, wherein the control module comprises at least one instruction slot for caching control instructions; the at least one instruction slot comprises at least one control instruction status signal, the control instruction status signal indicating whether the control instruction cached in the corresponding instruction slot is valid.
9. The apparatus according to claim 1, wherein the control module comprises at least one instruction slot for caching control instructions; the at least one instruction slot comprises at least one control instruction completion signal, the at least one control instruction completion signal indicating whether the module corresponding to the at least one instruction slot has completed the operation of the control instruction corresponding to the at least one instruction slot, or indicating whether the at least one instruction slot has completed the operation of the corresponding control instruction.
10. The apparatus according to claim 1, wherein the control module comprises an instruction slot for caching the control instructions; the instruction slot comprises a control instruction status signal and a control instruction completion signal; and when the control instruction completion signal indicates that the corresponding module has completed the operation of the corresponding control instruction, the control instruction status signal indicates that the control instruction cached in the corresponding instruction slot is invalid.
11. The apparatus according to claim 1, wherein the control module comprises an instruction slot for caching the control instructions; the instruction slot comprises a control instruction status signal and a control instruction completion signal; and when the control instruction completion signal indicates that the at least one instruction slot has completed the operation of the corresponding control instruction, the control instruction status signal indicates that the control instruction cached in the corresponding instruction slot is invalid.
12. The apparatus according to claim 4, wherein the control module comprises an instruction slot for caching the control instructions; and the control module is specifically configured to: in response to the data loading module completing execution of the ith control instruction, send the (i+1)th control instruction cached in the instruction slot to the data loading module; in response to the processing module completing execution of the ith control instruction, send the (i+1)th control instruction cached in the instruction slot to the processing module; and in response to the data write-back module completing execution of the ith control instruction, send the (i+1)th control instruction cached in the instruction slot to the data write-back module.
13. The apparatus according to claim 4, wherein the control module comprises a second instruction slot corresponding to the data loading module, a third instruction slot corresponding to the processing module, and a fourth instruction slot corresponding to the data write-back module; the second instruction slot, the third instruction slot and the fourth instruction slot are each configured to cache the corresponding (i+1)th control instruction after the control module has finished sending the ith control instruction, the sending being completed at different times for the respective slots; and the control module is specifically configured to: in response to the data loading module completing execution of the ith control instruction, send the (i+1)th control instruction cached in the second instruction slot to the data loading module; in response to the processing module completing execution of the ith control instruction, send the (i+1)th control instruction cached in the third instruction slot to the processing module; and in response to the data write-back module completing execution of the ith control instruction, send the (i+1)th control instruction cached in the fourth instruction slot to the data write-back module.
14. The apparatus according to claim 4, wherein the control module comprises a second instruction slot corresponding to the data loading module; the second instruction slot is configured to cache the control instructions sent to the data loading module; and the control module is further configured to: in response to the ith control instruction in the second instruction slot being invalid, cache the (i+1)th control instruction into the second instruction slot; and in response to the data loading module completing execution of the ith control instruction, send the (i+1)th control instruction in the second instruction slot to the data loading module.
15. The apparatus according to claim 14, wherein the control module further comprises a third instruction slot corresponding to the processing module; the third instruction slot is configured to cache the control instructions sent to the processing module; and the control module is further configured to: in response to the ith control instruction in the third instruction slot being invalid, cache the (i+1)th control instruction in the second instruction slot into the third instruction slot; and in response to the processing module completing execution of the ith control instruction, send the (i+1)th control instruction in the third instruction slot to the processing module.
16. The apparatus according to claim 15, wherein the control module further comprises a fourth instruction slot corresponding to the data write-back module; the fourth instruction slot is configured to cache the control instructions sent to the data write-back module; and the control module is further configured to: in response to the ith control instruction in the fourth instruction slot being invalid, cache the (i+1)th control instruction in the third instruction slot into the fourth instruction slot; and in response to the data write-back module completing execution of the ith control instruction, send the (i+1)th control instruction in the fourth instruction slot to the processing module.
17. The apparatus according to claim 14, wherein the control module further comprises a first instruction slot; the first instruction slot is configured to cache decoded control instructions; and the control module is further configured to: decode the ith control instruction; in response to the decoded ith control instruction in the first instruction slot being invalid, cache the decoded (i+1)th control instruction into the first instruction slot; and in response to the decoded ith control instruction in the second instruction slot being invalid, cache the decoded (i+1)th control instruction in the first instruction slot into the second instruction slot.
18. The apparatus according to claim 2, wherein the control instructions comprise a result write-back control instruction and a result non-write-back control instruction; the data to be processed is one part of target data to be processed, the target data to be processed being divided into at least two parts; the result non-write-back control instruction is used to instruct the processing module to cache the processing result; and the result write-back control instruction is used to instruct the processing module to send all processing results related to the target data to be processed to the data write-back module.
19. The apparatus according to claim 18, wherein the processing module is specifically configured to: according to the result non-write-back control instruction, process the corresponding data to be processed to obtain a processing result and cache the processing result; and according to the result write-back control instruction, process the corresponding data to be processed to obtain a processing result, integrate all processing results related to the target data to be processed, and send the integrated result to the data write-back module.
20. The apparatus according to claim 18, wherein if the control instruction is the result non-write-back control instruction, the control module is further configured to receive an end signal sent by the processing module after the processing result is cached; and if the control instruction is the result write-back control instruction, the control module is further configured to receive an end signal sent by the data write-back module after the processing result has been written; wherein the end signal indicates that the result non-write-back control instruction or the result write-back control instruction has finished executing in the data processing apparatus.
21. The apparatus according to claim 20, wherein the control module is further configured to return the end signals corresponding to the control instructions to an external control module in the order in which the control instructions were received, on a first-in, first-out basis.
22. The apparatus according to claim 21, wherein the control module is further configured to: if it is determined, according to the receiving order of the control instructions and the first-in, first-out principle, that the currently received end signal is not the one to be sent next, cache the currently received end signal.
23. The apparatus according to claim 1, wherein the data to be processed comprises an object to be processed and operating parameters.
24. The apparatus according to claim 23, wherein the object to be processed comprises any one of the following: an image, audio or text; and the operating parameters comprise any one of the following: a convolution kernel, pooling parameters or an activation function.
25. The apparatus according to claim 23, wherein the data loading module comprises an object loading unit and a parameter loading unit; the control module is specifically configured to: in response to the object loading unit completing execution of the ith control instruction, send the (i+1)th control instruction to the object loading unit; and in response to the parameter loading unit completing execution of the ith control instruction, send the (i+1)th control instruction to the parameter loading unit; the object loading unit is configured to: in response to the ith control instruction, load the object to be processed corresponding to the ith control instruction; and the parameter loading unit is configured to: in response to the ith control instruction, load the operating parameters corresponding to the ith control instruction.
26. The apparatus according to claim 1, wherein the processing module comprises a systolic array; and the processing module is specifically configured to: in response to the ith control instruction, write the data to be processed corresponding to the ith control instruction into the systolic array, and operate on the data to be processed through the systolic array to obtain the processing result.
27. The apparatus according to claim 23, wherein the processing module comprises a systolic array; and the processing module is specifically configured to: in response to the ith control instruction, write the object to be processed and the operating parameters corresponding to the ith control instruction into the systolic array respectively, and operate on the object to be processed with the operating parameters through the systolic array to obtain the processing result.
28. The apparatus according to claim 1, wherein the control module comprises an instruction slot for caching the control instructions; the instruction slot comprises an instruction cache flag and a set of control status signals; the instruction cache flag indicates whether the instruction cached in the instruction slot is valid; and the set of control status signals represents the working state of the corresponding module, or the working states of other instruction slots related to the instruction slot.
29. A data processing method, applied to a data processing apparatus, the data processing apparatus comprising a data loading module and a processing module, the method comprising: in response to a control instruction, loading data through the data loading module for data processing by the processing module; and in response to the control instruction, performing data processing through the processing module; wherein the data loading module and the processing module execute different control instructions at the same time.
30. The method according to claim 29, further comprising: in response to the control instruction, writing the processing result of the data to be processed into an external storage module; wherein the data loading module, the processing module and the data write-back module execute different control instructions at the same time.
31. The method according to claim 29, further comprising: in response to the data loading module completing execution of the ith control instruction, sending the (i+1)th control instruction to the data loading module; and in response to the processing module completing execution of the ith control instruction, sending the (i+1)th control instruction to the processing module, where i is an integer; wherein loading data through the data loading module in response to the control instruction, for data processing by the processing module, comprises: in response to the ith control instruction, loading the data to be processed corresponding to the ith control instruction; and performing data processing through the processing module in response to the control instruction comprises: in response to the ith control instruction, processing the data to be processed corresponding to the ith control instruction to obtain a processing result.
32. The method according to claim 31, wherein the apparatus further comprises a data write-back module, and the method further comprises: in response to the data write-back module completing execution of the ith control instruction, sending the (i+1)th control instruction to the data write-back module; and in response to the ith control instruction, writing, through the data write-back module, the processing result corresponding to the ith control instruction into an external storage module.
33. The method according to claim 31, further comprising: while the processing module is processing, in response to the ith control instruction, the data to be processed corresponding to the ith control instruction, receiving, by the data loading module, the (i+1)th control instruction without waiting for the processing module to finish executing the ith control instruction.
34. The method according to claim 32, further comprising: while the data write-back module is writing, in response to the ith control instruction, the processing result corresponding to the ith control instruction into the external storage module, receiving, by the processing module, the (i+1)th control instruction without waiting for the data write-back module to finish executing the ith control instruction.
35. The method according to claim 29, further comprising: caching the control instructions through at least one instruction slot, and controlling the execution progress of the control instructions according to the states of the control instructions recorded in the at least one instruction slot.
36. The method according to claim 29, further comprising: caching the control instructions through at least one instruction slot, the at least one instruction slot comprising at least one control instruction status signal, wherein the control instruction status signal indicates whether the control instruction cached in the corresponding instruction slot is valid.
37. The method according to claim 29, further comprising: caching the control instructions through at least one instruction slot, the at least one instruction slot comprising at least one control instruction completion signal, wherein the at least one control instruction completion signal indicates whether the module corresponding to the at least one instruction slot has completed the operation of the control instruction corresponding to the at least one instruction slot, or indicates whether the at least one instruction slot has completed the operation of the corresponding control instruction.
38. The method according to claim 29, further comprising: caching the control instructions through at least one instruction slot, the instruction slot comprising a control instruction status signal and a control instruction completion signal, wherein when the control instruction completion signal indicates that the corresponding module has completed the operation of the corresponding control instruction, the control instruction status signal indicates that the control instruction cached in the corresponding instruction slot is invalid.
39. The method according to claim 29, further comprising: caching the control instructions through at least one instruction slot, the instruction slot comprising a control instruction status signal and a control instruction completion signal, wherein when the control instruction completion signal indicates that the at least one instruction slot has completed the operation of the corresponding control instruction, the control instruction status signal indicates that the control instruction cached in the corresponding instruction slot is invalid.
40. The method according to claim 31, further comprising: caching the control instructions through at least one instruction slot, wherein the (i+1)th control instructions sent to the data loading module, the processing module and the data write-back module are obtained from the instruction slot.
41. The method according to claim 31, further comprising: caching, through a second instruction slot, the control instructions corresponding to the data loading module; caching, through a third instruction slot, the control instructions corresponding to the processing module; and caching, through the third instruction slot, the control instructions corresponding to the data write-back module.
42. The method according to claim 32, further comprising: in response to the ith control instruction in the second instruction slot being invalid, caching the (i+1)th control instruction into the second instruction slot; wherein sending the (i+1)th control instruction to the data loading module in response to the data loading module completing execution of the ith control instruction comprises: in response to the data loading module completing execution of the ith control instruction, sending the (i+1)th control instruction in the second instruction slot to the data loading module.
43. The method according to claim 42, further comprising: in response to the ith control instruction in the third instruction slot being invalid, caching the (i+1)th control instruction in the second instruction slot into the third instruction slot; wherein sending the (i+1)th control instruction to the processing module in response to the processing module completing execution of the ith control instruction comprises: in response to the processing module completing execution of the ith control instruction, sending the (i+1)th control instruction in the third instruction slot to the processing module.
44. The method according to claim 43, further comprising: in response to the ith control instruction in the fourth instruction slot being invalid, caching the (i+1)th control instruction in the third instruction slot into the fourth instruction slot; wherein sending the (i+1)th control instruction to the data write-back module in response to the data write-back module completing execution of the ith control instruction comprises: in response to the data write-back module completing execution of the ith control instruction, sending the (i+1)th control instruction in the fourth instruction slot to the processing module.
45. The method according to claim 42, further comprising: decoding the ith control instruction; in response to the decoded ith control instruction in the first instruction slot being invalid, caching the decoded (i+1)th control instruction into the first instruction slot; and in response to the decoded ith control instruction in the second instruction slot being invalid, caching the decoded (i+1)th control instruction in the first instruction slot into the second instruction slot.
46.
The method of claim 30, wherein the control instructions include a result write back control instruction and a result no write back control instruction; 所述待处理数据为待处理目标数据的其中一部分;所述待处理目标数据被划分为至少两部分;The data to be processed is a part of the target data to be processed; the target data to be processed is divided into at least two parts; 所述结果不写回控制指令用于指示所述处理模块缓存所述处理结果;The result is not written back control instruction for instructing the processing module to cache the processing result; 所述结果写回控制指令用于指示所述处理模块将与所述待处理目标数据相关的所有处理结果发送至所述数据写回模块。The result write-back control instruction is used to instruct the processing module to send all processing results related to the target data to be processed to the data write-back module. 47.根据权利要求46所述的方法,其特征在于,所述响应于所述控制指令,通过所述处理模块进行数据处理,包括:47. The method according to claim 46, wherein the data processing performed by the processing module in response to the control instruction comprises: 根据所述结果不写回控制指令,对相应的待处理数据进行处理,得到处理结果并缓存;以及,According to the result, the control instruction is not written back, the corresponding data to be processed is processed, the processing result is obtained and cached; and, 根据所述结果写回控制指令,对相应的待处理数据进行处理,得到处理结果,并将与所述待处理目标数据相关的所有处理结果整合后发送至所述数据写回模块。According to the result write-back control instruction, the corresponding data to be processed is processed to obtain a processing result, and all processing results related to the target data to be processed are integrated and sent to the data write-back module. 48.根据权利要求47所述的方法,其特征在于,还包括:48. 
The method of claim 47, further comprising: 若所述控制指令为所述结果不写回控制指令,接收所述处理模块在缓存所述处理结果之后发送的结束信号;以及,If the control instruction is that the result is not written back to the control instruction, receiving an end signal sent by the processing module after buffering the processing result; and, 若所述控制指令为所述结果写回控制指令,接收所述数据写回模块在将所述处理结果写入完成后发送的结束信号;其中,所述结束信号表征所述结果不写回控制指令或所述结果写回控制指令在所述数据处理装置中执行完毕。If the control command is the result write-back control command, receive an end signal sent by the data write-back module after writing the processing result is completed; wherein, the end signal indicates that the result is not written back to the control The instruction or the result write-back control instruction is executed in the data processing device. 49.根据权利要求48所述的方法,其特征在于,还包括:49. The method of claim 48, further comprising: 按照所述控制指令的接收顺序以及先进先出原则,将所述控制指令对应的结束信号返回给外部控制模块。According to the receiving sequence of the control commands and the FIFO principle, the end signal corresponding to the control commands is returned to the external control module. 50.根据权利要求49所述的方法,其特征在于,还包括:50. The method of claim 49, further comprising: 如果按照所述控制指令的接收顺序以及先进先出原则,确定当前接收到的结束信号不是当前要发送的,缓存所述当前接收到的结束信号。If it is determined that the currently received end signal is not currently to be sent according to the receiving sequence of the control instructions and the FIFO principle, the currently received end signal is buffered. 51.根据权利要求29所述的方法,其特征在于,所述待处理数据包括待处理对象和运行参数。51. The method of claim 29, wherein the data to be processed includes objects to be processed and operating parameters. 52.根据权利要求51所述的方法,其特征在于,所述待处理对象包括以下任意一种:图像、音频或文字;52. The method according to claim 51, wherein the object to be processed comprises any one of the following: image, audio or text; 所述运行参数包括以下任意一种:卷积核、池化参数或激活函数。The operating parameters include any one of the following: convolution kernels, pooling parameters or activation functions. 53.根据权利要求51所述的方法,其特征在于,所述数据加载模块包括对象加载单元和参数加载单元;53. 
The method according to claim 51, wherein the data loading module comprises an object loading unit and a parameter loading unit; 所述响应于控制指令,通过所述数据加载模块加载数据以供所述处理模块进行数据处理,包括:The loading of data by the data loading module in response to the control instruction for the processing module to perform data processing includes: 响应于所述第i条控制指令,通过所述对象加载单元加载所述第i条控制指令对应的待处理对象;以及,In response to the i-th control instruction, the object to be processed corresponding to the i-th control instruction is loaded by the object loading unit; and, 响应于所述第i条控制指令,通过所述参数加载单元加载所述第i条控制指令对应的运行参数。In response to the i-th control instruction, the parameter loading unit loads the operation parameters corresponding to the i-th control instruction. 54.根据权利要求29所述的方法,其特征在于,所述响应于所述控制指令,通过所述处理模块进行数据处理,包括:54. The method according to claim 29, wherein the data processing performed by the processing module in response to the control instruction comprises: 响应于所述第i条控制指令,将所述第i条控制指令对应的待处理数据写入脉动阵列中,通过所述脉动阵列对所述待处理数据进行运算,得到处理结果。In response to the i-th control instruction, write the to-be-processed data corresponding to the i-th control instruction into a systolic array, and perform operations on the to-be-processed data through the systolic array to obtain a processing result. 55.根据权利要求51所述的方法,其特征在于,所述响应于所述控制指令,通过所述处理模块进行数据处理,包括:55. The method according to claim 51, wherein the data processing performed by the processing module in response to the control instruction comprises: 响应于所述第i条控制指令,将所述第i条控制指令对应的待处理对象和运行参数分别写入所述脉动阵列中,通过所述脉动阵列将所述待处理对象与所述运行参数进行运算,得到处理结果。In response to the i-th control instruction, write the object to be processed and the operation parameters corresponding to the i-th control instruction into the systolic array, respectively, and use the systolic array to associate the object to be processed with the operation parameter. The parameters are operated to obtain the processing result. 56.根据权利要求29所述的装置,其特征在于,56. 
The device of claim 29, wherein 所述控制模块包括用于缓存所述控制指令的指令槽;所述指令槽包括指令缓存标志和控制状态信号的集合;The control module includes an instruction slot for caching the control instruction; the instruction slot includes a set of instruction cache flags and control status signals; 其中,所述指令缓存标志用于指示在所述指令槽中缓存的指令是否有效;所述控制状态信号的集合用于表示对应模块的工作状态,或者表示与所述指令槽相关的其他指令槽的工作状态。The instruction cache flag is used to indicate whether the instruction cached in the instruction slot is valid; the set of control status signals is used to indicate the working status of the corresponding module, or other instruction slots related to the instruction slot. working status. 57.一种加速器,其特征在于,包括如权利要求1至28任意一项所述的数据处理装置。57. An accelerator, characterized by comprising the data processing device according to any one of claims 1 to 28.
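The instruction-slot handshake of claims 36 to 45 and the ordered return of end signals in claims 49 and 50 can be illustrated with a small software model. The Python sketch below is a hypothetical, tick-level approximation, not the applicant's implementation. It assumes that each module needs a fixed number of ticks per instruction and that a slot's status signal turns invalid once its module has raised the completion signal and the instruction has moved to the next slot (one possible reading of claims 38 and 39). It models only instructions that pass through all three modules. The names `InstructionSlot`, `simulate` and `return_in_order` are invented for illustration. Claim 48 has a result non-write-back instruction end at the processing module while an earlier write-back instruction may still be in the data write-back module, so end signals can arrive out of order. `return_in_order` releases them in receive order and buffers any that arrive early.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class InstructionSlot:
    """Hypothetical instruction slot: one cached control instruction plus its signals."""
    seq: Optional[int] = None  # index i of the cached control instruction
    valid: bool = False        # control instruction status signal
    done: bool = False         # control instruction completion signal


def simulate(num_instrs: int, latency: Tuple[int, ...] = (1, 1, 1),
             max_ticks: int = 10_000) -> List[int]:
    """Return the tick at which each control instruction leaves the last stage.

    Slots 0, 1 and 2 stand in for the second, third and fourth instruction slots
    feeding the data loading, processing and data write-back modules (assumed model).
    """
    if any(t < 1 for t in latency):
        raise ValueError("every module needs at least one tick per instruction")
    slots = [InstructionSlot() for _ in latency]
    busy = [0] * len(latency)  # ticks left on the instruction each module is running
    last = len(slots) - 1
    next_seq = 0               # next control instruction to accept from outside
    retired: List[int] = []

    for tick in range(max_ticks):
        # 1. Running modules make progress; finishing raises the completion signal.
        for k in range(len(busy)):
            if busy[k] > 0:
                busy[k] -= 1
                if busy[k] == 0:
                    slots[k].done = True

        # 2. Pass finished instructions downstream, last stage first, so a slot
        #    freed in this tick can be refilled in the same tick.
        for k in range(last, -1, -1):
            slot = slots[k]
            if not (slot.valid and slot.done):
                continue
            if k == last:
                retired.append(tick)          # write-back finished for instruction i
                slots[k] = InstructionSlot()  # status signal becomes invalid
            elif not slots[k + 1].valid:      # downstream slot holds no valid instruction
                slots[k + 1] = InstructionSlot(slot.seq, valid=True)
                slots[k] = InstructionSlot()

        # 3. Accept instruction i+1 into the first slot once that slot is invalid.
        if not slots[0].valid and next_seq < num_instrs:
            slots[0] = InstructionSlot(next_seq, valid=True)
            next_seq += 1

        # 4. An idle module starts on the valid, unfinished instruction in its slot.
        for k, slot in enumerate(slots):
            if slot.valid and not slot.done and busy[k] == 0:
                busy[k] = latency[k]

        if len(retired) == num_instrs:
            return retired
    raise RuntimeError("simulation did not finish within max_ticks")


def return_in_order(arrivals: Iterable[int]) -> List[List[int]]:
    """Release end signals in receive (FIFO) order, buffering any that arrive early.

    arrivals: sequence numbers of control instructions in the order their end
    signals reach the controller. Returns what is released after each arrival.
    """
    buffered = set()
    next_to_send = 0
    released: List[List[int]] = []
    for seq in arrivals:
        buffered.add(seq)  # hold the end signal until its turn comes
        batch = []
        while next_to_send in buffered:
            buffered.remove(next_to_send)
            batch.append(next_to_send)
            next_to_send += 1
        released.append(batch)
    return released


if __name__ == "__main__":
    print(simulate(3))                     # [3, 4, 5]
    print(simulate(3, latency=(1, 2, 1)))  # [4, 6, 8]
    print(return_in_order([1, 0, 3, 2]))   # [[], [0, 1], [], [2, 3]]
```

With unit latencies, one instruction retires every tick. Raising the processing latency to two ticks shows the slots acting as back-pressure. The loading module finishes instruction i+1 early but must hold it in its slot until the processing slot becomes invalid, so instructions then retire only every second tick.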
CN202080004332.0A 2020-03-11 2020-03-11 Data processing apparatus, data processing method, and accelerator Pending CN112602094A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/078876 WO2021179224A1 (en) 2020-03-11 2020-03-11 Data processing device, data processing method and accelerator

Publications (1)

Publication Number Publication Date
CN112602094A true CN112602094A (en) 2021-04-02

Family

ID=75208096

Country Status (2)

Country Link
CN (1) CN112602094A (en)
WO (1) WO2021179224A1 (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1433538A (en) * 1999-12-03 2003-07-30 Intel Corporation Method and apparatus for constructing a pre-scheduled instruction cache
CN107341547A (en) * 2016-04-29 2017-11-10 Beijing Zhongke Cambricon Technology Co., Ltd. An apparatus and method for performing convolutional neural network training
CN107590535A (en) * 2017-09-08 2018-01-16 Xidian University Programmable neural network processor
CN109937416A (en) * 2017-05-17 2019-06-25 Google LLC Low-latency matrix multiplication unit
CN110097174A (en) * 2019-04-22 2019-08-06 Xi'an Jiaotong University Implementation method, system and device of convolutional neural network based on FPGA and line output priority
CN110333827A (en) * 2019-07-11 2019-10-15 Shandong Inspur Artificial Intelligence Research Institute Co., Ltd. A data loading device and data loading method
CN110659070A (en) * 2018-06-29 2020-01-07 Xilinx, Inc. High-parallelism computing system and instruction scheduling method thereof
CN110780921A (en) * 2019-08-30 2020-02-11 Tencent Technology (Shenzhen) Co., Ltd. Data processing method and device, storage medium and electronic device

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6467743B2 (en) * 2013-08-19 2019-02-13 Shanghai Xinhao Microelectronics Co., Ltd. High-performance processor system based on general-purpose units and method thereof
CN111095294A (en) * 2017-07-05 2020-05-01 Deep Vision Co., Ltd. Deep vision processor
CN108475347A (en) * 2017-11-30 2018-08-31 SZ DJI Technology Co Ltd Neural network processing method, apparatus, accelerator, system and movable device
US10963379B2 (en) * 2018-01-30 2021-03-30 Microsoft Technology Licensing, Llc Coupling wide memory interface to wide write back paths

Also Published As

Publication number Publication date
WO2021179224A1 (en) 2021-09-16

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20210402