Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present disclosure more apparent, the technical solutions of the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings of the embodiments of the present disclosure. It will be apparent that the described embodiments are some, but not all, of the embodiments of the present disclosure. All other embodiments, which can be made by one of ordinary skill in the art without the need for inventive faculty, are within the scope of the present disclosure, based on the described embodiments of the present disclosure.
Unless defined otherwise, technical or scientific terms used in this disclosure should be given the ordinary meaning as understood by one of ordinary skill in the art to which this disclosure belongs. The terms "first," "second," and the like, as used in this disclosure, do not denote any order, quantity, or importance, but rather are used to distinguish one element from another. The word "comprising" or "comprises", and the like, means that the element or item preceding the word covers the elements or items listed after the word and equivalents thereof, but does not exclude other elements or items. The terms "connected" or "coupled", and the like, are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "Upper", "lower", "left", "right", etc. are used merely to indicate relative positional relationships, which may also change when the absolute position of the described object changes.
In order to keep the following description of the embodiments of the present disclosure clear and concise, the present disclosure omits a detailed description of some known functions and known components.
A modern processing core typically contains a number of pipeline stages, such as branch prediction, instruction fetch, instruction decode, instruction dispatch and rename, instruction execution, and instruction retire. To support high operating frequencies, each pipeline stage may in turn comprise a plurality of pipeline sub-stages. An important characteristic of simultaneous multithreading (SMT) is that, in the same clock cycle, the instructions in the instruction execution pipeline stage can come from multiple threads, whereas each of the other pipeline stages often selects and processes the instructions of only one thread per clock cycle. Thus, in these stages, one thread must be selected from the plurality of threads and passed on to the next pipeline stage, which is called thread scheduling. The choice of thread scheduling has an important effect on the overall performance and power consumption of SMT and on fairness among threads.
SMT may be referred to as SMT2 (at most two active threads), SMT4 (at most four active threads), etc., depending on the maximum number of active threads supported.
The hardware resources inside an SMT processing core can be allocated in different ways. The usual ways are:
1. Full static partitioning: all hardware resources are equally divided according to the maximum number of active threads supported by SMT.
2. Full dynamic sharing: all hardware resources are dynamically shared by all threads.
3. Mixed mode: some hardware resources are dynamically shared by all threads, while other resources are statically partitioned.
4. Other modes: for example, a plurality of threads can be divided into a plurality of thread groups, where all resources are statically partitioned among the different thread groups but the resources within one thread group are dynamically shared.
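As a minimal sketch of how modes 1 and 4 differ, the following divides a resource with a given number of entries. The function names, the 64-entry resource, and the grouping of four threads into two groups are illustrative assumptions and do not come from the disclosure.

```python
def static_shares(total_entries, max_threads):
    """Mode 1: split a resource equally by the maximum number of active threads."""
    return [total_entries // max_threads] * max_threads


def grouped_shares(total_entries, groups):
    """Mode 4: split equally among thread groups; the threads inside one
    group dynamically share that group's part."""
    per_group = total_entries // len(groups)
    return {tuple(group): per_group for group in groups}


# Hypothetical 64-entry resource on an SMT4 core.
print(static_shares(64, 4))                   # [16, 16, 16, 16]
print(grouped_shares(64, [[0, 1], [2, 3]]))   # {(0, 1): 32, (2, 3): 32}
```

Under full static partitioning a thread can never use more than its 16 entries, whereas in the grouped mode a thread may use up to 32 entries if the other thread in its group needs few.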
In the process of compiling a program, a compiler can, according to a specified optimization level, perform performance optimization of different degrees of aggressiveness, using compiling technologies including but not limited to code reordering and code vectorization.
FIG. 1 is a schematic diagram of feature optimization guidance.
For example, the program may be run once while data is collected, and the collected data is analyzed to determine whether the program has some type of behavioral preference; if so, the compiler subsequently performs further optimization according to the behavioral preference.
Taking the following code as an example:
code A;
if (a == 0)      // code B
    a = a + 1;   // code C
code D;
Given this code, the compiler has no way to judge at static compile time whether the variable a will be equal to 0 at actual run time, so the compiler follows the way the code is written in the high-level programming language and, as shown in the left-hand flow of FIG. 1, translates the branch instruction (code B) as written and places it after code A. The program semantics are then as follows: if the branch of code B is taken, code C is skipped and code D is executed; if the branch of code B is not taken, code C is executed in sequence without a jump. That is, the compiler arranges codes A, B, C, and D contiguously in the memory space, following the syntax of the high-level language itself.
In FIG. 1, a broken line indicates a rarely occurring instruction control flow, and a solid line indicates the instruction control flow in most cases.
Assuming that the probability that the condition a == 0 is satisfied is very low (for example, lower than 1%), this code arrangement means that at code B the program usually needs to take the branch jump, so that the CPU needs to interrupt the pipeline and re-fetch and re-decode the instruction stream at the branch target. When the CPU assumes that no branch jump occurs, code C is fetched; after the branch jump is detected, the CPU needs to flush code C and the code following it from the pipeline and fetch instructions again from the address of code D, which can seriously affect the performance of the CPU.
With feature optimization guidance, the behavior of the branch at code B is collected while this section of the program is dynamically executed. After this behavior is observed, the compiler chooses to place code C not after code B but in a non-contiguous address space, as shown on the right side of FIG. 1. Codes A, B, and D are then contiguous in the memory space, while code C is stored in another block of memory that is not contiguous with codes A, B, and D. Thus, most of the time the program is executed in the order code A -> code B -> code D, and branch jumps (code B -> code D on the left side of FIG. 1, and code B -> code C -> code D on the right side of FIG. 1) no longer occur often enough to seriously affect CPU performance.
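The effect of the two code layouts can be illustrated with a small count of taken jumps. This is a sketch under stated assumptions: the layout names "original" and "optimized", the function name, and the profile in which a == 0 holds in 1 run out of 100 are all hypothetical, and the count ignores every other cost in a real pipeline.

```python
def taken_jumps(values, layout):
    """Count taken control transfers for the `if (a == 0) a = a + 1;` example."""
    jumps = 0
    for a in values:
        if layout == "original":       # A, B, C, D contiguous (left of FIG. 1)
            if a != 0:
                jumps += 1             # B jumps over C to D
        elif layout == "optimized":    # A, B, D contiguous, C out of line (right of FIG. 1)
            if a == 0:
                jumps += 2             # B -> C, then C -> D
        else:
            raise ValueError(layout)
    return jumps


# Assumed profile: a == 0 in 1 run out of 100.
values = [0] + [1] * 99
print(taken_jumps(values, "original"))    # 99
print(taken_jumps(values, "optimized"))   # 2
```

With this profile the original layout takes a jump in 99 of 100 runs, while the reordered layout falls through from code B to code D in all but the one rare run.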
However, conventional compiling optimization only performs static analysis of the program code and does not consider whether the program shows a certain preference during actual execution. The same optimization logic is applied to programs with all kinds of characteristics, so targeted optimization cannot be performed for the different characteristics of each program, which limits compiling optimization.
In addition, when the processing core uses SMT technology, a thread shares the hardware resources of the processing core with other threads, so the performance of a thread running under SMT is often lower than its performance in single-thread mode. Moreover, when resources are allocated in the four ways described above, the allocation cannot be adjusted in a targeted manner according to the characteristics of a program.
For example, taking SMT2 as an example, a processing core executes tasks using two threads (thread 0 and thread 1) simultaneously. The cache is a dynamically shared resource, and thread 0 and thread 1 may share all of the storage space in the cache. Suppose thread 0 is, for example, a streaming media application: a large amount of streaming media data is often used only once and never used again, yet thread 0 is likely to occupy a large amount of the cache, so that the data of thread 1 is evicted from the cache, thread 1 frequently stalls on cache misses, and its performance drops sharply. Meanwhile, although thread 0 occupies a large amount of cache capacity, these data are all single-use data, so the cache capacity is occupied without improving processor performance.
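The cache pollution described above can be reproduced with a toy model. The sketch below assumes a fully shared 8-line cache with least-recently-used replacement, a thread 1 that loops over a 4-line working set, and a thread 0 that touches a configurable number of never-reused lines between thread 1 accesses; none of these parameters comes from the disclosure.

```python
from collections import OrderedDict


def thread1_hits(stream_per_step, steps=100, capacity=8, working_set=4):
    """Return how many of thread 1's accesses hit in a shared LRU cache."""
    cache = OrderedDict()  # keys ordered from least to most recently used

    def access(key):
        hit = key in cache
        if hit:
            cache.move_to_end(key)
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)   # evict the least recently used line
            cache[key] = True
        return hit

    hits = 0
    next_stream = 0
    for step in range(steps):
        for _ in range(stream_per_step):    # thread 0: single-use streaming data
            access(("t0", next_stream))
            next_stream += 1
        if access(("t1", step % working_set)):   # thread 1: small reused set
            hits += 1
    return hits


print(thread1_hits(stream_per_step=0))   # 96: only the 4 warm-up accesses miss
print(thread1_hits(stream_per_step=4))   # 0: thread 0 evicts every thread 1 line
```

In this model thread 0 never hits either, so its cache occupancy brings no benefit while thread 1 loses all of its reuse.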
For example, branch instructions can interrupt the instruction stream, and if branch instructions are not handled well, the performance of the processor may be greatly affected. Branch prediction refers to a technique in which the processor, on encountering a branch instruction, does not wait until the branch result is resolved but instead predicts, directly in the instruction fetch stage, whether the branch jumps and what the jump target address is, so as to keep the instruction stream uninterrupted based on the predicted result and thereby reduce the CPI (cycles per instruction, the number of clock cycles required to execute an instruction) of the processor. The processor may use a branch target table to store prediction information for branches, with each branch target entry in the branch target table storing information about a branch jump, such as a jump direction and a jump target address.
The branch target table is a dynamically shared resource. Assume that thread 0 contains many branches that are difficult to predict accurately (e.g., indirect jumps such as jmp [mem], whose jump target address depends on the value previously written to mem by the program). Thread 0 may then use many branch target entries without improving its own branch prediction accuracy, while thread 1, whose branch target entries have been evicted by thread 0, cannot make a prediction when its subsequent branches arrive, so that its performance is degraded.
For example, statically partitioned resources are typically divided equally among the threads in advance and used exclusively by each thread. In an actual scenario, it is quite possible that one thread never uses that much of the resource, or that an occasional pipeline stall in the processing core caused by a shortage of the resource does not greatly affect its performance, while another thread is in urgent need of that resource. In this case, dividing the resource equally impairs performance.
For example, in some cases, even if thread 0 wins contention for some dynamically shared resource A, it may still need to wait for resource B before it can continue execution, potentially causing another thread 1 to stall for lack of resource A. For example, thread 0 acquires resource A but requires the value returned by resource B to perform an operation on resource A, so that thread 0 occupies a large amount of resource A before resource B returns the value, without any improvement in performance.
At least one embodiment of the present disclosure provides an information processing method, a resource allocation method, an information processing apparatus, a resource allocation apparatus, an electronic device, and a non-transitory computer-readable storage medium.
The information processing method provided by at least one embodiment of the present disclosure includes: collecting performance data generated by a processor in a task execution process, where the processor executes the task by running multiple threads at the same time, and the performance data indicates the degree to which and/or the occasions on which each of the multiple threads needs different hardware resources in the processor; generating compiling instruction information corresponding to the task according to the performance data; and inserting the compiling instruction information into a compiled program of the task to obtain an optimized compiled program corresponding to the task, where the compiling instruction information is used for allocating the hardware resources of the processor when the processor runs the optimized compiled program.
According to the information processing method provided by at least one embodiment of the present disclosure, customized compiling optimization can be performed according to characteristics of a program running different tasks, and compiling instruction information for processor resource allocation during running of the program is obtained by collecting performance data generated by the running program, so that a reference for resource allocation is provided for running of the program by a processor, and hardware resources used by the threads can be controlled or divided in a targeted manner according to the characteristics of the program, so that the resource allocation is more reasonable, and the performance of the processor is effectively improved.
The resource allocation method provided by at least one embodiment of the present disclosure includes allocating, according to compiling instruction information, the hardware resources of a processor used when the processor executes a task, where the compiling instruction information is generated according to performance data generated by the processor in the process of executing a compiled program corresponding to the task, the processor runs multiple threads at the same time to execute the task, and the performance data indicates the degree to which and/or the occasions on which the multiple threads need different hardware resources in the processor.
According to the resource allocation method provided by at least one embodiment of the present disclosure, hardware resources can be allocated by referring to compiling instruction information obtained from collected performance data, so that the hardware resources used by the threads can be controlled or divided in a targeted manner according to the characteristics of the program itself, the resource allocation is more reasonable, and the performance of the processor is effectively improved.
Embodiments of the present disclosure will be described in detail below with reference to the attached drawings, but the present disclosure is not limited to these specific embodiments.
FIG. 2 is a schematic flow chart of an information processing method according to at least one embodiment of the present disclosure. For example, as shown in FIG. 2, the information processing method provided in the embodiment of the present disclosure includes at least steps S10 to S30.
For example, the information processing method is applied to a processor, such as a processor using SMT technology.
For example, the processor includes at least one processing core, each processing core being a physical core. One processing core executes tasks using (or running) multiple threads, and the multiple threads can be understood as multiple virtual logical cores within the one physical core.
For example, the number b of threads may be less than or equal to the maximum number of active threads supported by one physical core, where b is a positive integer greater than 1.
In step S10, performance data generated by the processor in the course of executing the task is collected.
For example, a processor may execute tasks using multiple threads simultaneously, e.g., the processor may execute tasks using simultaneous multithreading, where the multiple threads reside on the same physical core and share the hardware resources in the physical core, such sharing including static partitioning and dynamic sharing.
For example, the hardware resources include computing resources and storage resources in the processor. For example, the computing resources include hardware resources for performing computation, such as an arithmetic logic unit (Arithmetic and Logic Unit, abbreviated as ALU), an address generation unit (Address Generation Unit, abbreviated as AGU), and the like. For example, the storage resources include hardware resources such as caches, memory, queues, and registers for holding data or instructions associated with the pipeline.
For example, in some examples, a hardware resource may refer to all resources in a processor that can be used for computation or storage.
For example, a hardware resource may also refer to a portion of the computing or storage resources in a processor. For example, analysis of the collected performance data of multiple tasks may show that certain hardware resources are sufficient for most tasks, or that a shortage of them does not significantly affect the performance of the processor; such hardware resources do not require resource allocation optimization, or optimizing them cannot improve the performance of the processor. In this case, the hardware resources in this disclosure may include the hardware resources in the processor other than those hardware resources.
For example, the performance data may be collected by executing a compiled program corresponding to the task. For example, the compiled program is obtained by compiling the program corresponding to the task based on conventional optimization rules or general optimization logic; the compiled program may be one that has not undergone compiling optimization, or one that has undergone some optimization, such as code reordering, based on static analysis of the code itself.
For example, the task may be deployed on a processor using SMT technology, the compiled program is run for a period of time, and performance data is collected while the task is running.
For example, the performance data may indicate how much different hardware resources are required in the processor by the multiple threads. For example, by analyzing the performance data, the extent of the need for individual hardware resources by different threads can be determined.
For example, for thread 0 included in multiple threads, if a certain hardware resource is missing, it will cause the thread 0 to be interrupted or stalled, where the interrupt or stall has a relatively large impact on the performance of the processor, the demand of the thread 0 for the hardware resource is relatively high, and this hardware resource is more critical for the thread 0. Conversely, if a hardware resource is missing, thread 0 will not be interrupted or stalled, or even if an interrupt or stall occurs that does not have a significant impact on the performance of the processor, thread 0 will have a lower demand for that hardware resource, and this hardware resource will be less critical to thread 0.
For example, the performance data may indicate the occasions on which the multiple threads need different hardware resources in the processor. For example, by analyzing the performance data, the occasions on which different threads need to occupy each hardware resource can be obtained.
In step S20, compiling instruction information corresponding to the task is generated based on the performance data.
In step S30, the compiling instruction information is inserted into the compiled program of the task, and an optimized compiled program corresponding to the task is obtained.
For example, the compiling instruction information is used to allocate hardware resources of the processor when the processor runs the optimized compiled program.
For example, the compiling instruction information corresponds to the tasks one by one, different programs executing different tasks have different compiling instruction information, and the compiling instruction information provides references for hardware resource allocation for the processor according to the program characteristics of the corresponding tasks.
For example, the compiling instruction information may directly include the performance data, that is, the performance data may be provided to the processor as the compiling instruction information, and the processor may analyze the performance data in the compiling instruction information to determine how to allocate the hardware resources of the processor when running the optimized compiled program. In this embodiment, the process by which the processor analyzes the performance data is similar to the process, described later, of generating compiling instruction information that carries the analysis result, and will not be described here.
For example, the performance data can instead be analyzed and processed to obtain the compiling instruction information. The compiling instruction information then provides analysis results on the degree of demand for, or the criticality of, different hardware resources for certain threads, and the processor can allocate the resources directly according to the compiling instruction information, which saves analysis time and reduces the amount of data transmitted.
For example, the compiling instruction information can be inserted at the head of the compiled program or at another designated position to obtain the optimized compiled program, and the optimized compiled program embodies targeted compiling optimization according to the characteristics of the task. Unlike compiling optimization based on a general scenario or on static code analysis, the optimized compiled program is optimized in a targeted manner according to the characteristics the program shows while running, and this optimization is mainly reflected in hardware resource allocation, so that the hardware resources can be allocated more reasonably and the performance of the processor is improved.
For example, when the processor runs the optimized compiled program, the hardware resources of the processor may be allocated with reference to the compiling instruction information. For example, allocating the hardware resources may include at least one of adjusting a resource allocation proportion and controlling the occasions of resource occupation. The specific process of generating the compiling instruction information is described below for each of these two allocation methods.
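Steps S20 and S30 can be pictured with the following minimal sketch. The disclosure does not specify a data format, so everything here is an assumption made for illustration: the per-thread demand scores, the "restrict"/"normal" labels, the JSON encoding, and the 4-byte length prefix used to place the information at the head of the program image.

```python
import json


def generate_hints(perf_data, low=30):
    """S20 (sketch): turn per-thread demand scores into allocation hints."""
    return {
        thread: {res: ("restrict" if score <= low else "normal")
                 for res, score in resources.items()}
        for thread, resources in perf_data.items()
    }


def insert_hints(program: bytes, hints) -> bytes:
    """S30 (sketch): place the hints in a header ahead of the compiled program."""
    header = json.dumps(hints, sort_keys=True).encode()
    return len(header).to_bytes(4, "little") + header + program


def read_hints(image: bytes):
    """What a loader might do: split the image back into hints and program."""
    size = int.from_bytes(image[:4], "little")
    return json.loads(image[4:4 + size]), image[4 + size:]


# Hypothetical demand scores (0-100) derived from the data collected in S10.
perf = {
    "thread0": {"cache": 10, "branch_target_table": 80},
    "thread1": {"cache": 90, "branch_target_table": 20},
}
hints = generate_hints(perf)
image = insert_hints(b"\x90\x90", hints)
print(read_hints(image)[0]["thread0"]["cache"])   # restrict
```

Carrying analyzed hints rather than raw performance data corresponds to the variant in which the processor allocates resources directly without performing its own analysis.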
For example, the performance data includes occupancy of different hardware resources by the processor in performing tasks.
Step S20 may include: determining the criticality of each hardware resource to each thread according to the occupation of different hardware resources by the processor in the task execution process; and generating the compiling instruction information in combination with the criticality of each hardware resource to each thread, where the compiling instruction information is used for adjusting, when the optimized compiled program is run, the occupation proportion and/or dynamic use range of each hardware resource for each thread according to the criticality of each hardware resource to each thread.
For example, the criticality may indicate the degree to which the thread needs the hardware resource, e.g., its level of importance. For example, as described above, if the lack of a certain hardware resource causes thread 0 to be interrupted or stalled, and the interrupt or stall has a relatively large impact on the performance of the processor, the criticality of the hardware resource to thread 0 is relatively high; if the lack of a certain hardware resource does not cause thread 0 to be interrupted or stalled, or the interrupt or stall, even if it occurs, does not have too great an impact on the performance of the processor, the criticality of the hardware resource to thread 0 is relatively low.
For example, the criticality may be defined in terms of two states, "critical" and "non-critical".
For example, the criticality may be defined in terms of a plurality of states, e.g., a score may be assigned, where a higher score represents a higher criticality, that is, the hardware resource is more important to the thread or the thread has a higher demand for the hardware resource. When the criticality is defined by a plurality of states, the importance levels are further refined, and the allocation proportion of the resource can be adjusted more reasonably than when the criticality is defined by two states.
For example, for a hardware resource, when some threads regard the hardware resource as important or critical to themselves while other threads regard it as unimportant or non-critical to themselves, adjusting the allocation proportion of the hardware resource can greatly improve the performance of the processor.
For example, in some embodiments, for any one hardware resource, in response to the criticality of the hardware resource to M threads of the plurality of threads being greater than or equal to a first threshold and the criticality of the hardware resource to N threads of the plurality of threads being less than or equal to a second threshold, the compiling instruction information is configured to instruct, when the optimized compiled program is run, that the allocation proportion of the hardware resource to the N threads be reduced or that the dynamic use range of the hardware resource for the N threads be limited.
For example, for a certain hardware resource, if the criticality of the hardware resource to M threads is greater than or equal to the first threshold, the M threads regard the hardware resource as highly critical to themselves; if the criticality of the hardware resource to N threads is less than or equal to the second threshold, the N threads regard the hardware resource as less critical to themselves. In this case, when the optimized compiled program is run, the compiling instruction information generated in combination with the criticality of each hardware resource to each thread can be referenced to reduce the allocation proportion of the hardware resource to the N threads, which regard the hardware resource as less important to themselves, or to limit the dynamic use range of the hardware resource for the N threads.
Thus, the waste of hardware resources can be reduced and the proportion of the hardware resource allocated to other threads can be increased, for example the proportion allocated to the M threads, so that the performance of the processor is improved and the number and duration of thread pipeline stalls or interruptions caused by a lack of hardware resources are reduced.
For example, the first threshold and the second threshold may be set as needed, for example, when the criticality includes two states, i.e., critical and non-critical, the first threshold may be set as critical and the second threshold as non-critical. For example, when the criticality is measured by a score of 0-100, the first threshold may be set to 70 and the second threshold to 30, which is not particularly limited by the present disclosure.
Subdividing the criticality, for example measuring the criticality for each thread with a plurality of states, refines the criticality levels, makes the comparison of criticality among threads more accurate, and improves the rationality of resource allocation.
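The threshold rule can be sketched as follows for a single hardware resource, using the example scores of 70 and 30 as defaults. The function name and the reading that M and N must each be at least 1 are assumptions made for illustration.

```python
def threads_to_restrict(criticality, first_threshold=70, second_threshold=30):
    """Return the N threads whose share of one hardware resource may be reduced.

    `criticality` maps a thread id to a 0-100 score for that resource.
    The rule applies only if at least one thread scores at or above the
    first threshold (the M threads) and at least one thread scores at or
    below the second threshold (the N threads).
    """
    high = [t for t, c in criticality.items() if c >= first_threshold]
    low = [t for t, c in criticality.items() if c <= second_threshold]
    return low if high and low else []


print(threads_to_restrict({0: 10, 1: 85}))   # [0]
print(threads_to_restrict({0: 10, 1: 20}))   # []: no thread finds the resource critical
print(threads_to_restrict({0: 50, 1: 85}))   # []: no thread scores at or below 30
```

When no thread finds the resource critical, restricting the low-scoring threads would free capacity that nobody needs, so the sketch leaves the allocation unchanged in that case.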
For example, in one example, the hardware resource is a cache. In this embodiment, the occupation of different hardware resources by the processor in the task execution process includes the number of times each thread occupies the cache. For example, this number may be the number of times cached data in the cache is reused, such as the number of cache hits.
For example, determining the criticality of each hardware resource to each thread according to the occupation of different hardware resources by the processor in the task execution process may include: for each thread, determining the criticality of the cache to the thread according to the number of times the thread occupies the cache, where the fewer the number of times the thread occupies the cache, the lower the criticality of the cache to the thread. The compiling instruction information is used for indicating, when the optimized compiled program is run, that the allocation proportion of the cache to the N threads be reduced, or that when cache replacement occurs for the N threads, the cache line to be replaced be selected only from the cache lines occupied by the N threads.
For example, the criticality may be divided into critical and non-critical: the cache is determined to be non-critical to a thread when the number of times the thread occupies the cache is less than an occupation count threshold, and the cache is determined to be critical to the thread when the number of times the thread occupies the cache is greater than or equal to the occupation count threshold.
For example, the criticality may also be subdivided into more states: the fewer the number of times the thread occupies the cache, the lower the criticality of the cache to the thread and the lower the demand of the thread for the cache.
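Both ways of deriving the cache criticality can be sketched briefly. The occupation count threshold of 1000, the 0-100 score normalized to the busiest thread, and the sample hit counts are assumptions for illustration only.

```python
def cache_criticality(hit_counts, threshold=1000):
    """Two-state form: map each thread's cache-hit count to a label."""
    return {
        thread: "critical" if hits >= threshold else "non-critical"
        for thread, hits in hit_counts.items()
    }


def cache_score(hits, max_hits):
    """Multi-state form: scale a hit count to an integer score from 0 to 100."""
    if max_hits == 0:
        return 0
    return min(100, 100 * hits // max_hits)


# Hypothetical counts: thread 0 streams data, thread 1 reuses its data heavily.
counts = {0: 12, 1: 48000}
print(cache_criticality(counts))
print({t: cache_score(h, max(counts.values())) for t, h in counts.items()})
```

The two-state form is cheaper to encode, while the score allows the allocation proportion to be adjusted in finer steps.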
For example, the compiling instruction information may be generated in combination with the criticality of the cache to each thread, e.g., the compiling instruction information may include the criticality of the cache to each thread.
For example, when the cache is determined to be of low criticality to a thread, e.g., the criticality is below the second threshold or is non-critical, the processor can learn this from the compiling instruction information when running the optimized compiled program. The processor may then specify that the thread can use only a portion of the cache, e.g., only 20% of the cache, or may specify that when cache replacement occurs for the thread, a new cache line can replace only one of the cache lines previously allocated to the thread, or may use a combination of both. This ensures that other threads are less affected by that thread's use of the cache.
In this embodiment, the performance data collected while the program runs is used to provide feature analysis guidance for compiling optimization and to adjust the allocation of resources among multiple threads. Statically allocated hardware resources can thus be divided according to the characteristics of different threads, and dynamically allocated hardware resources shared by multiple threads can be controlled according to the usage characteristics of different threads, which prevents certain threads from occupying excessive resources without any improvement in performance, makes the allocation of hardware resources more reasonable, and improves the performance of the processor.
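A minimal sketch of the combined restriction follows: a per-thread cap on cache lines, plus replacement only among the thread's own lines once the cap is reached. The SharedCache class, the 8-line cache, the 2-line cap for thread 0, and the access pattern are hypothetical, and real hardware would implement such a policy in the replacement logic rather than in software.

```python
from collections import OrderedDict


class SharedCache:
    """Toy LRU cache shared by threads, with optional per-thread line caps."""

    def __init__(self, capacity, limits):
        self.capacity = capacity
        self.limits = limits          # thread id -> max lines (each cap >= 1)
        self.lines = OrderedDict()    # (thread, addr), least recently used first

    def access(self, thread, addr):
        key = (thread, addr)
        if key in self.lines:
            self.lines.move_to_end(key)
            return True
        owned = [k for k in self.lines if k[0] == thread]
        limit = self.limits.get(thread)
        if limit is not None and len(owned) >= limit:
            del self.lines[owned[0]]          # replace the thread's own LRU line
        elif len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)    # unrestricted: evict the global LRU line
        self.lines[key] = True
        return False


def thread1_hits(limits, steps=100):
    cache = SharedCache(capacity=8, limits=limits)
    hits, stream = 0, 0
    for step in range(steps):
        for _ in range(4):                    # thread 0 streams single-use lines
            cache.access(0, stream)
            stream += 1
        hits += cache.access(1, step % 4)     # thread 1 reuses 4 lines
    return hits


print(thread1_hits({}))       # 0: unrestricted thread 0 evicts thread 1's lines
print(thread1_hits({0: 2}))   # 96: thread 0 recycles its own 2 lines
```

Thread 0 misses on every access in both runs, so capping it costs nothing in this model while restoring thread 1's hits.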
For example, in one example, the hardware resource is a branch target table. Each branch target entry in the branch target table is used to store information about the branch jump.
For example, the occupation of different hardware resources by the processor in the task execution process includes the branch prediction accuracy of each branch in each thread.
For example, determining the criticality of each hardware resource to each thread according to the occupation of different hardware resources by the processor in the task execution process may include: for each thread, determining the criticality of the branch target table to the thread according to the number of branches in the thread whose branch prediction accuracy is lower than an accuracy threshold, where the greater the number of such branches in the thread, the lower the criticality of the branch target table to the thread. The compiling instruction information is used for indicating, when the optimized compiled program is run, that the allocation proportion of the branch target table to the N threads be reduced, or that when the N threads apply for a branch target entry, the entry to be replaced be selected only from the branch target entries occupied by the N threads.
For example, the criticality may be divided into critical and non-critical: the branch target table is determined to be non-critical to a thread when the number of branches in the thread whose branch prediction accuracy is below the accuracy threshold is greater than or equal to a branch count threshold, i.e., the thread contains many branches that are difficult to predict accurately, and the branch target table is determined to be critical to the thread when that number is less than the branch count threshold.
For example, the criticality may also be subdivided into more states: the greater the number of branches in the thread whose branch prediction accuracy is below the accuracy threshold, the lower the criticality of the branch target table to the thread and the lower the demand of the thread for the branch target table.
For example, the compiling indication information may be generated in combination with the criticality of the branch target table to each thread, for example, the compiling indication information may include the criticality of the branch target table to each thread.
For example, when the criticality of the branch target table to a thread is determined to be low, e.g., below a second threshold or non-critical, the processor may learn this from the compiling indication information when running the optimized compiler. The processor may then specify that the thread can use only part of the capacity of the branch target table, e.g., only 20% of the branch target entries, or that, when the thread applies for a branch target entry, it can only replace one of the branch target entries it previously applied for, or may combine the two. This ensures that the thread's use of the branch target table has little effect on other threads.
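The branch target table example above can be sketched roughly as follows. This is only an illustrative sketch: the threshold values, the function names btb_criticality and btb_indication, and the dictionary layout of the indication are assumptions made for the example and are not defined by the present disclosure.

```python
from typing import Dict

# Hypothetical thresholds; the disclosure does not specify concrete values.
ACCURACY_THRESHOLD = 0.6    # a branch below this accuracy counts as hard to predict
BRANCH_COUNT_THRESHOLD = 3  # this many hard branches makes the table non-critical


def btb_criticality(branch_accuracy: Dict[str, float]) -> str:
    """Classify the criticality of the branch target table to one thread."""
    hard_branches = sum(
        1 for acc in branch_accuracy.values() if acc < ACCURACY_THRESHOLD
    )
    # Many hard-to-predict branches: the table helps this thread little.
    return "non-critical" if hard_branches >= BRANCH_COUNT_THRESHOLD else "critical"


def btb_indication(criticality: str, capacity_cap: float = 0.2) -> Dict[str, object]:
    """Build an assumed indication record for one thread."""
    if criticality == "non-critical":
        # Cap the usable share (20% in the text) and restrict replacement
        # to entries the thread itself already holds.
        return {"max_share": capacity_cap, "replace_own_entries_only": True}
    return {"max_share": 1.0, "replace_own_entries_only": False}


# Per-branch prediction accuracy collected for two hypothetical threads.
thread0 = {"b0": 0.95, "b1": 0.90, "b2": 0.55}
thread1 = {"b0": 0.50, "b1": 0.40, "b2": 0.30, "b3": 0.99}
```

With these sample values, thread0 has one poorly predicted branch and thread1 has three, so only thread1 would be limited to the capped share.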
In this embodiment, performance data collected while the program runs is analyzed to guide compilation optimization and to adjust the allocation of resources among multiple threads. Statically allocated hardware resources can thus be divided according to the characteristics of different threads, and dynamically allocated hardware resources shared by multiple threads can be controlled according to the usage characteristics of different threads. This prevents certain threads from occupying excessive resources without any performance gain, makes the allocation of hardware resources more reasonable, and improves the performance of the processor.
For example, in some embodiments, the occupation of different hardware resources by the processor during task execution may further include the total amount of each hardware resource occupied by each thread during a preset period of executing the task. For example, the preset period may be the entire period during which the compiler corresponding to the task is executed, or a partial period within that entire period.
Determining the criticality of each hardware resource to each thread according to the occupation of different hardware resources by the processor during task execution may include determining the criticality of each hardware resource to each thread according to the total amount of each hardware resource occupied by the plurality of threads.
For example, in some embodiments, determining the criticality of each hardware resource to each thread according to the total amount of each hardware resource occupied by the plurality of threads may include: for a first hardware resource among the different hardware resources and for each thread, in response to the total amount of the first hardware resource occupied by the thread being less than B/b, determining that the first hardware resource is non-critical to the thread. Here, the first hardware resource is a hardware resource used exclusively by each of the plurality of threads, B represents the total amount of the first hardware resource, b is the total number of the plurality of threads, and B and b are integers greater than 1. In response to the first hardware resource being non-critical to at least one of the plurality of threads, the compiling indication information is used to indicate that, when the optimized compiler is run, each of the at least one thread is allocated the first hardware resource according to the total amount of the first hardware resource that the thread occupies.
For example, the first hardware resource may be a hardware resource allocated in a static allocation mode among the hardware resources. For example, it may be predetermined that each thread equally divides the first hardware resource, and each thread exclusively uses its own allocated portion, without occupying the first hardware resources of other threads.
For example, if, after a thread has been running for a period of time, the total amount B' of the first hardware resource that it uses for most of that time (e.g., 95% or more of the entire run period) is less than an equal share of the first hardware resource (B/b, i.e., the total amount of the first hardware resource divided by the number of threads), this indicates that the first hardware resource is not critical to the thread.
For example, the compiling indication information may be generated in combination with the criticality of the first hardware resource to each thread, for example, the compiling indication information may include the criticality of the first hardware resource to each thread.
For example, when the first hardware resource is determined to be of low criticality to a thread, e.g., non-critical, the processor may learn this from the compiling indication information when running the optimized compiler, and may specify, with reference to the compiling indication information, that only the total amount B' of the first hardware resource actually used by the thread is allocated to the thread.
For example, the total amount B' of the first hardware resource used by the thread may also be included in the compiling indication information, which is provided to the processor as a reference for allocation.
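The equal-share comparison for a statically partitioned resource might look like the sketch below. The function name partition_static_resource is hypothetical, and the rule for the freed remainder (splitting it equally among the other threads) is an assumption of this example; the disclosure only states that a non-critical thread is allocated the amount it actually uses.

```python
from typing import Dict


def partition_static_resource(total: int, occupied: Dict[int, float]) -> Dict[int, float]:
    """Re-partition a statically allocated resource among threads.

    total    -- total amount of the first hardware resource
    occupied -- observed total amount used by each thread, keyed by thread id
    """
    equal_share = total / len(occupied)
    # Threads using less than the equal share are treated as non-critical
    # and receive only what they were observed to use.
    allocation = {t: used for t, used in occupied.items() if used < equal_share}
    others = [t for t in occupied if t not in allocation]
    # Assumption of this sketch: the freed remainder is divided equally
    # among the remaining threads.
    remaining = total - sum(allocation.values())
    for t in others:
        allocation[t] = remaining / len(others)
    return allocation


# Example: 64 entries, two threads, equal share is 32 entries each.
result = partition_static_resource(64, {0: 10, 1: 40})
```

In this example thread 0 uses fewer than 32 entries, so it keeps 10 and the remaining 54 go to thread 1.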
In this embodiment, performance data collected while the program runs is analyzed to guide compilation optimization and to adjust the allocation of resources among multiple threads. Statically allocated hardware resources can thus be divided according to the characteristics of different threads, and dynamically allocated hardware resources shared by multiple threads can be controlled according to the usage characteristics of different threads. This prevents certain threads from occupying excessive resources without any performance gain, makes the allocation of hardware resources more reasonable, and improves the performance of the processor.
For example, in other embodiments, determining the criticality of each hardware resource to each thread according to the total amount of each hardware resource occupied by the plurality of threads may include, for any hardware resource: determining the sum of the total amounts of the hardware resource occupied by the plurality of threads according to the total amount occupied by each thread; in response to the sum being greater than the total amount of the hardware resource, determining the total duration of pipeline stalls of each thread caused by the lack of the hardware resource; and determining the criticality of the hardware resource to each thread according to that total stall duration, where the total stall duration is positively correlated with the criticality. The compiling indication information is used to indicate, when the optimized compiler is run, that the allocation proportion of the hardware resource to threads with higher criticality is to be increased and the allocation proportion to threads with lower criticality is to be decreased.
For example, for a hardware resource A, it is found that one thread (e.g., thread 0) occupies 90% of hardware resource A in total during the run period and another thread (e.g., thread 1) occupies 20%. The sum of the two threads' occupancy is 110% of hardware resource A, which is greater than the total amount of hardware resource A, so both threads may experience pipeline stalls caused by the lack of hardware resource A.
At this time, the total duration t1 of pipeline stalls of thread 0 caused by the lack of hardware resource A and the total duration t2 of pipeline stalls of thread 1 caused by the lack of hardware resource A may be determined, and the criticality of hardware resource A to each thread may then be determined according to t1 and t2. For example, the longer the total stall duration t1 of thread 0 caused by the lack of hardware resource A, the higher the criticality of hardware resource A to thread 0; the shorter t1 is, the lower the criticality of hardware resource A to thread 0.
For example, if t2 is greater than t1, then hardware resource A is considered to be more critical to thread 1 than hardware resource A is to thread 0.
For example, the compiling indication information may be generated in combination with the criticality of hardware resource A to each thread; for example, the compiling indication information may include the criticality of hardware resource A to each thread.
For example, when the optimized compiler is run, the processor may learn the criticality of hardware resource A to the two threads from the compiling indication information, and may increase the allocation proportion of hardware resource A to the thread with higher criticality (e.g., thread 1) and decrease the allocation proportion to the thread with lower criticality (e.g., thread 0). For example, 80% of hardware resource A may be allocated to thread 0 and 20% to thread 1, so that the 20% occupied by thread 1 is fully covered. Thus, pipeline stalls of thread 1 can be avoided and processor performance improved.
For example, the compiling indication information may further include the allocation proportions of hardware resource A for the two threads, and is provided to the processor as a reference for allocation.
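The 90%/20% example can be reproduced with the sketch below. The greedy policy used here (serving threads in order of decreasing stall duration until the resource is exhausted) is an assumption chosen for illustration, as is the function name allocate_by_stall; the disclosure only requires that threads with higher criticality receive a larger proportion.

```python
from typing import Dict


def allocate_by_stall(
    demand: Dict[int, int], stall_time: Dict[int, float], total: int = 100
) -> Dict[int, int]:
    """Split a shared resource (in percent) among threads.

    demand     -- observed total occupancy of each thread, in percent
    stall_time -- total pipeline stall duration of each thread caused by
                  the lack of this resource (longer means more critical)
    """
    if sum(demand.values()) <= total:
        # Not oversubscribed: every thread can keep what it uses.
        return dict(demand)
    allocation = {}
    remaining = total
    # Assumed policy: satisfy the most critical threads first.
    for tid in sorted(demand, key=lambda t: stall_time[t], reverse=True):
        grant = min(demand[tid], remaining)
        allocation[tid] = grant
        remaining -= grant
    return allocation


# Thread 0 occupies 90%, thread 1 occupies 20%, and t2 > t1.
shares = allocate_by_stall({0: 90, 1: 20}, {0: 5.0, 1: 8.0})
```

Because thread 1 stalls longer, its 20% demand is met in full and thread 0 receives the remaining 80%, matching the proportions given in the text.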
In this embodiment, performance data collected while the program runs is analyzed to guide compilation optimization and to adjust the allocation of resources among multiple threads. Statically allocated hardware resources can thus be divided according to the characteristics of different threads, and dynamically allocated hardware resources shared by multiple threads can be controlled according to the usage characteristics of different threads. This prevents certain threads from occupying excessive resources without any performance gain, makes the allocation of hardware resources more reasonable, and improves the performance of the processor.
For example, in some embodiments, the allocation proportion of a hardware resource may be increased for a thread to which the hardware resource is more critical.
For any hardware resource, in response to the criticality of the hardware resource to M threads of the plurality of threads being greater than a first threshold and the criticality of the hardware resource to N threads of the plurality of threads being less than a second threshold, i.e., the hardware resource being critical to some threads and less critical to others, the compiling indication information is used to indicate, when the optimized compiler is run, that the allocation proportion of the hardware resource to the M threads for which the hardware resource is critical is to be increased.
For example, determining the criticality of each hardware resource to each thread according to the occupation of different hardware resources by the processor during task execution includes: for any hardware resource, determining, according to the duration of each pipeline interruption or stall of a thread caused by the lack of the hardware resource and the number of times the thread lacks the hardware resource, the total stall duration of the thread's pipeline caused by the lack of the hardware resource or the proportion of the stall time in the total running time; and determining the criticality of the hardware resource to the thread according to the total stall duration or the proportion of the stall time in the total running time.
For example, for hardware resource B, the total pipeline stall duration of each thread caused by the lack of hardware resource B, or the proportion of the stall time in the total running time, is determined based on the duration of each pipeline interruption or stall of the thread when hardware resource B is lacking and the number of times the thread lacks hardware resource B. If the total stall duration is longer or the proportion of the stall time in the total running time is larger, the criticality of hardware resource B to the thread is determined to be higher; conversely, if the total stall duration is shorter or the proportion is smaller, the criticality of hardware resource B to the thread is determined to be lower.
For example, for thread 0, if the total stall duration is greater than a preset threshold, or the proportion of the total stall duration in the running time is greater than a preset threshold, hardware resource B is determined to be critical to thread 0; otherwise, hardware resource B is determined to be non-critical to thread 0. In this way, the criticality of hardware resource B to each thread can be determined.
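A minimal sketch of the stall-based classification described above is given below. The function name stall_criticality, the cycle-based units, and the threshold values in the example are assumptions made for illustration only.

```python
from typing import List, Tuple


def stall_criticality(
    stall_durations: List[int],
    run_time: int,
    total_threshold: int,
    ratio_threshold: float,
) -> Tuple[int, float, bool]:
    """Classify how critical a resource is to one thread.

    stall_durations -- duration of each pipeline stall caused by the lack
                       of the resource (one entry per occurrence)
    run_time        -- total running time of the thread, in the same unit
    Returns (total stall duration, stall proportion, is_critical).
    """
    total_stall = sum(stall_durations)
    ratio = total_stall / run_time
    # Critical if either the absolute or the relative measure exceeds
    # its preset threshold.
    critical = total_stall > total_threshold or ratio > ratio_threshold
    return total_stall, ratio, critical


# Hypothetical measurements for hardware resource B over 100 cycles.
thread0_result = stall_criticality([4, 6, 10], 100, 15, 0.25)
thread1_result = stall_criticality([1, 2], 100, 15, 0.25)
```

Here thread 0 stalls for 20 of 100 cycles and exceeds the assumed absolute threshold of 15, so resource B is classified as critical to it, while thread 1 stalls for only 3 cycles and is classified as non-critical.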
For example, the compiling indication information may be generated in combination with the criticality of hardware resource B to each thread; for example, the compiling indication information may include the criticality of hardware resource B to each thread.
For example, the processor may learn from the compiling indication information that the criticality of hardware resource B to some threads is greater than the first threshold and that its criticality to other threads is less than the second threshold, i.e., that hardware resource B is critical to some threads and not critical to others. When running the optimized compiler, the processor may then specify, with reference to the compiling indication information, that the allocation proportion of hardware resource B to the M threads whose criticality is greater than the first threshold is to be increased.
In this embodiment, performance data collected while the program runs is analyzed to guide compilation optimization and to adjust the allocation of resources among multiple threads. Statically allocated hardware resources can thus be divided according to the characteristics of different threads, and dynamically allocated hardware resources shared by multiple threads can be controlled according to the usage characteristics of different threads. This prevents certain threads from occupying excessive resources without any performance gain, makes the allocation of hardware resources more reasonable, and improves the performance of the processor.
For example, in some embodiments, the allocation proportion of a hardware resource to the N threads may be reduced, which in effect increases the allocation proportion of the hardware resource to the other threads. For example, in other embodiments, the allocation proportion of the hardware resource to the M threads may be increased, and the allocation proportion to the other threads decreased. For example, in still other embodiments, the allocation proportion of the hardware resource to the N threads may be reduced while the allocation proportion to the M threads is increased, with the remaining hardware resource being used by threads other than the M+N threads.
For example, the occupation of different hardware resources by the processor during task execution includes any combination of the following parameters: the number of times each thread occupies each hardware resource, the number of pipeline stalls of each thread when each hardware resource is lacking, the duration of pipeline stalls of each thread when each hardware resource is lacking, the total amount of each hardware resource occupied by each thread, and the branch prediction accuracy of each branch in each thread.
The parameters may be selected by a person skilled in the art according to the hardware resources that need to be adjusted. For example, the criticality of the cache to each thread may be determined according to the number of times the thread occupies the cache, and the allocation proportion of the cache may be adjusted or the range usable by each thread may be controlled. The occupation of different hardware resources by the processor during task execution may also include other parameters and is not limited to those listed above; these may be selected and set by those skilled in the art as needed, which is not specifically limited in this disclosure.
For example, when determining the criticality of each hardware resource to each thread according to the occupation condition of the processor to different hardware resources in the task execution process, one or more parameters may be used for determining the criticality of any hardware resource, for example, the collected performance data for the hardware resource may be compared with a preset threshold, or the criticality of the hardware resource to each thread may be obtained by integrating a plurality of parameters, which is not particularly limited in the disclosure.
As mentioned above, adjusting the allocation of the processor's hardware resources may also include controlling the timing with which each thread occupies the hardware resources.
For example, the performance data includes a temporal distribution of the various hardware resources occupied by the multiple threads when the processor is executing the task.
For example, step S20 may include generating the compiling indication information in combination with the time distribution of the occupation of each hardware resource by the plurality of threads while the processor executes the task, where the compiling indication information is used to control, when the optimized compiler is run, the timing with which each thread occupies each hardware resource according to that time distribution.
For example, the compiling indication information may include the times at which the plurality of threads occupy each hardware resource, for example, the time point at which a thread whose occupation timing needs to be controlled occupies a certain hardware resource. For example, when the optimized compiler is run, the processor may learn from the compiling indication information when the thread occupies a target hardware resource, and reserve the target hardware resource for the thread as that time approaches.
For example, the compiling indication information may include the time distribution of the occupation of each hardware resource by the plurality of threads. For example, when the optimized compiler is run, the processor may obtain from the compiling indication information the time distribution of a thread's occupation of a target hardware resource, analyze when the thread occupies the target hardware resource, and reserve the target hardware resource for the thread as that time approaches.
For example, taking thread 0 as an example, performance data is collected to obtain the time distribution of thread 0's occupation of hardware resource C. For example, the time distribution may show that thread 0 does not need hardware resource C during seconds 0-100 and needs it during seconds 100-200. The processor may then begin reserving hardware resource C for thread 0 in advance shortly before the 100th second, e.g., at the 90th second; for example, released portions of hardware resource C are no longer reassigned to other threads but are reserved for thread 0. Therefore, when thread 0 needs hardware resource C, it can obtain it directly, its pipeline does not stall for lack of hardware resource C, and the performance of the processor is improved.
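The reservation decision in the thread 0 example can be sketched as a simple check against the recorded time distribution. The function name should_reserve and the representation of the time distribution as a list of (start, end) intervals in seconds are assumptions of this sketch.

```python
from typing import List, Tuple


def should_reserve(
    now: float, need_intervals: List[Tuple[float, float]], lead_time: float
) -> bool:
    """Decide whether released units of a resource should be held back
    for a thread at time `now`.

    need_intervals -- intervals (start, end) during which the thread was
                      observed to need the resource
    lead_time      -- how long before `start` reservation begins
    """
    for start, end in need_intervals:
        # Reserve from shortly before the thread needs the resource
        # until the end of the interval in which it is needed.
        if start - lead_time <= now < end:
            return True
    return False


# Thread 0 needs hardware resource C during seconds 100-200;
# reservation starts 10 seconds early, i.e., at the 90th second.
thread0_needs_c = [(100.0, 200.0)]
```

With a lead time of 10 seconds, the check is false at second 50 and true from second 90 until the interval ends at second 200.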
For example, in some embodiments, the performance data includes dependency relationships between the hardware resources used by each thread while the processor executes the task.
For example, the dependency relationship between the hardware resources used by each thread is obtained from the thread's occupation of a plurality of hardware resources at the time of its pipeline stall. For example, the plurality of hardware resources includes hardware resources, among the different hardware resources, other than the one that directly causes the pipeline stall of the thread.
For example, step S20 may include generating the compiling indication information in combination with the dependency relationship between the hardware resources used by each thread, where the compiling indication information is used to control, when the optimized compiler is run, the occupation timing or the occupation priority of the hardware resources used by the thread according to that dependency relationship.
For example, suppose there is a dependency relationship among P hardware resources used by a thread to perform a first operation, where the P hardware resources include a target hardware resource and P is a positive integer greater than 1. In this case, when the optimized compiler is run, the processor may, according to the compiling indication information generated from the dependency relationship among the P hardware resources, allocate the right to occupy the target hardware resource to the thread after the remaining P-1 hardware resources have returned the usage data required to perform the first operation, or increase the thread's occupation priority for the target hardware resource.
For example, the target hardware resource needs to wait for the other P-1 hardware resources to return the usage data needed to perform the first operation before being used, or the usage of the target hardware resource needs the usage data returned by the other P-1 hardware resources.
For example, the compiling indication information may include the dependency relationships between the hardware resources used by each thread, such as the dependency relationships between the hardware resources used by a thread whose occupation timing needs to be controlled. For example, when the optimized compiler is run, the processor may obtain from the compiling indication information the dependency relationship among the P hardware resources used by the thread and analyze when the thread should occupy the hardware resources; after the usage data returned by the remaining P-1 hardware resources other than the target hardware resource is obtained, the processor may allocate the right to occupy the target hardware resource to the thread, or increase the thread's occupation priority for the target hardware resource.
For example, the compiling indication information may include the occupation timing of a thread for the target hardware resource; e.g., the compiling indication information contains suggestion information stating that the thread may occupy the target hardware resource only after all of the remaining P-1 hardware resources other than the target hardware resource have returned their usage data. For example, when the optimized compiler is run, the processor may refer to this suggestion information obtained from the compiling indication information and, after the usage data returned by the remaining P-1 hardware resources is obtained, allocate the right to occupy the target hardware resource to the thread, or increase the thread's occupation priority for the target hardware resource.
For example, taking thread 0 as an example, when the compiler is run, the pipeline of thread 0 stalls or is interrupted due to the lack of hardware resource D. The occupation of hardware resources other than hardware resource D by thread 0 at the time of the stall is collected, so as to determine which other hardware resources thread 0 had occupied in advance while its pipeline could not execute for lack of hardware resource D; for example, it is determined that, even after hardware resource D is obtained, the pipeline of thread 0 cannot execute without hardware resource E. Thus, for thread 0, it may be determined that there is a dependency relationship between the two hardware resources (i.e., hardware resource D and hardware resource E) that it uses to perform the first operation, and the compiling indication information may be generated in combination with this dependency relationship. When the optimized compiler is run, the processor may then have thread 0 refrain from occupying hardware resource E at first, and occupy hardware resource E only after hardware resource D returns the usage data for performing the first operation. Specifically, the right to occupy hardware resource E may be allocated to thread 0 at that time, or the occupation priority of thread 0 for hardware resource E may be increased.
For example, in one particular example, even if thread 0 obtains a dynamically shared address compute unit queue through contention, it can execute only after a value is returned from memory. If thread 0 occupies a large number of address compute unit queues before the memory returns the value, another thread may stall for lack of an address compute unit queue, without any gain in processor performance. In this case, thread 0 may refrain from occupying the address compute unit queue at first; after the memory returns the value, the address compute unit queue is allocated to thread 0, or the occupation priority of thread 0 for the address compute unit queue is increased, e.g., thread 0 may use the address compute unit queue preferentially.
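The dependency-gated occupation described above can be sketched as a set-inclusion check. The function name may_occupy_target, the resource names, and the dictionary used to hold the dependency relationship are assumptions of this sketch and do not reflect a defined format of the compiling indication information.

```python
from typing import Dict, Set


def may_occupy_target(
    dependencies: Dict[str, Set[str]], target: str, returned: Set[str]
) -> bool:
    """Return True once every resource the target depends on has
    returned the usage data needed for the operation.

    dependencies -- for each target resource, the other P-1 resources
                    whose data must arrive first
    returned     -- resources that have already returned their data
    """
    prerequisites = dependencies.get(target, set())
    # Subset test: all prerequisites must be in the returned set.
    return prerequisites <= returned


# Thread 0: the address compute unit queue is useful only after the
# memory has returned a value.
thread0_deps = {"address_compute_queue": {"memory"}}
```

Before the memory returns a value the check is false, so the queue stays available to other threads; once "memory" is in the returned set the check becomes true and the queue may be granted to thread 0 or its priority raised.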
In this embodiment, the timing with which each thread occupies hardware resources can be controlled and the allocation of hardware resources adjusted reasonably. This avoids the situation in which some threads occupy excessive hardware resources without improving performance and instead cause the pipelines of other threads to stall, thereby improving the performance of the processor.
In practice, the performance data may be set as needed in order to adjust the resource allocation proportion among threads, to control the timing with which each thread occupies the hardware resources, or to do both, which is not specifically limited in the present disclosure. For specific methods of adjusting the allocation proportion and controlling the occupation timing, reference may be made to the description of the above embodiments.
In at least one embodiment of the present disclosure, performance data obtained by running the compiler of a task for a period of time may be collected, where the compiler is a program compiled according to general rules. Based on the occupation of different hardware resources by the processor during the task, as recorded in the performance data, the criticality of each hardware resource to each thread may be determined, and compiling indication information may be generated based on the criticality; the compiling indication information may be used to provide a reference when the optimized compiler is run. For example, the processor may divide hardware resources based on the criticality: more hardware resources may be allocated to threads with higher criticality, fewer to threads with lower criticality, or the range of hardware resources usable by threads with lower criticality may be limited. Hardware resources are thus allocated reasonably and in a manner targeted to program characteristics, improving the performance of the processor as far as possible.
In addition, in at least one embodiment of the present disclosure, the compiling indication information is generated based on the time distribution of the occupation of each hardware resource by the plurality of threads, or on the dependency relationships between the hardware resources used by each thread, while the processor executes the task, as recorded in the performance data; the compiling indication information may be used to provide a reference when the optimized compiler is run. For example, the processor may control the occupation timing of hardware resources according to the compiling indication information: a hardware resource may be reserved in advance shortly before a thread needs it, or, for a plurality of hardware resources with a dependency relationship, a hardware resource may be occupied only after the usage data needed to perform the operation is ready. Dynamically allocated shared resources can thus be controlled according to the usage characteristics of different threads, which prevents some threads from occupying too many resources without contributing to performance. Hardware resources are allocated reasonably and in a manner targeted to program characteristics, improving the performance of the processor as far as possible.
At least one embodiment of the present disclosure provides a resource allocation method. Fig. 3 is a schematic flow chart of a resource allocation method provided in at least one embodiment of the present disclosure.
For example, as shown in Fig. 3, the resource allocation method provided in the embodiment of the present disclosure includes at least step S40.
For example, the resource allocation method is applied to a processor, such as a processor using SMT technology. For example, the processor includes at least one processing core, which is a physical core, and multiple threads execute tasks in parallel within one processing core; the multiple threads can be understood as multiple virtual logical cores within one physical core.
For example, the processor may be implemented as a GPU (graphics processor), a CPU (central processing unit), an NPU (neural network processor), a DSP (digital signal processor), or the like according to actual needs, which is not limited by the present disclosure.
In step S40, hardware resources of the processor used when the processor performs the task are allocated according to the compiling instruction information.
The compiling instruction information is generated according to performance data generated by the processor in the process of executing a compiler corresponding to the task, the processor executes the task by using a plurality of threads at the same time, and the performance data indicates the requirement degree and/or the requirement time of the plurality of threads on different hardware resources in the processor.
Regarding the process of collecting performance data and the process of generating the compiling instruction information based on the performance data, reference may be made to the related description of the above information processing method, and repeated details are omitted here.
For example, in some embodiments, step S40 may include determining a criticality of each hardware resource to each thread according to the compiling indication information, and adjusting an occupancy rate and/or a dynamic range of use of each hardware resource by each thread according to the criticality of each hardware resource to each thread.
For example, the compiling instruction information may directly include the criticality of each hardware resource to each thread.
For example, the compiling instruction information may include performance data, which is analyzed to determine the criticality of each hardware resource to each thread. For how the criticality of each hardware resource to each thread is obtained by analyzing the performance data, reference may be made to the related content of the information processing method, and repeated details are omitted here.
For example, adjusting the occupation proportion and/or dynamic use range of each thread for each hardware resource according to the criticality of each hardware resource to each thread may include: for any hardware resource, in response to the criticality of the hardware resource to M threads of the plurality of threads being greater than or equal to a first threshold and the criticality of the hardware resource to N threads of the plurality of threads being less than or equal to a second threshold, reducing the allocation proportion of the hardware resource to the N threads, or limiting the dynamic use range of the N threads for the hardware resource, where M and N are positive integers.
For example, for a certain hardware resource, if the criticality of the hardware resource to M threads is greater than or equal to the first threshold, the hardware resource is relatively critical to those M threads; if the criticality of the hardware resource to N threads is less than or equal to the second threshold, the hardware resource is relatively unimportant to those N threads. In this case, when the optimizing compiler is running, the processor may refer to the criticality of each hardware resource to each thread and reduce the allocation proportion of the hardware resource to the N threads, or limit the dynamic use range of the N threads for the hardware resource. Thus, waste of the hardware resource can be reduced and the proportion allocated to other threads, for example the M threads, can be increased, so that the performance of the processor is improved and the number and duration of thread pipeline stalls or interruptions caused by a lack of the hardware resource are reduced.
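The threshold rule described above can be illustrated with a minimal sketch. The function name `rebalance_shares`, the `cut` parameter, and the choice to split the freed share evenly among the M threads are assumptions made for illustration only and are not taken from the disclosure; the sketch also assumes the first threshold is larger than the second.

```python
def rebalance_shares(criticality, shares, first_threshold, second_threshold, cut=0.5):
    """Rebalance one hardware resource among threads (illustrative only).

    criticality, shares: dict mapping thread id -> float for a single resource.
    cut: hypothetical fraction of a low-criticality thread's share to take away.
    """
    high = [t for t, c in criticality.items() if c >= first_threshold]   # the M threads
    low = [t for t, c in criticality.items() if c <= second_threshold]   # the N threads
    if not high or not low:
        return dict(shares)  # condition not met: leave the allocation unchanged

    new_shares = dict(shares)
    freed = 0.0
    for t in low:
        reduction = shares[t] * cut
        new_shares[t] = shares[t] - reduction
        freed += reduction
    for t in high:
        new_shares[t] += freed / len(high)  # assumed policy: split evenly among the M threads
    return new_shares


result = rebalance_shares(
    criticality={0: 0.9, 1: 0.1, 2: 0.5},
    shares={0: 0.4, 1: 0.4, 2: 0.2},
    first_threshold=0.8,
    second_threshold=0.2,
)
```

In this usage, thread 1 gives up half of its share, thread 0 receives it, and thread 2, whose criticality lies between the two thresholds, is left unchanged.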
For example, in one example, the hardware resource is a cache. In this embodiment, adjusting the occupation proportion and/or dynamic use range of each thread to each hardware resource according to the criticality of each hardware resource to each thread may include reducing the allocation proportion of the cache to N threads, or limiting the N threads to select a cache line from the cache lines occupied by the N threads to perform cache replacement when the cache replacement occurs.
For example, a task performed by a thread may be a streaming application, in which large amounts of streaming data are often used only once and never again. For example, when the optimizing compiler is running, the processor may learn from the compiling instruction information that the cache is less critical to the thread. The processor may then specify that the thread can use only a portion of the cache, such as only 20% of the cache, or may specify that, when a cache replacement occurs for the thread, a new cache line can replace only one of the cache lines previously occupied by the thread, or may combine the two. This ensures that other threads are less affected by that thread's use of the cache.
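The second option above, in which a restricted thread may only replace its own cache lines, can be sketched as follows. This is a simplified software model and not the hardware mechanism of the disclosure: the class `SharedCache`, the LRU ordering, and the fallback used when a restricted thread owns no line yet are all assumptions made for illustration.

```python
from collections import OrderedDict


class SharedCache:
    """Toy fully associative cache shared by several threads (illustrative only)."""

    def __init__(self, num_lines, restricted_threads=()):
        self.num_lines = num_lines
        self.restricted = set(restricted_threads)  # threads for which the cache is less critical
        self.lines = OrderedDict()                 # address -> owner thread, least recently used first

    def access(self, thread, address):
        """Return True on a hit, False on a miss (the line is then filled)."""
        if address in self.lines:
            self.lines.move_to_end(address)        # refresh LRU position
            return True
        if len(self.lines) >= self.num_lines:
            self._evict(thread)
        self.lines[address] = thread
        return False

    def _evict(self, thread):
        if thread in self.restricted:
            # A restricted thread may only replace a line it already occupies.
            victim = next((a for a, owner in self.lines.items() if owner == thread), None)
            if victim is not None:
                del self.lines[victim]
                return
        # Assumed fallback: plain LRU for unrestricted threads, or when a
        # restricted thread does not own any line yet.
        self.lines.popitem(last=False)


cache = SharedCache(num_lines=4, restricted_threads={1})
for addr in (0, 1, 2):
    cache.access(0, addr)            # thread 0 fills three lines
for addr in range(100, 111):
    cache.access(1, addr)            # thread 1 streams data it never reuses
```

After the streaming accesses, thread 1 still holds a single line and the three lines of thread 0 remain resident, which is the isolation effect described in the text.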
For example, in one example, the hardware resource is a branch target table. Each branch target entry in the branch target table is used to store information about the branch jump.
Adjusting the occupation proportion and/or dynamic use range of each thread for each hardware resource according to the criticality of each hardware resource to each thread may include reducing the allocation proportion of the branch target table to the N threads, or limiting the N threads, when applying for a branch target entry, to selecting a branch target entry for replacement from the branch target entries occupied by the N threads.
For example, a thread may contain many branches that are difficult to predict accurately. For example, when the optimizing compiler is running, the processor may learn from the compiling instruction information that the branch target table is less critical to the thread. The processor may then specify that the thread can use only a portion of the capacity of the branch target table, e.g., only 20% of the branch target entries in the branch target table, or may specify that, when the thread applies for a branch target entry, it can only select for replacement one of the branch target entries it previously applied for, or may use both. This ensures that other threads are less affected by that thread's use of the branch target table.
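The capacity limit described above (for instance 20% of the branch target entries) can be modeled with a per-thread quota. The sketch below is a minimal software model under stated assumptions: the class `BranchTargetTable`, the `quota_fraction` mapping, and the oldest-first replacement order are hypothetical, and updating an existing entry for the same branch address is not modeled.

```python
class BranchTargetTable:
    """Toy branch target table with per-thread capacity quotas (illustrative only)."""

    def __init__(self, capacity, quota_fraction):
        self.capacity = capacity
        # quota_fraction: thread id -> maximum fraction of entries the thread may hold.
        self.quota = {t: max(1, int(capacity * f)) for t, f in quota_fraction.items()}
        self.entries = []  # (thread, branch_pc, target), oldest first

    def insert(self, thread, branch_pc, target):
        owned = [e for e in self.entries if e[0] == thread]
        limit = self.quota.get(thread, self.capacity)  # threads without a quota are unrestricted
        if len(owned) >= limit:
            # Quota reached: replace the thread's own oldest entry.
            self.entries.remove(owned[0])
        elif len(self.entries) >= self.capacity:
            # Table full: replace the globally oldest entry (assumed policy).
            self.entries.pop(0)
        self.entries.append((thread, branch_pc, target))

    def count(self, thread):
        return sum(1 for e in self.entries if e[0] == thread)


btb = BranchTargetTable(capacity=10, quota_fraction={1: 0.2})
for pc in range(5):
    btb.insert(0, pc, pc + 64)        # thread 0: five well-predicted branches
for pc in range(1000, 1020):
    btb.insert(1, pc, pc + 8)         # thread 1: many hard-to-predict branches
```

Thread 1 never holds more than two of the ten entries, so the five entries of thread 0 are not displaced.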
For example, in one example, the hardware resource is a first hardware resource, which may be a hardware resource allocated in a static allocation mode among the hardware resources, and the first hardware resource is a hardware resource exclusively used by each of the plurality of threads.
The adjusting of the occupation proportion and/or the dynamic use range of each thread to each hardware resource according to the criticality of each hardware resource to each thread may include, in response to the criticality of the first hardware resource to at least one thread of the plurality of threads being non-critical, allocating, for each thread of the at least one thread, the first hardware resource according to the total occupation amount of each thread to the first hardware resource, wherein the total occupation amount of each thread to the first hardware resource is determined in the process of collecting performance data.
For example, if, after a thread has been running for a period of time, the total amount b' of the first hardware resource that it occupies for most of that time (e.g., 95% or more of the entire run period) is less than an equal share of the first hardware resource (B/b), the first hardware resource is not critical to the thread.
For such a thread, the processor may allocate to the thread only the total occupation amount b' of the first hardware resource that the thread actually uses.
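The check above can be sketched as follows, assuming that occupancy is sampled periodically while performance data is collected and that b denotes the number of threads sharing the statically allocated resource (the text only describes B/b as an equal share). The function name `static_allocation`, the sampling model, and the `percent` parameter are illustrative assumptions.

```python
def static_allocation(usage_samples, total_b_upper, num_threads, percent=95):
    """Decide how much of a statically partitioned resource each thread gets.

    usage_samples: thread id -> list of sampled occupancy amounts (hypothetical format).
    total_b_upper: B, the total amount of the resource.
    num_threads:   assumed meaning of b, the number of threads sharing the resource.
    """
    equal_share = total_b_upper / num_threads
    allocation = {}
    for thread, samples in usage_samples.items():
        ordered = sorted(samples)
        # Smallest amount b' that covers `percent` percent of the samples
        # (ceiling division keeps the arithmetic in integers).
        covered = -(-percent * len(ordered) // 100)
        b_prime = ordered[max(0, covered - 1)]
        # Non-critical: the thread needs less than its equal share, so give it only b'.
        allocation[thread] = b_prime if b_prime < equal_share else equal_share
    return allocation


alloc = static_allocation(
    usage_samples={0: [4] * 19 + [20], 1: [16] * 20},
    total_b_upper=32,
    num_threads=2,
)
```

Here thread 0 uses at most 4 units for 95% of the samples, so it is allocated 4 units instead of the equal share of 16, while thread 1 keeps its equal share.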
For example, in some embodiments, for any one hardware resource, the sum of the total occupation amounts of the multiple threads using the hardware resource is greater than the total amount of resources of the hardware resource; that is, when the threads use the hardware resource, at least one thread is always stalled due to the lack of the hardware resource. In this case, adjusting the occupation proportion and/or dynamic use range of each thread for each hardware resource according to the criticality of each hardware resource to each thread includes increasing the allocation proportion of the hardware resource to threads with higher criticality and reducing the allocation proportion of the hardware resource to threads with lower criticality.
For example, the criticality may be determined based on the total length of the pipeline stalls of each thread caused by the lack of the hardware resource, or based on the ratio of that total length to the entire run time. For example, the longer the total length of the stalls, or the higher that ratio, the greater the impact of the thread's pipeline stalls on performance; the allocation proportion of the hardware resource to that thread may therefore be increased, while the allocation proportion of the hardware resource to other threads may be decreased. For example, if multiple threads occupy the hardware resource, the threads may be ranked from highest to lowest criticality, with threads of higher criticality allocated a larger share of the hardware resource and the thread with the lowest criticality allocated the smallest share.
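One possible way to turn stall ratios into allocation proportions is sketched below. The disclosure only requires that more critical threads receive a larger share; the proportional formula, the function name `shares_from_stalls`, and the `min_share` floor are assumptions chosen for illustration.

```python
def shares_from_stalls(stall_cycles, run_cycles, min_share=0.05):
    """Map per-thread stall ratios for one resource to allocation proportions.

    stall_cycles: thread id -> total cycles stalled waiting for the resource.
    run_cycles:   thread id -> total cycles the thread ran.
    min_share:    hypothetical floor so that no thread is starved completely.
    """
    ratios = {t: stall_cycles[t] / run_cycles[t] for t in stall_cycles}
    total = sum(ratios.values())
    count = len(ratios)
    if total == 0:
        return {t: 1.0 / count for t in ratios}  # nobody stalls: keep an equal split
    spare = 1.0 - min_share * count              # what is left after the floors
    return {t: min_share + spare * r / total for t, r in ratios.items()}


shares = shares_from_stalls(
    stall_cycles={0: 300, 1: 100},
    run_cycles={0: 1000, 1: 1000},
)
```

Thread 0 stalls three times as long as thread 1 over the same run time, so it ends up with the larger proportion of the resource.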
For example, in some embodiments, adjusting the occupation proportion and/or dynamic use range of each thread to each hardware resource according to the criticality of each hardware resource to each thread may include, for any one hardware resource, increasing the allocation proportion of any one hardware resource to M threads in response to the criticality of any one hardware resource to M threads in the plurality of threads being greater than or equal to a first threshold and the criticality of any one hardware resource to N threads in the plurality of threads being less than or equal to a second threshold.
For example, the criticality may be determined based on the total length of the pipeline stalls of each thread caused by the lack of the hardware resource, or based on the ratio of that total length to the entire run time. For example, in this case the allocation proportion of the hardware resource to the M threads, for which the hardware resource is more critical, may be increased.
For example, in some embodiments, allocating the hardware resources of the processor used when the processor performs the task according to the compiling instruction information may include controlling, according to the compiling instruction information, the occupation timing or the resource occupation priority with which each thread occupies each hardware resource.
For example, the compiling instruction information may include the times at which the plurality of threads occupy each hardware resource, for example, the time at which a certain thread needs a certain hardware resource. For example, when the optimizing compiler is running, the processor may learn from the compiling instruction information the time at which the thread occupies a certain hardware resource, and reserve the target hardware resource for the thread when the time at which the thread uses the target hardware resource is approaching, or let the thread occupy a hardware resource (for example, a computing resource) only when all data required for the operation executed by the thread is in place.
For example, the compiling instruction information may include the time distribution with which the plurality of threads occupy each hardware resource. For example, when the optimizing compiler is running, the processor may obtain from the compiling instruction information the time distribution with which a thread occupies a certain hardware resource, derive from it the times at which the thread occupies the hardware resource, and reserve the target hardware resource for the thread when the time at which the thread uses the target hardware resource is approaching.
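Reserving a resource shortly before its expected use can be sketched as computing reservation windows from the recorded use times. The representation of the time distribution as a list of cycle numbers, the `lead` parameter, and the function names are assumptions for illustration only.

```python
def reservation_windows(use_times, lead):
    """Build (start, end) windows that begin `lead` cycles before each expected use.

    use_times: cycle numbers at which the thread is expected to use the target
               resource (hypothetical encoding of the time distribution).
    """
    windows = []
    for t in sorted(use_times):
        start = max(0, t - lead)
        if windows and start <= windows[-1][1]:
            windows[-1] = (windows[-1][0], t)   # overlaps the previous window: merge
        else:
            windows.append((start, t))
    return windows


def is_reserved(now, windows):
    """True if the resource should currently be held for the thread."""
    return any(start <= now <= end for start, end in windows)


windows = reservation_windows([100, 105, 300], lead=10)
```

With a lead of 10 cycles, the two nearby uses at cycles 100 and 105 share one window and the resource is left free for other threads between cycle 105 and cycle 290.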
For example, the compiling instruction information may include dependencies between multiple hardware resources used by threads. For example, when the optimizing compiler is running, the processor may learn from the compiling instruction information the dependency between the plurality of hardware resources used by a thread, and may specify that a hardware resource (e.g., a computing resource) is occupied only when all data needed for the operation executed by the thread is in place.
For example, there is a dependency relationship between P hardware resources used by a thread to perform a first operation, where the P hardware resources include a target hardware resource and P is a positive integer greater than 1. For example, controlling the occupation timing or the resource occupation priority with which each thread occupies each hardware resource according to the compiling instruction information may include, according to the compiling instruction information, allocating the occupation right of the target hardware resource to the thread, or increasing the occupation priority of the thread for the target hardware resource, when the usage data of the first operation returned by the remaining P-1 hardware resources other than the target hardware resource has been obtained.
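The gating rule above, in which the target hardware resource is granted only after the other P-1 resources have returned their data, can be sketched with a small state holder. The class `DependencyGate` and the resource names in the usage are hypothetical.

```python
class DependencyGate:
    """Tracks whether a thread may occupy the target resource for one operation."""

    def __init__(self, target, dependencies):
        self.target = target
        self.pending = set(dependencies)   # the other P-1 resources still to return data

    def data_returned(self, resource):
        """Record that `resource` has delivered its usage data for the operation."""
        self.pending.discard(resource)

    def may_occupy_target(self):
        """The occupation right is granted only when nothing is pending."""
        return not self.pending


# Hypothetical first operation: an ALU needs operands from a cache and a load queue.
gate = DependencyGate(target="alu", dependencies={"l1_cache", "load_queue"})
before = gate.may_occupy_target()
gate.data_returned("l1_cache")
partial = gate.may_occupy_target()
gate.data_returned("load_queue")
after = gate.may_occupy_target()
```

The same condition could equally be used to raise the occupation priority of the thread rather than to grant the occupation right outright.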
For example, in other embodiments, other resource allocation policies may be used for resource partitioning and occupation timing control, including but not limited to the resource allocation policies described above, and the compiling instruction information may be used to provide a reference for a variety of different hardware resource allocation policies.
In at least one embodiment of the present disclosure, the resource allocation method may use the compiling instruction information as a reference for allocating resources when the processor runs the optimizing compiler. For example, the compiling instruction information may be generated from performance data collected while the compiler corresponding to the task runs for a period of time.
For example, the processor may divide the hardware resources according to the compiling instruction information, allocate more hardware resources to the threads with higher criticality, allocate less hardware resources to the threads with lower criticality, or limit the application range of the threads with lower criticality to the hardware resources, so as to reasonably allocate the hardware resources, allocate the hardware resources pertinently according to the program characteristics, and improve the performance of the processor to the maximum extent.
For example, the processor may control the occupation timing of the hardware resources according to the compiling instruction information. For example, a hardware resource may be reserved in advance shortly before a thread needs it, or, for a plurality of hardware resources with a dependency relationship, a hardware resource may be occupied only after the usage data needed by the operation to be executed is ready. In this way, dynamically allocated shared resources can also be controlled according to the usage characteristics of different threads, so that threads are prevented from occupying excessive resources that do not contribute to performance, and the hardware resources are allocated reasonably and in a targeted manner according to the program characteristics, thereby improving the performance of the processor as far as possible.
Corresponding to the above information processing method, at least one embodiment of the present disclosure further provides an information processing apparatus, and fig. 4 is a schematic block diagram of an information processing apparatus provided by at least one embodiment of the present disclosure.
For example, as shown in FIG. 4, the information processing apparatus 100 includes at least a collection module 101, a generation module 102, and an insertion module 103.
The collection module 101 is configured to collect performance data generated by the processor during execution of the task.
For example, the processor runs multiple threads simultaneously to perform tasks, and the performance data indicates how much and/or when each of the multiple threads is demanding different hardware resources in the processor.
The generation module 102 is configured to generate compiling instruction information corresponding to the task according to the performance data.
The insertion module 103 is configured to insert the compiling instruction information into the compiling program of the task to obtain an optimized compiling program corresponding to the task, where the compiling instruction information is used for allocating the hardware resources of the processor when the processor runs the optimized compiling program.
For example, the performance data includes occupancy of different hardware resources by the processor in performing tasks.
For example, as shown in fig. 4, the generation module 102 includes a first determination unit 1021 and a first generation unit 1022.
The first determining unit 1021 is configured to determine how critical each hardware resource is to each thread according to the occupation of different hardware resources by the processor during the task execution.
The first generating unit 1022 is configured to generate compiling instruction information in combination with the criticality of each hardware resource to each thread, where the compiling instruction information is used to adjust the occupation proportion and/or the dynamic use range of each thread to each hardware resource according to the criticality of each hardware resource to each thread when the optimizing compiler is running.
For example, for any one hardware resource, in response to the criticality of the hardware resource to M threads of the plurality of threads being greater than or equal to a first threshold and the criticality of the hardware resource to N threads of the plurality of threads being less than or equal to a second threshold, the compiling instruction information is used to instruct, when the optimizing compiler is running, reducing the allocation proportion of the hardware resource to the N threads or limiting the dynamic use range of the N threads for the hardware resource, where M and N are positive integers.
For example, the hardware resource includes a cache, and the occupation of different hardware resources by the processor during execution of the task includes the number of times each thread occupies the cache. For example, the first determining unit 1021, when determining the criticality of each hardware resource to each thread according to the occupation of different hardware resources by the processor during execution of the task, performs the following operation: for each thread, determining the criticality of the cache to the thread according to the number of times the thread occupies the cache, where the fewer times the thread occupies the cache, the lower the criticality of the cache to the thread. The compiling instruction information is used to instruct, when the optimizing compiler is running, reducing the allocation proportion of the cache to the N threads, or limiting the N threads, when cache replacement occurs, to selecting a cache line for replacement from the cache lines occupied by the N threads.
For example, the hardware resource includes a branch target table, each branch target entry in the branch target table is used to store information about a branch jump, and the occupation of different hardware resources by the processor during execution of the task includes the branch prediction accuracy of each branch in each thread. For example, the first determining unit 1021, when determining the criticality of each hardware resource to each thread according to the occupation of different hardware resources by the processor during execution of the task, performs the following operation: for each thread, determining the criticality of the branch target table to the thread according to the number of branches in the thread whose branch prediction accuracy is lower than an accuracy threshold, where the more such branches the thread contains, the lower the criticality of the branch target table to the thread. The compiling instruction information is used to instruct, when the optimizing compiler is running, reducing the allocation proportion of the branch target table to the N threads, or limiting the N threads, when applying for a branch target entry, to selecting a branch target entry for replacement from the branch target entries occupied by the N threads.
For example, the occupation of different hardware resources by the processor during the task execution process includes the total occupation amount of each thread to each hardware resource in the preset period of task execution. For example, the first determining unit 1021 determines the criticality of each hardware resource to each thread according to the occupation condition of the processor to different hardware resources in the process of executing the task, including determining the criticality of each hardware resource to each thread according to the total occupation amount of each hardware resource by a plurality of threads.
For example, the first determining unit 1021, when determining the criticality of each hardware resource to each thread according to the total occupation amount of each hardware resource by the plurality of threads, performs the following operation: for each thread and for a first hardware resource among the hardware resources, in response to the total occupation amount of the first hardware resource by the thread being less than B/b, determining that the first hardware resource is not critical to the thread, where the first hardware resource is a hardware resource exclusively used by each of the plurality of threads, B represents the total amount of resources of the first hardware resource, B/b represents an equal share of the first hardware resource, and B and b are integers greater than 1. In response to the first hardware resource being not critical to at least one thread of the plurality of threads, the compiling instruction information is used to instruct, when the optimizing compiler is running, allocating the first hardware resource to each thread of the at least one thread according to the total occupation amount of the first hardware resource by that thread.
For example, the first determining unit 1021, when determining the criticality of each hardware resource to each thread according to the total occupation amount of each hardware resource by the plurality of threads, performs the following operations: for any hardware resource, determining the sum of the total occupation amounts of the hardware resource by the plurality of threads according to the total occupation amount of the hardware resource by each thread; in response to the sum being greater than the total amount of resources of the hardware resource, determining the total length of the pipeline stalls of each thread caused by the lack of the hardware resource; and determining the criticality of the hardware resource to each thread according to that total length, where the total length of the pipeline stalls is positively correlated with the criticality. The compiling instruction information is used to instruct, when the optimizing compiler is running, increasing the allocation proportion of the hardware resource to threads with higher criticality and decreasing the allocation proportion of the hardware resource to threads with lower criticality.
For example, for any one of the hardware resources, in response to the criticality of the hardware resource to M threads of the plurality of threads being greater than a first threshold and the criticality of the hardware resource to N threads of the plurality of threads being less than a second threshold, the compiling instruction information is used to instruct, when the optimizing compiler is running, increasing the allocation proportion of the hardware resource to the M threads.
For example, the first determining unit 1021, when determining the criticality of each hardware resource to each thread according to the occupation of different hardware resources by the processor during execution of the task, performs the following operations: for each thread, determining, according to the duration of each pipeline stall or interruption of the thread caused by the lack of any hardware resource and the number of such stalls or interruptions, the total length of the pipeline stalls of the thread caused by the lack of the hardware resource, or the ratio of that total length to the total running time of the thread; and determining the criticality of the hardware resource to the thread according to the total length or the ratio.
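The aggregation performed by the first determining unit can be sketched as follows, assuming each stall is recorded as an event carrying the thread, the missing resource, and the stall duration in cycles. The event format and the function name `stall_statistics` are illustrative assumptions.

```python
from collections import defaultdict


def stall_statistics(stall_events, run_cycles):
    """Aggregate stall events per (thread, resource).

    stall_events: iterable of (thread, resource, stalled_cycles) tuples (hypothetical format).
    run_cycles:   thread id -> total running time of the thread in cycles.
    Returns a dict keyed by (thread, resource) with the stall count, the total
    stall length, and the ratio of that length to the thread's running time.
    """
    totals = defaultdict(int)
    counts = defaultdict(int)
    for thread, resource, cycles in stall_events:
        totals[(thread, resource)] += cycles
        counts[(thread, resource)] += 1
    return {
        key: {
            "count": counts[key],
            "total": totals[key],
            "ratio": totals[key] / run_cycles[key[0]],
        }
        for key in totals
    }


stats = stall_statistics(
    stall_events=[(0, "cache", 30), (0, "cache", 20), (1, "cache", 5)],
    run_cycles={0: 1000, 1: 1000},
)
```

Either the `total` or the `ratio` field can then serve as the criticality value, with larger values indicating that the resource matters more to the thread.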
For example, the occupancy of different hardware resources by a processor during execution of a task includes any combination of the number of occupancy of the respective hardware resources by each thread, the number of pipeline stalls in the absence of the respective hardware resources by each thread, the length of pipeline stalls in the absence of the respective hardware resources by each thread, the total amount of occupancy of the respective hardware resources by each thread, and the branch prediction accuracy of the respective branches in each thread.
For example, the performance data includes a temporal distribution of the various hardware resources occupied by the multiple threads when the processor is executing the task.
For example, as shown in fig. 4, the generation module 102 may further include a second generation unit 1023.
The second generating unit 1023 is configured to generate compiling instruction information in combination with time distribution of each hardware resource occupied by a plurality of threads when the processor executes the task, where the compiling instruction information is used to control the occupation time of each thread occupying each hardware resource according to the time distribution of each hardware resource occupied by the plurality of threads when the optimizing compiler is running.
For example, the compiling instruction information is used for reserving the target hardware resources for the threads when approaching the use time of the threads for the target hardware resources according to the time distribution of the occupation of the hardware resources by the threads when the optimizing compiler is operated.
For example, the performance data includes the dependency relationship between the hardware resources used by each thread while the processor executes the task.
For example, as shown in fig. 4, the generation module 102 may further include a third generation unit 1024.
For example, the third generating unit 1024 is configured to generate compiling instruction information in combination with the dependency relationship between the hardware resources used by each thread, where the compiling instruction information is used to control the occupation time or the resource occupation priority of the hardware resources used by the thread according to the dependency relationship between the hardware resources used by each thread when the optimizing compiler is running.
For example, in response to a dependency relationship between P hardware resources used by a thread to perform a first operation, the P hardware resources include a target hardware resource, and P is a positive integer greater than 1. The compiling instruction information is used for indicating that after the usage data returned by the other P-1 hardware resources except the target hardware resource is obtained when the optimizing compiler is operated, the occupation right of the target hardware resource is allocated to the thread, or the occupation priority of the thread to the target hardware resource is improved.
For example, the dependency relationship between the hardware resources used by each thread is obtained from the occupation of multiple hardware resources at the time each thread's pipeline stalls, where the multiple hardware resources include, among the different hardware resources, hardware resources other than the hardware resource that directly causes the pipeline stall of the thread.
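One way such dependencies might be derived from stall records is sketched below. The disclosure does not specify the analysis; the snapshot format, the co-occurrence counting, and the `min_support` threshold are assumptions introduced purely for illustration.

```python
from collections import Counter


def infer_dependencies(stall_snapshots, min_support=2):
    """Guess resource dependencies from snapshots taken when a pipeline stalls.

    stall_snapshots: iterable of (blocking_resource, occupied_resources), where
        blocking_resource directly caused the stall and occupied_resources is the
        set of resources the thread held at that moment (hypothetical format).
    min_support: hypothetical minimum number of co-occurrences to report a pair.
    """
    pair_counts = Counter()
    for blocking, occupied in stall_snapshots:
        for other in occupied:
            if other != blocking:
                pair_counts[(blocking, other)] += 1
    return {pair for pair, seen in pair_counts.items() if seen >= min_support}


deps = infer_dependencies([
    ("l1_cache", {"alu", "load_queue"}),
    ("l1_cache", {"alu"}),
    ("load_queue", {"alu"}),
])
```

In this usage the pair ("l1_cache", "alu") is reported, suggesting that the thread tends to hold the computing resource while it waits for cache data.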
For example, the hardware resources in the processor include computing resources including computing logic units and address computing units, and storage resources including caches, memories, queues, and registers.
For example, a processor includes at least one physical core, with multiple threads running in the same physical core.
It should be noted that the collection module 101 is configured to implement step S10 shown in fig. 2, the generation module 102 is configured to implement step S20 shown in fig. 2, and the insertion module 103 is configured to implement step S30 shown in fig. 2. Thus, for the specific description of the collection module 101, reference may be made to the related description of step S10 shown in fig. 2 in the embodiment of the information processing method described above; for the generation module 102, to the related description of step S20; and for the insertion module 103, to the related description of step S30. In addition, the information processing apparatus may achieve technical effects similar to those of the foregoing information processing method, which are not described again herein.
At least one embodiment of the present disclosure further provides a resource allocation apparatus. Fig. 5 is a schematic block diagram of a resource allocation apparatus provided in at least one embodiment of the present disclosure.
For example, as shown in FIG. 5, the resource allocation apparatus 200 includes at least an allocation module 201.
For example, the allocation module 201 is configured to allocate hardware resources of the processor used when the processor executes the task according to the compiling instruction information, where the compiling instruction information is generated according to performance data generated by the processor in a process of executing a compiler corresponding to the task, and the processor executes the task by using multiple threads at the same time, and the performance data indicates a degree and/or a timing of a requirement of each of the multiple threads on different hardware resources in the processor.
For example, the allocation module 201 includes a second determination unit 2011 and a resource adjustment unit 2012.
The second determining unit 2011 is configured to determine the criticality of each hardware resource to each thread according to the compiling instruction information.
The resource adjustment unit 2012 is configured to adjust the occupation ratio and/or the dynamic use range of each thread to each hardware resource according to the criticality of each hardware resource to each thread.
For example, the resource adjusting unit 2012 performs adjustment of the occupation ratio and/or the dynamic use range of each thread to each hardware resource according to the criticality of each hardware resource to each thread, and includes performing operations of responding to any one hardware resource to the fact that the criticality of any one hardware resource to M threads in the plurality of threads is greater than or equal to a first threshold value, and the criticality of any one hardware resource to N threads in the plurality of threads is less than or equal to a second threshold value, reducing the allocation ratio of any one hardware resource to N threads, or limiting the dynamic use range of N threads to any one hardware resource, where M and N are positive integers.
For example, any one of the hardware resources includes a cache, and the resource adjustment unit 2012 performs adjustment of the occupation ratio and/or dynamic use range of each thread to each of the hardware resources according to the criticality of each of the hardware resources to each of the threads, including performing operations of reducing the allocation ratio of the cache to the N threads or limiting the N threads to select one cache line from the N occupied cache lines to perform cache replacement when the cache replacement occurs.
For example, the any one hardware resource includes a branch target table, and each branch target entry in the branch target table is used to store information about a branch jump. When the resource adjustment unit 2012 adjusts the occupation ratio and/or the dynamic use range of each thread for each hardware resource according to the criticality of each hardware resource to each thread, the adjustment includes the following operation: reducing the allocation ratio of the branch target table to the N threads, or, when the N threads apply for a branch target entry, limiting the N threads to selecting one of the branch target entries occupied by the N threads for replacement.
For example, the any one hardware resource includes a first hardware resource, the first hardware resource being a hardware resource that is used exclusively by each of the plurality of threads. When the resource adjustment unit 2012 adjusts the occupation ratio and/or the dynamic use range of each thread for each hardware resource according to the criticality of each hardware resource to each thread, the adjustment includes the following operation: in response to the first hardware resource being non-critical to at least one thread of the plurality of threads, allocating the first hardware resource to each of the at least one thread according to the amount of the first hardware resource occupied by that thread, where the amount of the first hardware resource occupied by each thread is determined in the process of collecting the performance data.
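One possible reading of this occupancy-based allocation is sketched below for a resource that is divided into per-thread exclusive entries. The function name `partition_exclusive`, the entry counts, and the even split of the remaining entries among the other threads are assumptions for illustration only.

```python
def partition_exclusive(total_entries, threads, measured_occupancy):
    """Partition an exclusively used resource among threads.

    measured_occupancy: dict thread_id -> number of entries the thread was
    observed to occupy while performance data was collected. Only threads
    for which the resource is non-critical appear in this dict.
    """
    alloc = {}
    remaining = total_entries
    # Non-critical threads receive only what they were observed to use.
    for t in threads:
        if t in measured_occupancy:
            alloc[t] = min(measured_occupancy[t], remaining)
            remaining -= alloc[t]
    # Assumption: the rest is split evenly among the other threads,
    # with earlier threads absorbing any remainder.
    others = [t for t in threads if t not in measured_occupancy]
    for i, t in enumerate(others):
        extra = 1 if i < remaining % len(others) else 0
        alloc[t] = remaining // len(others) + extra
    return alloc
```

For example, with 64 entries, two threads, and a non-critical thread that was observed to use 10 entries, the sketch gives that thread 10 entries and leaves 54 for the other thread.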
For example, when the resource adjustment unit 2012 adjusts the occupation ratio and/or the dynamic use range of each thread for each hardware resource according to the criticality of each hardware resource to each thread, the adjustment includes the following operation: for any one hardware resource, in response to the criticality of that hardware resource to M threads of the plurality of threads being greater than or equal to a first threshold, and the criticality of that hardware resource to N threads of the plurality of threads being less than or equal to a second threshold, increasing the allocation ratio of that hardware resource to the M threads.
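As an illustration of increasing the allocation to the M threads, the sketch below moves entries of one resource from the N threads to the M threads. Taking the extra entries from the N threads, the `step` size, and the integer entry counts are assumptions made here; the text above only requires that the allocation to the M threads increases.

```python
def boost_critical_threads(criticality, entries, first_threshold,
                           second_threshold, step):
    """Return new per-thread entry counts for ONE hardware resource.

    entries: dict thread_id -> entries currently allocated to the thread
    step:    hypothetical number of entries taken from each of the N threads
    """
    critical = sorted(t for t, c in criticality.items()
                      if c >= first_threshold)      # the M threads
    noncritical = sorted(t for t, c in criticality.items()
                         if c <= second_threshold)  # the N threads
    new_entries = dict(entries)
    if not critical or not noncritical:
        return new_entries  # condition not met, nothing changes

    freed = 0
    for t in noncritical:
        give = min(step, new_entries[t])
        new_entries[t] -= give
        freed += give
    # Spread the freed entries over the M threads as evenly as possible.
    for i, t in enumerate(critical):
        extra = 1 if i < freed % len(critical) else 0
        new_entries[t] += freed // len(critical) + extra
    return new_entries
```

The total number of entries is preserved, so the increase for the M threads is exactly the amount released by the N threads.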
For example, when the resource adjustment unit 2012 allocates, according to the compiling instruction information, the hardware resources of the processor that are used when the processor executes the task, the allocation includes controlling, according to the compiling instruction information, the occupation timing or the resource occupation priority with which each thread occupies each hardware resource.
For example, in response to a dependency relationship existing among P hardware resources used by a thread to perform a first operation, where the P hardware resources include a target hardware resource and P is a positive integer greater than 1, when the resource adjustment unit 2012 controls, according to the compiling instruction information, the occupation timing or the resource occupation priority with which each thread occupies each hardware resource, the control includes the following operation: according to the compiling instruction information, after the usage data returned by the P-1 hardware resources other than the target hardware resource is obtained, allocating the right to occupy the target hardware resource to the thread, or increasing the occupation priority of the thread on the target hardware resource.
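The dependency-gated timing described above can be sketched as a simple readiness check. The resource names, the function names `may_grant_target` and `target_priority`, and the numeric priority values are hypothetical and serve only to illustrate the idea.

```python
def may_grant_target(target, required, returned):
    """Return True once all P-1 resources other than `target` have returned data.

    required: the P hardware resources the first operation depends on
              (must include `target`)
    returned: resources whose usage data has already come back
    """
    others = set(required) - {target}
    return others <= set(returned)


def target_priority(target, required, returned, base_priority=0, boost=1):
    """Raise the thread's priority on `target` only when it is ready to use it."""
    if may_grant_target(target, required, returned):
        return base_priority + boost
    return base_priority
```

Under this sketch a thread does not hold the target hardware resource while it is still waiting on the other P-1 resources, which leaves the target available to other threads in the meantime.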
It should be noted that the allocation module 201 is configured to implement step S40 shown in fig. 3. For a specific description of the allocation module 201, reference may therefore be made to the related description of step S40 shown in fig. 3 in the above-described embodiment of the resource allocation method. In addition, the resource allocation device can achieve technical effects similar to those of the aforementioned resource allocation method, which will not be described in detail herein.
Fig. 6 is a schematic structural diagram of an electronic device according to at least one embodiment of the present disclosure. The terminal devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), in-vehicle terminals (e.g., in-vehicle navigation terminals), and the like, and stationary terminals such as digital TVs, desktop computers, and the like. The electronic device shown in fig. 6 is merely an example and should not be construed to limit the functionality and scope of use of the disclosed embodiments.
For example, as shown in fig. 6, in some examples, an electronic device 300 includes a processing device (e.g., a central processor, a graphics processor, etc.) 301 that may perform various appropriate actions and processes, such as the information processing method or the resource allocation method provided by at least one embodiment of the present disclosure, according to a program stored in a Read Only Memory (ROM) 302 or a program loaded from a storage device 308 into a Random Access Memory (RAM) 303. The RAM 303 also stores various programs and data required for the operation of the computer system. The processing device 301, the ROM 302, and the RAM 303 are connected to one another via a bus 304. An input/output (I/O) interface 305 is also connected to the bus 304.
For example, the following components may be connected to the I/O interface 305: input devices 306 such as a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 307 such as a Liquid Crystal Display (LCD), speaker, vibrator, etc.; storage devices 308 including, for example, magnetic tape, hard disk, etc.; and communication devices 309 including network interface cards such as LAN cards, modems, etc. The communication device 309 may allow the electronic device 300 to perform wireless or wired communication with other devices to exchange data, performing communication processing via a network such as the Internet. A drive 310 is also connected to the I/O interface 305 as needed. A removable medium 311 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 310 as needed, so that a computer program read therefrom is installed into the storage device 308 as needed. While fig. 6 illustrates an electronic device 300 including various devices, it is to be understood that not all illustrated devices are required to be implemented or included. More or fewer devices may be implemented or included instead.
For example, the electronic device 300 may further include a peripheral interface (not shown), and the like. The peripheral interface may be any of various types of interfaces, such as a USB interface, a Lightning interface, etc. The communication device 309 may communicate by wireless communication with networks and other devices, such as the Internet, intranets, and/or wireless networks such as cellular telephone networks, wireless local area networks (LANs), and/or metropolitan area networks (MANs). The wireless communication may use any of a variety of communication standards, protocols, and technologies including, but not limited to, Global System for Mobile Communications (GSM), Enhanced Data GSM Environment (EDGE), Wideband Code Division Multiple Access (W-CDMA), Code Division Multiple Access (CDMA), Time Division Multiple Access (TDMA), Bluetooth, Wi-Fi (e.g., based on the IEEE 802.11a, IEEE 802.11b, IEEE 802.11g, and/or IEEE 802.11n standards), Voice over Internet Protocol (VoIP), Wi-MAX, protocols for email, instant messaging, and/or Short Message Service (SMS), or any other suitable communication protocol.
For example, the electronic device may be any device such as a mobile phone, a tablet computer, a notebook computer, an electronic book, a game console, a television, a digital photo frame, a navigator, or any combination of electronic devices and hardware, which is not limited in the embodiments of the present disclosure.
For example, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a non-transitory computer readable medium, the computer program comprising program code for performing the method shown in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via a communication device 309, or installed from a storage device 308, or installed from a ROM 302. When the computer program is executed by the processing apparatus 301, the above-described information processing method or resource allocation method defined in the method of the embodiment of the present disclosure is executed.
It should be noted that the computer readable medium described in the present disclosure may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of a computer-readable storage medium may include, but are not limited to, an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In an embodiment of the present disclosure, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in embodiments of the present disclosure, the computer-readable signal medium may comprise a data signal propagated in baseband or as part of a carrier wave, with computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, an electromagnetic signal, an optical signal, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to electrical wiring, fiber optic cable, RF (radio frequency), and the like, or any suitable combination of the foregoing.
In some embodiments, clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed networks.
The computer readable medium may be included in the electronic device or may exist alone without being incorporated into the electronic device.
Fig. 7 is a schematic diagram of a non-transitory computer readable storage medium according to at least one embodiment of the present disclosure. For example, as shown in fig. 7, one or more computer-executable instructions 401 may be stored non-transitorily on the storage medium 400. For example, the computer-executable instructions 401, when executed by a processor, may perform one or more steps of the information processing method described above, or one or more steps of the resource allocation method described above.
For example, the storage medium 400 may be applied to the above-described electronic device. For example, the storage medium 400 may include a memory 1003 in an electronic device.
For example, for the description of the storage medium 400, reference may be made to the description of the memory in the embodiment of the electronic device, and repeated details are omitted.
Those skilled in the art will appreciate that various modifications and improvements can be made to the disclosure. For example, the various devices or components described above may be implemented in hardware, or may be implemented in software, firmware, or a combination of some or all of the three.
Further, while the present disclosure makes various references to certain elements in a system according to embodiments of the present disclosure, any number of different elements may be used and run on a client and/or server. The units are merely illustrative and different aspects of the systems and methods may use different units.
Flowcharts are used in this disclosure to describe the steps of methods according to embodiments of the present disclosure. It should be understood that the preceding or following steps do not have to be performed in the exact order shown. Rather, the various steps may be processed in reverse order or simultaneously. Other operations may also be added to these processes.
Those of ordinary skill in the art will appreciate that all or a portion of the steps of the methods described above may be implemented by a computer program to instruct related hardware, and the program may be stored in a computer readable storage medium, such as a read only memory, a magnetic disk, or an optical disk. Alternatively, all or part of the steps of the above embodiments may be implemented using one or more integrated circuits. Accordingly, each module/unit in the above embodiment may be implemented in the form of hardware, or may be implemented in the form of a software functional module. The present disclosure is not limited to any specific form of combination of hardware and software.
Unless defined otherwise, all terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure pertains. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.