CN100495319C - Method and apparatus for reading misaligned data in a processor

Info

Publication number: CN100495319C
Application number: CNB2003101224285A
Authority: CN
Inventors: 梁伯嵩
Original assignee: Sunplus Technology Co Ltd
Current assignee: Sunplus Technology Co Ltd
Priority date: 2003-12-23
Filing date: 2003-12-23
Publication date: 2009-06-03
Anticipated expiration: 2023-12-23
Also published as: CN1632741A

Abstract

The invention provides a device and a method for reading unaligned data in a processor. The unaligned data is stored in a storage device. A load combine register is coupled to the storage device to temporarily hold data read from it. A shift device is coupled to the load combine register and the storage device and shifts their contents according to the storage address of the unaligned data. A control device fetches a first word and holds it in the load combine register, then fetches a second word; the shift device concatenates the first word with the second word and shifts the result to a first position. The control device then fetches a third word, and the shift device concatenates the second word with the third word and shifts the result to the first position.

Description

Method and apparatus for reading unaligned data in a processor

Technical field

The invention relates to the technical field of data processing, and in particular to an architecture and a method for reading unaligned data in a processor.

Background technology

When a processor processes data, whether the data is aligned affects the performance of many key operations, for example operations on strings and arrays. As shown in Fig. 1, a piece of data to be processed (ABCDEFGHIJKL) often straddles word boundaries in storage. Before a processor can perform string or array operations on such data, it must first carry out many extra operations to restore the data to an aligned form; only then can it operate on the data.

One known way of handling unaligned data is to load the data into the processor and then use ordinary processor instructions to extract the required data. As shown in Fig. 2, the data (ZABC) located at 100h is first loaded into register R16, and R16 is shifted left by 8 bits to remove the unwanted data (Z). The data (DEFG) located at 104h is then loaded into register R17, and R17 is shifted right by 24 bits to remove the unwanted data (EFG). Finally, an OR operation is performed on R16 and R17 and the result is stored in R16, which then holds the required data (ABCD). Following the same steps, the data EFGH and IJKL are loaded into registers R17 and R18 in turn.
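The Fig. 2 sequence can be sketched in Python as follows. This is only an illustration: the helper names `word` and `text`, the dictionary used as memory, and the convention that the lowest-addressed byte sits in the most significant byte of a register are assumptions chosen to match the ZABC notation above, not part of the patent.

```python
MASK32 = 0xFFFFFFFF  # keep results within a 32-bit register


def word(s):
    """Pack 4 ASCII characters into a 32-bit value, first character in the high byte (assumed layout)."""
    return int.from_bytes(s.encode("ascii"), "big")


def text(w):
    """Unpack a 32-bit value back into its 4 ASCII characters."""
    return w.to_bytes(4, "big").decode("ascii")


# Hypothetical memory image: aligned address -> 32-bit word.
memory = {0x100: word("ZABC"), 0x104: word("DEFG")}

r16 = memory[0x100]        # 1. load ZABC
r16 = (r16 << 8) & MASK32  # 2. shift left 8 bits, dropping Z
r17 = memory[0x104]        # 3. load DEFG
r17 = r17 >> 24            # 4. shift right 24 bits, dropping EFG
r16 = r16 | r17            # 5. OR the two halves together

print(text(r16))
```

Two loads, two shifts and one OR are spent on a single 32-bit result, which is the per-word cost counted in the following paragraph.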

As the above description shows, if the unaligned data to be loaded is n words long (one word being 32 bits), the known method needs 5n instructions to describe the read operation and at least 5n instruction cycles to complete it. This makes the program code lengthy and wasteful of storage space, and the added load on the processor degrades its performance.

To address the lengthy code and poor efficiency that result from handling unaligned data with ordinary processor instructions, U.S. Patent No. 4,814,976 aligns the data as it is loaded and reads a piece of data that crosses a boundary in two loads. As shown in Fig. 3, the data (ABC) located at 101h to 103h is first loaded into bytes 0, 1 and 2 of register R16; at this point byte 3 of R16 holds X (don't care). The data (D) located at 104h is then loaded into byte 3 of R16, so that R16 holds the required data (ABCD). The same steps load the data EFGH and IJKL into registers R17 and R18 in turn.
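The two-load scheme of Fig. 3 can be modelled roughly as below. The function `load_partial`, its parameters and the byte-array memory image are hypothetical names introduced only for this sketch; the cited U.S. patent's actual instruction encoding is not described in this document.

```python
BASE = 0x100
# Hypothetical byte-addressed memory image starting at 100h.
memory = bytearray(b"ZABCDEFGHIJKLZZZ")


def load_partial(reg, addr, first_byte, count):
    """Copy `count` bytes from memory at `addr` into register bytes
    first_byte .. first_byte + count - 1, leaving the other bytes untouched."""
    out = bytearray(reg)
    offset = addr - BASE
    out[first_byte:first_byte + count] = memory[offset:offset + count]
    return bytes(out)


r16 = b"XXXX"                         # initial content is don't care
r16 = load_partial(r16, 0x101, 0, 3)  # first load: ABC into bytes 0, 1, 2
after_first = r16
r16 = load_partial(r16, 0x104, 3, 1)  # second load: D into byte 3

print(after_first, r16)
```

Each destination register is written twice and each result costs two loads, which is the source of the 2n count and of the repeated register accesses discussed next.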

As the above description shows, if the unaligned data to be loaded is n words long, this approach needs 2n instructions to describe the read operation and at least 2n instruction cycles to complete it. Moreover, because the same memory and register locations are read and written repeatedly, processor pipeline stalls become more likely. Reading the same memory location repeatedly also wastes bus bandwidth, and the resulting delay is especially noticeable in systems that have no cache.

In view of this, and in a spirit of active invention, the inventor sought a "method and architecture for reading unaligned data in a processor" that could solve the above problems, and after repeated research and experiment completed the present invention.

Summary of the invention

The object of the present invention is to provide a method and an architecture for reading unaligned data in a processor that avoid the known technology's need for many instructions to describe a read operation, reduce the number of instruction cycles needed to complete the read, and improve execution efficiency.

According to one feature of the present invention, a method for reading unaligned data in a processor is provided, wherein the unaligned data is stored at an address of a storage device, the storage device has a plurality of m-bit words separated by word boundaries, and the unaligned data is divided by the word boundaries into a first part, a second part and a third part. When the number of m-bit words contained in the second part is not greater than 1, the method mainly comprises:

An initial fetch step of executing a first instruction to fetch a first word from the location of the storage device that contains the first part;

A relay fetch step of executing a second instruction to fetch a second word from the location of the storage device that contains the second part;

A first shift step of concatenating the first word with the second word and shifting the result into a first register, wherein the number of bits shifted in the first shift step depends on the value of m and on the address of the unaligned data in the storage device;

An end fetch step of executing a third instruction to fetch a third word from the location of the storage device that contains the third part; and

A second shift step of concatenating the second word with the third word and shifting the result into a first register, wherein the number of bits shifted in the second shift step depends on the value of m and on the address of the unaligned data in the storage device;

Wherein the first part and the third part are each shorter than m bits, and the second part is equal to m bits.

Wherein, when the number of m-bit words contained in the second part is greater than 1, the method further comprises:

A continued relay fetch step of replacing the first word with the second word and executing a second instruction to fetch a second word from the location of the storage device that contains the second part;

A third shift step of concatenating the first word with the second word and shifting the result into a third register.

Wherein the shift step is performed as a shift (translation).

Wherein the shift step is performed as a rotation.

Wherein, m is 32.

Wherein, when the address is 4N+1, the first shift step, the second shift step and the third shift step each shift left by 8 bits, N being a non-negative integer.

Wherein, when the address is 4N+2, the first shift step, the second shift step and the third shift step each shift left by 16 bits, N being a non-negative integer.

Wherein, when the address is 4N+3, the first shift step, the second shift step and the third shift step each shift left by 24 bits, N being a non-negative integer.
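The three address cases above follow a single rule: the left shift equals 8 bits per byte of offset from the word boundary. A minimal sketch, in which the function name `shift_bits` is hypothetical and the extension to word widths other than m = 32 is an assumption not stated in the text:

```python
def shift_bits(addr, m=32):
    """Left-shift amount in bits for a byte address, given m-bit words."""
    bytes_per_word = m // 8
    return 8 * (addr % bytes_per_word)


# Addresses of the form 4N, 4N+1, 4N+2 and 4N+3 with 4N = 100h.
for addr in (0x100, 0x101, 0x102, 0x103):
    print(hex(addr), shift_bits(addr))
```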

According to one feature of the present invention, a method for reading unaligned data in a processor is provided, wherein the unaligned data is stored at an address of a storage device, the storage device has a plurality of m-bit words separated by word boundaries, and the unaligned data is divided by a word boundary into a first part and a second part, characterized in that the method mainly comprises:

An end fetch step of executing a second instruction to fetch a second word from the location of the storage device that contains the second part; and

Wherein the first part and the second part are each shorter than m bits.

Wherein the shift step is performed as a shift (translation).

Wherein the shift step is performed as a rotation.

Wherein, m is 32.

Wherein, when the address is 4N+1, the first shift step shifts left by 8 bits, N being a non-negative integer.

Wherein, when the address is 4N+2, the first shift step shifts left by 16 bits, N being a non-negative integer.

Wherein, when the address is 4N+3, the first shift step shifts left by 24 bits, N being a non-negative integer.

According to one feature of the present invention, a device for reading unaligned data in a processor is provided, wherein the unaligned data is stored in a storage device and is divided by word boundaries into a first part, a second part and a third part, and the storage device has a plurality of m-bit words separated by word boundaries. The device mainly comprises:

A load combine register, coupled to the storage device to temporarily hold data read from the storage device;

A shift device, coupled to the load combine register and the storage device, which shifts the contents of the load combine register and the storage device according to the storage address of the unaligned data; and

A control device, which fetches a first word from the location of the storage device that contains the first part and holds it in the load combine register; fetches a second word from the location of the storage device that contains the second part, and uses the shift device to concatenate the first word with the second word and shift the result into a first register; when the number of m-bit words contained in the second part is greater than 1, replaces the first word with the second word, fetches a second word from the location of the storage device that contains the second part, concatenates the first word with the second word, and shifts the result into a third register; and fetches a third word from the location of the storage device that contains the third part, and again uses the shift device to concatenate the second word with the third word and shift the result into a second register;

Wherein the shift device operates by shifting (translation).

Wherein the shift device operates by rotation.

Wherein, m is 32.

Wherein, when the address is 4N+1, the shift device shifts left by 8 bits, N being a non-negative integer.

Wherein, when the address is 4N+2, the shift device shifts left by 16 bits, N being a non-negative integer.

Wherein, when the address is 4N+3, the shift device shifts left by 24 bits, N being a non-negative integer.

Because the present invention is novel in design, is industrially applicable and genuinely improves performance, a patent for invention is applied for in accordance with the law.

Description of drawings

To further explain the technical content of the present invention, it is described in detail below with reference to embodiments and the accompanying drawings, in which:

Fig. 1 is a schematic diagram of a group of unaligned data arranged in memory.

Fig. 2 is the program code with which a known technique loads a group of unaligned data.

Fig. 3 shows the program code with which another known technique loads a group of unaligned data, together with a schematic diagram of the registers.

Fig. 4 is a block diagram of the device for reading unaligned data in a processor according to the present invention.

Fig. 5 shows the instruction formats of the present invention.

Fig. 6 is a schematic diagram of the LCB instruction of the present invention.

Fig. 7 is a schematic diagram of the LCW instruction of the present invention.

Fig. 8 is a schematic diagram of the LCE instruction of the present invention.

Fig. 9 shows how the LCB, LCW and LCE instructions of the present invention execute.

Fig. 10 is an application example of the present invention.

Fig. 11 is another application example of the present invention.

Embodiment

Fig. 4 shows a block diagram of the device for reading unaligned data in a processor according to the present invention. It mainly comprises a storage device 100, a load combine register 200 (Load Combine Register, LDCR), a shift device 300 and a control device 400. The storage device 100 has a plurality of m-bit words separated by word boundaries; in this embodiment m is preferably 32, that is, the storage device 100 consists of a plurality of 32-bit words. The unaligned data (ABCDEFGHIJKL) is stored in the storage device 100 and is divided by the word boundaries into a first part 110, a second part 120 and a third part 130.

The load combine register 200 is coupled to the storage device 100 to temporarily hold data read from the storage device 100. The shift device 300 is coupled to the load combine register 200 and the storage device 100 and, according to the storage address of the unaligned data, shifts the contents of the load combine register 200 and the storage device 100. The shift device may operate by shifting (Shift) or by rotating (Rotate).

The control device 400 fetches a first word from the location of the storage device 100 that contains the first part 110 and holds it in the load combine register 200. It then fetches a second word from the location of the storage device 100 that contains the second part 120, and uses the shift device 300 to concatenate the first word with the second word and shift the result to a first position. Finally, it fetches a third word from the location of the storage device 100 that contains the third part 130, and again uses the shift device 300 to concatenate the second word with the third word and shift the result to the first position.

In the device for reading unaligned data in a processor according to the present invention, three instructions are defined so that the control device 400 can generate the relevant control signals. The three instructions are Load Combine Begin (LCB), Load Combine Word (LCW) and Load Combine End (LCE). Their formats are shown in Fig. 5.

The LCB [Addr] instruction loads the memory content at address Addr into the load combine register 200 (LDCR). As shown in Fig. 6, LCB [101] loads the content (ABC) at memory location 101 into the LDCR 200.

The LCW rD, [Addr] instruction, shown in Fig. 7, combines the memory content at address Addr with the content of the load combine register 200 (LDCR), shifts the combination left by an amount determined by Addr, and writes the result into register rD. It also loads the memory content at address Addr into the load combine register 200 (LDCR). When Addr is 4N, no shift is performed; when Addr is 4N+1, the shift is 8 bits to the left; when Addr is 4N+2, the shift is 16 bits to the left; and when Addr is 4N+3, the shift is 24 bits to the left.
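The LCW data path described above can be sketched as a pure function. The name `lcw`, its tuple return value, and the big-endian convention (the LDCR forms the high half of the 64-bit combination, as in the ZABC notation of the examples) are assumptions made for illustration only.

```python
MASK32 = 0xFFFFFFFF


def lcw(ldcr, mem_word, addr):
    """Model of LCW: return (value written to rD, new LDCR content).

    ldcr     -- current 32-bit content of the load combine register
    mem_word -- 32-bit word read from the memory location containing addr
    addr     -- byte address given in the instruction
    """
    combined = (ldcr << 32) | mem_word           # LDCR followed by the memory word
    shift = 8 * (addr & 3)                       # 0, 8, 16 or 24 bits
    rd = ((combined << shift) >> 32) & MASK32    # upper 32 bits after the left shift
    return rd, mem_word                          # the memory word becomes the new LDCR


# ZABC in the LDCR, DEFG in memory, address 105h (4N+1).
rd, new_ldcr = lcw(0x5A414243, 0x44454647, 0x105)
print(hex(rd), hex(new_ldcr))
```

With an aligned address (4N) the shift is zero, so the value written to rD is simply the previous LDCR content.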

The LCE rD, [Addr] instruction is shown in Fig. 8. When Addr is 4N, the content of the load combine register 200 (LDCR) is written directly into register rD, and the memory content at address Addr is not loaded into the load combine register 200 (LDCR). When Addr is not 4N, the memory content at address Addr is combined with the content of the load combine register 200 (LDCR), the combination is shifted left by an amount determined by Addr, and the result is written into register rD; the memory content at address Addr is also loaded into the load combine register 200 (LDCR). When Addr is 4N+1, the shift is 8 bits to the left; when Addr is 4N+2, the shift is 16 bits to the left; and when Addr is 4N+3, the shift is 24 bits to the left.
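The aligned special case is the only difference between LCE and LCW, and it can be made explicit in a sketch. Here `lce` and the `fetch` callback (a stand-in for the memory read) are hypothetical names, and the big-endian register convention is again an assumption.

```python
MASK32 = 0xFFFFFFFF


def lce(ldcr, fetch, addr):
    """Model of LCE: return (value written to rD, LDCR content afterwards).

    fetch is called with the aligned word address only when memory is
    actually read, so the aligned case performs no memory access.
    """
    offset = addr & 3
    if offset == 0:
        return ldcr, ldcr                 # aligned: rD gets the LDCR, no load
    mem_word = fetch(addr & ~3)           # word containing addr
    combined = (ldcr << 32) | mem_word
    rd = ((combined << (8 * offset)) >> 32) & MASK32
    return rd, mem_word


reads = []


def fetch(word_addr):
    """Hypothetical memory read that records which word was accessed."""
    reads.append(word_addr)
    return {0x10C: 0x4C5A5A5A}[word_addr]  # LZZZ at 10Ch


unaligned = lce(0x48494A4B, fetch, 0x10D)  # HIJK in LDCR, address 4N+1
aligned = lce(0x494A4B4C, fetch, 0x10C)    # address 4N
print(unaligned, aligned, reads)
```

Only the unaligned call appears in `reads`, reflecting that an aligned LCE does not touch memory.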

Fig. 9 shows how the LCB [Addr], LCW rD, [Addr] and LCE rD, [Addr] instructions execute under the two byte orders (little endian and big endian). In the figure, the data in the load combine register 200 (LDCR) is abcd and the data in memory is ABCD; s=0 means that the accessed memory address is 4N, s=1 that it is 4N+1, s=2 that it is 4N+2, and s=3 that it is 4N+3. Fig. 9 uses 4N=100 as its example.

Fig. 10 is a schematic diagram of one application of the present invention. To load a group of unaligned data (ABCDEFGHIJKL) into registers R16, R17 and R18, an LCB [101h] instruction is executed first, which loads the data (ZABC) in the storage device 100 at the location containing address 101h into the load combine register 200 (LDCR). After LCB [101h] has executed, the content of the load combine register 200 is ZABC ([LDCR]=ZABC). An LCW R16, [105h] instruction is executed next. It combines the memory content (DEFG) at the location containing address 105h with the content (ZABC) of the load combine register 200 (LDCR) to form ZABCDEFG, shifts this left by 8 bits according to the address (105h), and writes the upper 32 bits (ABCD) into register R16; it also loads the memory content at the location containing address 105h into the load combine register 200 (LDCR). After LCW R16, [105h] has executed, the content of register R16 is therefore ABCD and the content of the load combine register 200 is DEFG ([LDCR]=DEFG).

Next, an LCW R17, [109h] instruction is executed. It combines the memory content (HIJK) at the location containing address 109h with the content (DEFG) of the load combine register 200 (LDCR) to form DEFGHIJK, shifts this left by 8 bits according to the address (109h), and writes the upper 32 bits (EFGH) into register R17; it also loads the memory content at the location containing address 109h into the load combine register 200 (LDCR). After LCW R17, [109h] has executed, the content of register R17 is therefore EFGH and the content of the load combine register 200 is HIJK ([LDCR]=HIJK).

Finally, an LCE R18, [10Dh] instruction is executed. It combines the memory content (LZZZ) at the location containing address 10Dh with the content (HIJK) of the load combine register 200 (LDCR) to form HIJKLZZZ, shifts this left by 8 bits according to the address (10Dh), and writes the upper 32 bits (IJKL) into register R18; it also loads the memory content at the location containing address 10Dh into the load combine register 200 (LDCR). After LCE R18, [10Dh] has executed, the content of register R18 is therefore IJKL and the content of the load combine register 200 is LZZZ ([LDCR]=LZZZ).
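The four-instruction sequence of Fig. 10 can be traced with a small software model. The class `LoadCombineUnit`, its method names, the dictionary-based memory and the big-endian packing of characters into words are all assumptions introduced for this sketch; the patent describes hardware, not this code.

```python
MASK32 = 0xFFFFFFFF


def word(s):
    """Pack 4 ASCII characters into a 32-bit value, first character in the high byte."""
    return int.from_bytes(s.encode("ascii"), "big")


def text(w):
    """Unpack a 32-bit value into its 4 ASCII characters."""
    return w.to_bytes(4, "big").decode("ascii")


class LoadCombineUnit:
    """Toy model of the LDCR, the shift device and the three instructions."""

    def __init__(self, memory):
        self.memory = memory  # aligned address -> 32-bit word
        self.ldcr = 0
        self.regs = {}

    def _fetch(self, addr):
        return self.memory[addr & ~3]  # word containing addr

    def lcb(self, addr):
        self.ldcr = self._fetch(addr)

    def lcw(self, rd, addr):
        mem_word = self._fetch(addr)
        combined = (self.ldcr << 32) | mem_word
        shift = 8 * (addr & 3)
        self.regs[rd] = ((combined << shift) >> 32) & MASK32
        self.ldcr = mem_word

    def lce(self, rd, addr):
        if addr & 3 == 0:
            self.regs[rd] = self.ldcr  # aligned: no memory access
        else:
            self.lcw(rd, addr)


unit = LoadCombineUnit({
    0x100: word("ZABC"), 0x104: word("DEFG"),
    0x108: word("HIJK"), 0x10C: word("LZZZ"),
})
unit.lcb(0x101)         # [LDCR] = ZABC
unit.lcw("R16", 0x105)  # R16 = ABCD, [LDCR] = DEFG
unit.lcw("R17", 0x109)  # R17 = EFGH, [LDCR] = HIJK
unit.lce("R18", 0x10D)  # R18 = IJKL, [LDCR] = LZZZ

result = {name: text(value) for name, value in unit.regs.items()}
print(result, text(unit.ldcr))
```

Each memory word is fetched exactly once in this trace, and each destination register is written exactly once.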

Fig. 11 is a schematic diagram of another application of the present invention. To load a group of unaligned data (ABCD) into register R16, an LCB [101h] instruction is executed first, which loads the data (ZABC) in the storage device 100 at the location containing address 101h into the load combine register 200 (LDCR). After LCB [101h] has executed, the content of the load combine register 200 is ZABC ([LDCR]=ZABC).

An LCE R16, [105h] instruction is executed next. It combines the memory content (DZZZ) at the location containing address 105h with the content (ZABC) of the load combine register 200 (LDCR) to form ZABCDZZZ, shifts this left by 8 bits according to the address (105h), and writes the upper 32 bits (ABCD) into register R16; it also loads the memory content at the location containing address 105h into the load combine register 200 (LDCR). After LCE R16, [105h] has executed, the content of register R16 is therefore ABCD and the content of the load combine register 200 is DZZZ ([LDCR]=DZZZ).

As the above description shows, if the unaligned data to be read is n words long, the technique of the present invention needs only (n+1) instructions to describe the read operation, which shortens the program code, and only (n+1) instruction cycles to complete it, which significantly improves execution efficiency. Furthermore, the same memory and register locations are not read and written repeatedly, so processor pipeline stalls become less likely. Because each memory location is read only once, as necessary, bus bandwidth is saved and its use can be optimized.
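The three instruction counts quoted in this document (5n, 2n and n+1) can be tabulated side by side. The function and key names below are hypothetical labels for the Fig. 2 method, the Fig. 3 method and the present technique.

```python
def instruction_counts(n):
    """Instructions needed to read n unaligned words under each scheme,
    using the counts stated in the description."""
    return {
        "shift_and_or (Fig. 2)": 5 * n,
        "two_partial_loads (Fig. 3)": 2 * n,
        "load_combine (LCB/LCW/LCE)": n + 1,
    }


for n in (1, 3, 16):
    print(n, instruction_counts(n))
```

For n = 3, the case of Fig. 10, the counts are 15, 6 and 4, the last matching the one LCB, two LCW and one LCE instructions used there.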

In summary, the present invention differs from the known technology in purpose, means and effect, and is an invention of practical value. It should be noted, however, that the embodiments above are given only for convenience of explanation; the scope of rights claimed by the present invention is defined by the claims and is not limited to the embodiments above.

Claims

1. A method for reading unaligned data in a processor, wherein the unaligned data is stored at an address of a storage device, the storage device has a plurality of m-bit words separated by word boundaries, and the unaligned data is divided by the word boundaries into a first part, a second part and a third part, and wherein, when the number of m-bit words in the second part is not greater than 1, the method mainly comprises:

an initial fetch step of executing a first instruction to fetch a first word from the location of the storage device that contains the first part;

a relay fetch step of executing a second instruction to fetch a second word from the location of the storage device that contains the second part;

a first shift step of concatenating the first word with the second word and shifting the result into a first register, wherein the number of bits shifted in the first shift step is related to the value of m and to the address of the unaligned data in the storage device;

an end fetch step of executing a third instruction to fetch a third word from the location of the storage device that contains the third part; and

a second shift step of concatenating the second word with the third word and shifting the result into a first register, wherein the number of bits shifted in the second shift step is related to the value of m and to the address of the unaligned data in the storage device;

wherein the first part and the third part are each shorter than m bits, and the second part is equal to m bits.

2. The method for reading unaligned data in a processor according to claim 1, wherein, when the number of m-bit words in the second part is greater than 1, the method further comprises:

a continued relay fetch step of replacing the first word with the second word and executing a second instruction to fetch a second word from the location of the storage device that contains the second part;

a third shift step of concatenating the first word with the second word and shifting the result into a third register.

3. The method for reading unaligned data in a processor according to claim 1, wherein the shift step is performed as a shift (translation).

4. The method for reading unaligned data in a processor according to claim 1, wherein the shift step is performed as a rotation.

5. The method for reading unaligned data in a processor according to claim 2, wherein m is 32.

6. The method for reading unaligned data in a processor according to claim 5, wherein, when the address is 4N+1, the first shift step, the second shift step and the third shift step each shift left by 8 bits, and N is a non-negative integer.

7. The method for reading unaligned data in a processor according to claim 5, wherein, when the address is 4N+2, the first shift step, the second shift step and the third shift step each shift left by 16 bits, and N is a non-negative integer.

8. The method for reading unaligned data in a processor according to claim 5, wherein, when the address is 4N+3, the first shift step, the second shift step and the third shift step each shift left by 24 bits, and N is a non-negative integer.

9. A method for reading unaligned data in a processor, wherein the unaligned data is stored at an address of a storage device, the storage device has a plurality of m-bit words separated by word boundaries, and the unaligned data is divided by a word boundary into a first part and a second part, characterized in that the method mainly comprises:

an end fetch step of executing a second instruction to fetch a second word from the location of the storage device that contains the second part; and

wherein the first part and the second part are each shorter than m bits.

10. The method for reading unaligned data in a processor according to claim 9, wherein the shift step is performed as a shift (translation).

11. The method for reading unaligned data in a processor according to claim 9, wherein the shift step is performed as a rotation.

12. The method for reading unaligned data in a processor according to claim 9, wherein m is 32.

13. The method for reading unaligned data in a processor according to claim 12, wherein, when the address is 4N+1, the first shift step shifts left by 8 bits, and N is a non-negative integer.

14. The method for reading unaligned data in a processor according to claim 12, wherein, when the address is 4N+2, the first shift step shifts left by 16 bits, and N is a non-negative integer.

15. The method for reading unaligned data in a processor according to claim 12, wherein, when the address is 4N+3, the first shift step shifts left by 24 bits, and N is a non-negative integer.

16. A device for reading unaligned data in a processor, wherein the unaligned data is stored in a storage device and is divided by word boundaries into a first part, a second part and a third part, the storage device has a plurality of m-bit words separated by word boundaries, and the device mainly comprises:

a load combine register, coupled to the storage device to temporarily hold data read from the storage device;

a shift device, coupled to the load combine register and the storage device, for shifting the contents of the load combine register and the storage device according to the storage address of the unaligned data; and

a control device, which fetches a first word from the location of the storage device that contains the first part and holds it in the load combine register; fetches a second word from the location of the storage device that contains the second part, and uses the shift device to concatenate the first word with the second word and shift the result into a first register; when the number of m-bit words in the second part is greater than 1, replaces the first word with the second word, fetches a second word from the location of the storage device that contains the second part, concatenates the first word with the second word, and shifts the result into a third register; and then fetches a third word from the location of the storage device that contains the third part, and uses the shift device to concatenate the second word with the third word and shift the result into a second register;

17. The device for reading unaligned data in a processor according to claim 16, wherein the shift device operates by shifting (translation).

18. The device for reading unaligned data in a processor according to claim 16, wherein the shift device operates by rotation.

19. The device for reading unaligned data in a processor according to claim 16, wherein m is 32.

20. The device for reading unaligned data in a processor according to claim 19, wherein, when the address is 4N+1, the shift device shifts left by 8 bits, and N is a non-negative integer.

21. The device for reading unaligned data in a processor according to claim 19, wherein, when the address is 4N+2, the shift device shifts left by 16 bits, and N is a non-negative integer.

22. The device for reading unaligned data in a processor according to claim 19, wherein, when the address is 4N+3, the shift device shifts left by 24 bits, and N is a non-negative integer.