Background
With the rapid development of information technology and the widespread adoption of computers, computers are now used throughout society, including in the military, education, finance, and scientific research. At the same time, computer security problems keep emerging and seriously affect national security and the economy. For example, the Code Red worm, which broke out in July 2001, attacked many servers on a large scale; following the worm's instructions, the compromised servers sent large amounts of data to government websites, which ultimately paralyzed those websites and caused losses of about 2.6 billion dollars worldwide. As of January 1, 2020, the Microsoft Security Response Center (MSRC) had reported a total of 37964 vulnerabilities (bugs), of which 5264 were rated high severity. Another MSRC report indicates that, each year, about 70% of the new vulnerabilities disclosed in the CVE (Common Vulnerabilities and Exposures) dictionary are memory safety issues.
Among memory safety problems, damage to the integrity of sensitive memory (including sensitive data and sensitive code) poses a serious threat to system security.
Many defense mechanisms against memory corruption attacks work correctly only if the integrity of their sensitive data is guaranteed. Examples include the safe region and safe stack of the Code Pointer Integrity (CPI) mechanism, the shadow stack of shadow-stack defenses, and the metadata of CFIXX, a defense mechanism that guarantees object type integrity in C++. The Write XOR Execute (W^X) mechanism, which resists code injection attacks, prevents a memory page from being both writable and executable at the same time. However, the dynamic code generation techniques widely used in just-in-time (JIT) compilers and dynamic binary translation generate and modify code at run time and store it in a code cache. Since this sensitive code resides in the code cache, the integrity of the code cache also needs to be protected.
In-process isolation is an important means of ensuring system security: even if an attacker compromises a user process, the attacker still cannot execute sensitive code or access sensitive data. In-process isolation is the mainstream direction of current academic research, and its methods fall into three types: address-based isolation, domain-based isolation, and privileged-access-based isolation. They are introduced as follows:
1. Address-based isolation methods. Address-based isolation instruments every memory access instruction to restrict the address range it can access, ensuring that the safe region cannot be reached. In software-only schemes such as Software Fault Isolation (SFI), code and data are divided into separate regions, and the code in each region can access only the corresponding data. Because SFI is implemented purely in software, it introduces significant performance overhead for memory-access-intensive programs. To accelerate address-based isolation, Intel introduced the Memory Protection Extensions (MPX) hardware to speed up bounds checking. MPX lets a programmer create a set of bounds that identify the upper and lower limits of an address interval; every memory access instruction is instrumented, and the MPX hardware checks whether the accessed address falls within the safe region. Because address-based isolation checks, before each memory access instruction executes, whether it touches critical data, protecting memory-access-intensive programs incurs a large performance overhead, which is the performance bottleneck of this approach.
2. Domain-based isolation methods. The basic idea is to enable access to the safe region just before accessing it and to disable access immediately afterwards, so that an attacker cannot access the critical data even when the location of the safe region is known. Critical data protected by information hiding is usually accessed frequently by defense mechanisms such as code pointer integrity, control flow integrity, and shadow stacks, which access the safe region on function returns, function calls, and indirect control-flow transfers. Taking the SPEC CPU2006 benchmark suite as an example, function call and function return instructions execute about 58 million times per second on average, and indirect jump instructions about 43 million times per second. The performance bottleneck of domain-based isolation is therefore the need to switch access permissions frequently. Software-only domain isolation changes the access permissions before and after each access to the safe region, for example with the mprotect system call. Because each system call requires a switch between user mode and kernel mode and takes about 20,000 clock cycles, frequent permission switching introduces a very large performance overhead.
To accelerate domain-based isolation by speeding up permission switching, some researchers have proposed isolating the safe region with the Extended Page Table (EPT) technology of hardware-assisted memory virtualization. This method sets up two extended page tables: one records the address mappings of the safe region (the secure EPT), and the other records the address mappings of the non-safe region (the non-secure EPT). The vmfunc instruction provided by Intel (about 140 clock cycles) is then used to switch quickly between the two EPTs to achieve isolation. In addition, Intel Memory Protection Keys (MPK) hardware can be used to isolate the safe region. MPK divides the user memory space into 16 domains, identifies the domain to which a page belongs with 4 bits in the page table entry, and adds a PKRU register that controls the read and write permissions of each domain, thereby achieving isolation.
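The switching costs quoted above can be put side by side with a rough back-of-the-envelope calculation. The sketch below is only an illustration: the call/return rate and the per-switch cycle counts come from the text, while the 3 GHz clock and the assumption of two switches per safe-region access (open, then close) are hypothetical values chosen for the example.

```python
# Figures quoted in the text.
CALLS_AND_RETURNS_PER_SECOND = 58_000_000  # SPEC CPU2006 average
MPROTECT_CYCLES = 20_000                   # one mprotect system call
VMFUNC_CYCLES = 140                        # one EPT switch via vmfunc

# Assumptions made only for this illustration.
SWITCHES_PER_ACCESS = 2                    # open before the access, close after
CLOCK_HZ = 3_000_000_000                   # hypothetical 3 GHz core


def switching_cycles_per_second(cycles_per_switch):
    """Cycles spent purely on permission switching in one second of the
    original (unprotected) program's call/return activity."""
    return CALLS_AND_RETURNS_PER_SECOND * SWITCHES_PER_ACCESS * cycles_per_switch


def fraction_of_core(cycles_per_switch):
    """Switching cost expressed as a multiple of one core's cycle budget."""
    return switching_cycles_per_second(cycles_per_switch) / CLOCK_HZ


if __name__ == "__main__":
    for name, cost in (("mprotect", MPROTECT_CYCLES), ("vmfunc", VMFUNC_CYCLES)):
        print(f"{name}: {fraction_of_core(cost):.1f}x the cycles of one core")
```

A ratio above 1 means the switching work alone would not fit in the time the unprotected program takes, which is one way to see why the frequency of permission switches, rather than the isolation primitive itself, dominates the cost.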
3. Isolation methods based on privileged access. Other studies protect the safe region by adding new hardware to the processor. For example, some researchers add one bit to the page table entry to mark whether a page is a sensitive data page and extend the x86 instruction set with a dedicated memory access instruction, smov, for accessing sensitive data pages (the IMIX mechanism). The MicroStache mechanism follows a design similar to IMIX, but additionally isolates the cache to block potential cache-based side-channel attacks. With IMIX or MicroStache, protecting the safe region only requires marking the pages that contain it as protected pages and then accessing the region through the dedicated access instruction.
In summary, existing memory isolation methods still suffer from high performance overhead, which is an obstacle to large-scale deployment. The main problem with approaches that add new hardware is that no real hardware supports them, so they cannot be deployed immediately to protect the integrity and confidentiality of the safe region.
Because existing software implementations have high performance overhead, Intel introduced Control-flow Enforcement Technology (CET) to protect sensitive memory. In recent processors, CET includes a hardware shadow stack mechanism, CET.SHSTK (the SHSTK mechanism), and a hardware mechanism for coarse-grained forward-edge control-flow integrity (CFI), Indirect Branch Tracking (the IBT mechanism). The SHSTK mechanism is an important and effective defense against return-oriented programming (ROP) attacks and ensures that return addresses on the stack cannot be tampered with by an attacker. When the program executes a call instruction, the return address is pushed onto the main stack and the SHSTK mechanism also pushes it onto the hardware shadow stack pointed to by the SSP register (the page holding it is a shadow stack page). When the program executes a ret instruction, the SHSTK mechanism compares the return addresses on the main stack and on the hardware shadow stack and raises a control protection exception (#CP) if they differ. Ordinary read instructions can read a shadow stack page, but ordinary write instructions cannot write to it and trigger an exception if they try; only the WRSS instruction can write to a shadow stack page.
However, the write overhead of the WRSS instruction is large, which makes the isolation overhead of prior-art approaches too high; and because the CET mechanism is involved in many tasks, modifying the CET mechanism directly is difficult.
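The call/ret behavior of the SHSTK mechanism described above can be modeled in a few lines. This is a toy software model, not how the hardware is programmed: the class names `ToyCpu` and `ShadowStackMismatch` are invented for the illustration, and the Python exception merely stands in for the hardware exception.

```python
class ShadowStackMismatch(Exception):
    """Stands in for the hardware exception raised on a return-address mismatch."""


class ToyCpu:
    def __init__(self):
        self.main_stack = []    # ordinary stack, writable by normal stores
        self.shadow_stack = []  # in this model, modified only by call/ret

    def call(self, return_address):
        # CALL pushes the return address onto both stacks.
        self.main_stack.append(return_address)
        self.shadow_stack.append(return_address)

    def ret(self):
        # RET pops both copies and compares them before returning.
        address = self.main_stack.pop()
        expected = self.shadow_stack.pop()
        if address != expected:
            raise ShadowStackMismatch(
                f"return to {address:#x}, shadow stack holds {expected:#x}"
            )
        return address


if __name__ == "__main__":
    cpu = ToyCpu()
    cpu.call(0x401000)
    cpu.main_stack[-1] = 0xDEADBEEF  # simulated stack-smashing overwrite
    try:
        cpu.ret()
    except ShadowStackMismatch as fault:
        print("blocked:", fault)
```

The model shows why an overwrite of the main stack alone is not enough for a ROP attack: the attacker would also have to change the shadow copy, which ordinary write instructions cannot reach.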
Disclosure of Invention
It is therefore an object of the present invention to overcome the above-mentioned drawbacks of the prior art and to provide a method for protecting the integrity of a general-purpose memory based on the Intel CET mechanism.
The object of the invention is achieved by the following technical solutions:
According to a first aspect of the invention, a method for protecting the integrity of general-purpose memory based on the Intel CET mechanism is provided. When a program is executed, the method comprises: setting the pages that hold the sensitive data and/or sensitive code to be protected as dedicated shadow stack pages, the dedicated shadow stack pages being independent of the shadow stack pages maintained by the CET mechanism; before writing to a dedicated shadow stack page, applying to content whose write overhead needs to be reduced an overhead reduction process adapted to that content; writing the processed content to the dedicated shadow stack page through the WRSS instruction of the CET mechanism; and protecting the integrity of the sensitive data and/or sensitive code by means of the dedicated shadow stack pages.
In some embodiments of the present invention, the step of performing the overhead reduction process adapted to the content to be written that needs to reduce the writing overhead includes performing a lossless compression process on the content to be written when the data size of the content to be written exceeds a predetermined threshold, where the memory size actually occupied by the content to be written is reduced by using a plurality of bits in the address space that are not used by the addressing process.
In some embodiments of the present invention, the content to be written whose write overhead needs to be reduced includes metadata of a sensitive pointer, the metadata including the value of the sensitive pointer and the upper bound and lower bound of the object to which the sensitive pointer points. The step of performing lossless compression on the content to be written includes: calculating a first difference and a second difference from the metadata of the sensitive pointer, where the first difference is obtained by subtracting the lower bound of the pointed-to object from the value of the sensitive pointer and the second difference is obtained by subtracting the value of the sensitive pointer from the upper bound of the pointed-to object; and storing the value of the sensitive pointer, the first difference, and the second difference using bits of the address space that are used for addressing together with bits that are not used for addressing.
In some embodiments of the present invention, writing the content that has undergone the overhead reduction process to the dedicated shadow stack page through the WRSS instruction of the CET mechanism includes: according to the class of the sensitive pointer, writing the value of the sensitive pointer, the first difference, and the second difference to the designated bits in the dedicated shadow stack page following the data writing rule for that class, and using corresponding bits of the address space that are not used for addressing as extended class indication bits that record the class of the sensitive pointer.
In some embodiments of the present invention, the step of protecting the integrity of the sensitive data and/or the sensitive code by using the dedicated shadow stack page further includes, before dereferencing the corresponding pointer stored in the normal memory, determining whether the dereferencing is safe according to metadata of the sensitive pointer stored in the dedicated shadow stack page for backup of the pointer.
In some embodiments of the invention, the method further comprises directly writing the content to be written that does not require a reduction in write overhead to a dedicated shadow stack page via a WRSS instruction of the CET mechanism.
In some embodiments of the present invention, the content to be written without reducing writing overhead includes a metadata table for recording virtual table pointers, and the step of protecting the integrity of the sensitive data and/or the sensitive code by using the dedicated shadow stack page further includes comparing the virtual table pointers with virtual table pointers recorded in the metadata table in the dedicated shadow stack page before performing indirect call on the target function according to corresponding virtual table pointers stored in the common memory, and determining whether the indirect call is safe.
In some embodiments of the present invention, the step of performing, before writing to the dedicated shadow stack page, the overhead reduction process adapted to the content whose write overhead needs to be reduced includes: when the data size of the content to be written is smaller than a predetermined threshold, temporarily holding that content in a reserved register and waiting for further content whose data size is smaller than the predetermined threshold, and performing the write through a WRSS instruction once the total data size of the content held in the reserved register is greater than or equal to the predetermined threshold.
In some embodiments of the present invention, the content to be written includes a machine code generated by a JIT compiler, the machine code is a sensitive code, and the step of performing, before performing a write operation on a dedicated shadow stack page, an overhead reduction process adapted to the content to be written, which needs to reduce writing overhead, includes storing the corresponding machine code in a reserved register according to a generated order, until the total amount of data of the content to be written reaches a predetermined threshold value, and performing writing through a WRSS instruction.
According to a second aspect of the present invention, there is provided a method of protecting program security based on the Intel CET mechanism, the method comprising obtaining program source code, compiling the program source code with a compiler to protect the integrity of sensitive data and/or sensitive code as described in the first aspect when the program is executed.
In some embodiments of the present invention, the step of compiling the program source code with the compiler includes inserting corresponding protection logic code according to the sensitive data and/or the information of the sensitive code to be protected in the program source code, so that the compiled program, when executed, protects the integrity of the sensitive data and/or the sensitive code by the corresponding protection logic code according to the method of the first aspect.
According to a third aspect of the present invention there is provided an electronic device comprising one or more processors and memory, wherein the memory is for storing executable instructions, the one or more processors being configured to implement the method of the first and/or second aspects via execution of the executable instructions.
Detailed Description
For the purpose of making the technical solutions and advantages of the present invention more apparent, the present invention will be further described in detail by way of specific embodiments with reference to the accompanying drawings. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
As mentioned in the Background section, the write overhead of the WRSS instruction is large, which makes the isolation overhead of prior-art approaches too high, and because the Intel CET mechanism is involved in many tasks, modifying it directly is difficult. The invention therefore implements integrity protection for general-purpose memory on top of the Intel CET mechanism. To remain compatible with the Intel CET mechanism and avoid conflicts with the shadow stack pages it maintains, the invention sets up dedicated shadow stack pages that are independent of those pages. Content that is to be written to a dedicated shadow stack page and whose write overhead needs to be reduced first undergoes an overhead reduction process adapted to that content, which reduces the number of WRSS instructions executed. The integrity of sensitive data and/or sensitive code is thus protected at lower cost, the processor's performance overhead for protecting general-purpose memory integrity is reduced, and the processor's efficiency on other tasks is improved.
Before describing embodiments of the present invention in detail, some of the terms used therein are explained as follows:
The WRSS instruction is the instruction in Intel CET that modifies the contents of shadow stack pages. Intel CET can maintain a shadow stack in the memory space of the corresponding thread (the thread to be protected), see the memory space shown in Fig. 1; the address of the current top of the shadow stack is held in the SSP (Shadow Stack Pointer) register. The shadow stack consists of shadow stack pages. Ordinary write instructions have no write permission for shadow stack pages, whereas the WRSS instruction does. Compared with an ordinary memory access instruction, a single WRSS write is expensive: one mov instruction takes less than 1 clock cycle, while one WRSS instruction takes about 12 clock cycles. WRSS comes in two forms, WRSSQ (writes 8 bytes at a time) and WRSSD (writes 4 bytes at a time). WRSSQ writes the 8 bytes in the source register to the destination shadow stack page, and the destination address must be 8-byte aligned; WRSSD writes the 4 bytes in the source register to the destination shadow stack page, and the destination address must be 4-byte aligned. The overhead of WRSSD writing 4 bytes is comparable to that of WRSSQ writing 8 bytes, so WRSSQ is more efficient per byte. The WRSSQ instruction is therefore used as the example in the following embodiments. It should be understood that, in some cases, those skilled in the art may use the WRSSD instruction with corresponding adjustments to practice the invention.
Pointer dereferencing means accessing the value of the object that a pointer points to, for example reading the value of a variable stored at a given address.
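The comparison between WRSSQ and WRSSD can be made concrete with a small cost model. The sketch below is an illustration only: the 12-cycle figure and the alignment rules come from the paragraph above, while the function name `wrss_plan` and the idea of charging a flat cost per instruction are assumptions made for the example.

```python
WRSS_CYCLES = 12  # approximate cost of one WRSS instruction, as quoted above


def wrss_plan(dest_addr, nbytes, width):
    """Return (instruction_count, approx_cycles) for writing `nbytes` bytes to
    a shadow stack page using WRSSQ (width=8) or WRSSD (width=4).

    Hypothetical helper: it only models alignment and instruction count.
    """
    if width not in (4, 8):
        raise ValueError("width must be 4 (WRSSD) or 8 (WRSSQ)")
    if dest_addr % width != 0:
        raise ValueError(f"destination must be {width}-byte aligned")
    if nbytes % width != 0:
        raise ValueError(f"size must be a multiple of {width} bytes")
    count = nbytes // width
    return count, count * WRSS_CYCLES


if __name__ == "__main__":
    print("8 bytes via WRSSQ:", wrss_plan(0x1000, 8, 8))
    print("8 bytes via WRSSD:", wrss_plan(0x1000, 8, 4))
```

Under this model the same 8 bytes cost one WRSSQ or two WRSSD instructions, which is why the embodiments use WRSSQ and try to fill all 8 bytes of every write.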
According to one embodiment of the invention, for compatibility and to avoid conflicts, the invention sets up dedicated shadow stack pages for protecting sensitive data and/or sensitive code, separate from the shadow stack pages maintained by the Intel CET mechanism (in places also called simply the CET mechanism). The word "dedicated" distinguishes these pages from the shadow stack pages maintained by the Intel CET mechanism itself; they are still shadow stack pages. The WRSS instruction can operate on both the shadow stack pages maintained by the Intel CET mechanism and the dedicated shadow stack pages. To reduce write overhead, different kinds of content to be written are distinguished and handled accordingly:
If the content to be written does not need write-overhead reduction, it is written directly to a dedicated shadow stack page through a WRSS instruction;
If the content to be written needs write-overhead reduction and its data size exceeds a predetermined threshold, it is losslessly compressed and then written to a dedicated shadow stack page through a WRSS instruction;
If the content to be written needs write-overhead reduction and its data size is smaller than the predetermined threshold, it is accumulated, and once the accumulated data size reaches the predetermined threshold it is written to a dedicated shadow stack page through a WRSS instruction.
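The three write paths above can be sketched as a small dispatcher over a simulated page. Everything named here (`DedicatedShadowPage`, `PolicyWriter`, the `compress` callback) is hypothetical and exists only for illustration; the sketch also assumes that the predetermined threshold equals one WRSSQ payload (8 bytes) and that content of exactly the threshold size is written directly, neither of which is stated in the text.

```python
WORD = 8          # bytes written by one WRSSQ
THRESHOLD = WORD  # assumption: the predetermined threshold is one WRSSQ payload


class DedicatedShadowPage:
    """Toy stand-in for a dedicated shadow stack page; counts WRSSQ-style writes."""

    def __init__(self, size=4096):
        self.mem = bytearray(size)
        self.wrssq_count = 0

    def wrssq(self, offset, payload):
        if offset % WORD or len(payload) != WORD:
            raise ValueError("WRSSQ writes exactly 8 bytes to an 8-byte aligned address")
        self.mem[offset:offset + WORD] = payload
        self.wrssq_count += 1


class PolicyWriter:
    def __init__(self, page):
        self.page = page
        self.cursor = 0      # next free 8-byte slot in the page
        self.reserved = b""  # models the reserved register used for accumulation

    def _emit(self, data):
        data += b"\x00" * (-len(data) % WORD)  # pad to whole 8-byte words
        for i in range(0, len(data), WORD):
            self.page.wrssq(self.cursor, data[i:i + WORD])
            self.cursor += WORD

    def write(self, content, reduce_overhead=False, compress=None):
        if not reduce_overhead or len(content) == THRESHOLD:
            self._emit(content)            # path 1: write directly
        elif len(content) > THRESHOLD:
            self._emit(compress(content))  # path 2: compress, then write
        else:
            self.reserved += content       # path 3: accumulate small pieces
            while len(self.reserved) >= THRESHOLD:
                self._emit(self.reserved[:THRESHOLD])
                self.reserved = self.reserved[THRESHOLD:]


if __name__ == "__main__":
    page = DedicatedShadowPage()
    writer = PolicyWriter(page)
    writer.write(b"\x01\x02\x03\x04", reduce_overhead=True)  # held back
    writer.write(b"\x05\x06\x07\x08", reduce_overhead=True)  # triggers one write
    print("WRSSQ writes so far:", page.wrssq_count)
```

The design point is that the third path trades a small delay for fewer WRSS instructions: two 4-byte pieces cost one write instead of two.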
According to one embodiment of the invention, when writing a new program, an implementer can, following the scheme of the invention, write protection logic for the specific sensitive data and/or sensitive code directly into the program, implementing a low-overhead protection mechanism through dedicated shadow stack pages.
According to one embodiment of the invention, if the program has already been written but its sensitive data and/or sensitive code are not protected by shadow stack pages, the implementer can, when the program is optimized by a compiler, convert the protection logic of the program's original protection mechanism into the protection logic of the invention according to preset adjustment logic, which saves time and effort. For clarity, the following embodiments mainly describe adjusting the protection logic of the original protection mechanism through a compiler; it should be understood that these embodiments are only illustrative and that many other implementations exist in the art, which are not enumerated here.
Protecting the integrity of general-purpose memory mainly involves two aspects, protecting sensitive data and protecting sensitive code. The technical solution of the invention is described for each aspect below:
1. Protecting sensitive data
In the embodiment of protecting sensitive data, the LLVM compiler is taken as an example to illustrate how the original protection mechanism in a program is adjusted by the compiler and converted into the protection mechanism of the invention. The LLVM (Low Level Virtual Machine) compiler framework is a collection of modular, reusable compiler and toolchain technologies. Most of the logic of the LLVM compiler handles compilation optimization and code generation, and these functions are made up of one or more intermediate optimization processes (passes). To adjust the original protection mechanism, the LLVM compiler can serve as the underlying framework: adding a CETIS pass (CETIS, short for CET-based memory Isolation Technology, is shorthand for the protection mechanism of the invention) to the middle end of the LLVM compiler protects sensitive data and prevents an attacker from damaging its integrity. The CFIXX defense mechanism and the CPI mechanism are taken as examples below to show how the CETIS mechanism of the invention combines with these defenses to protect memory integrity at low overhead.
(1) Taking the protection of sensitive data in the CFIXX defense mechanism as an example, the case of content to be written that does not need write-overhead reduction is described.
C++ is a programming language that evolved from C. C++ supports C-style procedural programming, object-based programming characterized by abstract data types, and object-oriented programming characterized by inheritance and polymorphism. Dynamic dispatch, implemented through virtual tables, is the core of polymorphism in C++ and allows derived classes to override virtual functions inherited from base classes. In C++, each polymorphic class has one or more virtual tables that contain the function pointers of all virtual functions of the class. A virtual table is located through the virtual table pointer stored in the first field of the class object; the virtual table pointer is initialized in the object's constructor. Dynamic dispatch uses the virtual table pointer to identify the underlying type of the object. At each virtual function call site, the program first finds the target virtual function pointer in the virtual table through the object's virtual table pointer and then executes the target function through an indirect call. The virtual table is located in a read-only memory region (the .rodata section), whereas the virtual table pointer is stored in readable and writable memory, so an attacker who tampers with the virtual table pointer through a program vulnerability can launch a control flow hijacking attack such as a Counterfeit Object-Oriented Programming (COOP) attack.
To resist such attacks, the Object Type Integrity (OTI) of the C++ program must be guaranteed, in other words, the integrity of the virtual table pointer must be guaranteed. The CFIXX defense mechanism ensures that an object's virtual table pointer is not tampered with by an attacker at run time. Specifically, the LLVM compiler is modified so that the program stores a backup of each virtual table pointer in a metadata table at run time, and the integrity of the metadata table is ensured by an address-based isolation method.
According to one embodiment of the invention, a pass is added to the LLVM compiler on the basis of the CFIXX defense mechanism. From the protection logic of the original CFIXX defense mechanism, the pass generates logic that protects with a dedicated shadow stack page: it modifies the part of the C++ program that allocates and stores the metadata table used to back up virtual table pointers so that the metadata table is located in a dedicated shadow stack page and virtual table pointers are written to it with WRSSQ instructions, and it discards the protection logic of the original CFIXX defense mechanism (for example, by deleting its implementation code). The original CFIXX defense mechanism writes one 8-byte virtual table pointer to the metadata table at a time, so the content to be written (the virtual table pointer) can be treated as content that does not need write-overhead reduction and is written directly to the dedicated shadow stack page with the WRSSQ instruction of the CET mechanism. When the program runs, before an indirect call to the target function is made through the virtual table pointer stored in ordinary memory, that pointer is compared with the virtual table pointer recorded in the metadata table in the dedicated shadow stack page. If they are identical, the indirect call is safe and execution continues; if they differ, the indirect call is unsafe, an exception is thrown, and execution stops.
After the adjustment, the defense mechanism of the invention provides the same protection as the original CFIXX defense mechanism, but with lower overhead when the program is executed.
(2) The code pointer integrity mechanism (CPI mechanism) is another mechanism that relies on the integrity of sensitive data. The following describes how the invention protects code pointer integrity (CPI), and thereby illustrates the case in which write overhead needs to be reduced and the data size of the content to be written exceeds the predetermined threshold.
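The check performed before each virtual call can be illustrated with a toy model. This is a sketch under stated assumptions, not the pass itself: ordinary memory and the metadata table are modeled as Python dictionaries, the names `MetadataTable`, `construct`, and `virtual_call` are invented for the example, and the counter merely records how many 8-byte WRSSQ writes the backups would need.

```python
class TypeIntegrityViolation(Exception):
    """Raised when an object's virtual table pointer no longer matches its backup."""


class MetadataTable:
    """Backup copies of virtual table pointers; assumed to live in a
    dedicated shadow stack page, so only WRSSQ can modify it."""

    def __init__(self):
        self._backup = {}
        self.wrssq_writes = 0

    def record(self, object_addr, vptr):
        self._backup[object_addr] = vptr  # one 8-byte pointer per write
        self.wrssq_writes += 1

    def lookup(self, object_addr):
        return self._backup[object_addr]


def construct(heap, table, object_addr, vptr):
    """Constructor: store the vptr in ordinary memory and back it up."""
    heap[object_addr] = vptr
    table.record(object_addr, vptr)


def virtual_call(heap, table, vtables, object_addr, slot):
    """Compare the vptr in ordinary memory with its backup, then dispatch."""
    vptr = heap[object_addr]
    if vptr != table.lookup(object_addr):
        raise TypeIntegrityViolation(f"vptr of object {object_addr:#x} was modified")
    return vtables[vptr][slot]()


if __name__ == "__main__":
    vtables = {0x5000: [lambda: "Dog.speak"], 0x6000: [lambda: "attacker gadget"]}
    heap, table = {}, MetadataTable()
    construct(heap, table, 0x1000, 0x5000)
    print(virtual_call(heap, table, vtables, 0x1000, 0))
    heap[0x1000] = 0x6000  # simulated corruption of ordinary memory
    try:
        virtual_call(heap, table, vtables, 0x1000, 0)
    except TypeIntegrityViolation as err:
        print("blocked:", err)
```

Because each backup is a single 8-byte pointer, every `record` maps to exactly one WRSSQ write, which is why this case needs no further overhead reduction.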
The CPI mechanism aims to protect the integrity of sensitive pointers and prevent an attacker from tampering with them, thereby preventing control flow hijacking attacks. In CPI, the definition of a sensitive pointer is recursive: it includes all code pointers (for example, function pointers and return addresses) as well as pointers that can be used to access sensitive pointers. The sensitive data in CPI is divided into two parts. One part is the safe stack (Safe Stack), which stores return addresses and objects that static analysis can prove to be safe; unsafe objects are stored on the unsafe stack (Unsafe Stack), and in the CPI implementation the program's main stack is set as the safe stack. The other part is the safe pointer store (Safe Pointer Store), which stores the metadata of sensitive pointers other than return addresses; see Fig. 2a. This metadata includes the value of the sensitive pointer and the upper bound (upper) and lower bound (lower) of the object it points to. Before a pointer is dereferenced, the metadata of the sensitive pointer in the safe pointer store is used to decide whether the dereference is safe (that is, whether a code pointer has been tampered with and whether the access target of a data pointer is out of bounds).
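The dereference check described above can be sketched as follows. This is a simplified illustration rather than the CPI implementation: the function name `check_dereference` is hypothetical, the metadata is passed as a plain `(value, lower, upper)` tuple, and the upper bound is assumed to be exclusive.

```python
class UnsafeDereference(Exception):
    """Raised when a dereference fails the metadata check."""


def check_dereference(ptr, metadata, is_code_pointer, access_size=1):
    """Decide whether dereferencing `ptr` is safe given its protected metadata.

    metadata is (value, lower, upper) as kept in the safe pointer store.
    Assumption: `upper` is one past the last valid byte of the object.
    """
    value, lower, upper = metadata
    if is_code_pointer:
        # A code pointer must be identical to its protected copy.
        if ptr != value:
            raise UnsafeDereference("code pointer was tampered with")
        return
    # A data pointer must keep the whole access inside the object bounds.
    if ptr < lower or ptr + access_size > upper:
        raise UnsafeDereference("data access is out of bounds")


if __name__ == "__main__":
    check_dereference(0x401000, (0x401000, 0, 0), is_code_pointer=True)
    check_dereference(0x2010, (0x2010, 0x2000, 0x2040), False, access_size=8)
    try:
        check_dereference(0x203C, (0x203C, 0x2000, 0x2040), False, access_size=8)
    except UnsafeDereference as err:
        print("blocked:", err)
```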
According to one embodiment of the invention, the LLVM compiler is modified so that the CET mechanism, instead of the safe stack, protects the program's return addresses from tampering, and the pages holding the safe pointer store are set as dedicated shadow stack pages, that is, the metadata of sensitive pointers is stored in dedicated shadow stack pages. Taking an x86_64 processor as an example, the metadata (value, upper, lower) of each sensitive pointer occupies 24 bytes, and writing it directly to a dedicated shadow stack page takes three WRSSQ instructions, which increases the performance overhead of isolation. In the prior art, however, bits 48-63 of an 8-byte address are not used for addressing. The invention uses these bits to compress the metadata losslessly, minimizing the amount of data written to sensitive memory each time. According to one embodiment of the invention, see Fig. 2b, the 24 bytes are compressed to 16 bytes; the compressed data structure (the compression_val structure) is shown in Fig. 2c. Because current x86_64 processors can address 2^48 bytes, only the lower 48 bits of a user-space pointer are significant and the upper 16 bits are all 0. The lower 48 bits (bits 0-47) of the compression_val structure therefore store the value (Value) of the sensitive pointer, and the remaining bits mainly store the first difference Offset1 (bits 48-54) and the second difference Offset2 (bits 55-61). Offset1 equals the value of the sensitive pointer minus the lower bound (lower) of the object it points to, and Offset2 equals the upper bound (upper) of that object minus the value of the sensitive pointer.
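The bit layout just described can be checked with a short round-trip example. The sketch packs Value into bits 0-47, Offset1 into bits 48-54, and Offset2 into bits 55-61 of a single 64-bit word, then recovers the original (value, lower, upper) triple to show that the compression is lossless. The function names `pack` and `unpack` are hypothetical, and the sketch covers only the case in which both offsets fit in 7 bits; bits 62-63 are left at zero here.

```python
VALUE_BITS = 48
OFFSET_BITS = 7
VALUE_MASK = (1 << VALUE_BITS) - 1
OFFSET_MASK = (1 << OFFSET_BITS) - 1
OFFSET1_SHIFT = 48  # bits 48-54
OFFSET2_SHIFT = 55  # bits 55-61


def pack(value, lower, upper):
    """Compress (value, lower, upper) into one 64-bit word."""
    offset1 = value - lower  # first difference
    offset2 = upper - value  # second difference
    if not 0 <= value <= VALUE_MASK:
        raise ValueError("pointer value does not fit in 48 bits")
    if not (0 <= offset1 <= OFFSET_MASK and 0 <= offset2 <= OFFSET_MASK):
        raise ValueError("offsets do not fit in 7 bits each")
    return value | (offset1 << OFFSET1_SHIFT) | (offset2 << OFFSET2_SHIFT)


def unpack(word):
    """Recover (value, lower, upper) from a packed word."""
    value = word & VALUE_MASK
    offset1 = (word >> OFFSET1_SHIFT) & OFFSET_MASK
    offset2 = (word >> OFFSET2_SHIFT) & OFFSET_MASK
    return value, value - offset1, value + offset2


if __name__ == "__main__":
    value = 0x7FFF12345678
    word = pack(value, lower=value - 16, upper=value + 100)
    assert unpack(word) == (value, value - 16, value + 100)
    print(f"24 bytes of metadata packed into one word: {word:#018x}")
```

Storing differences instead of absolute bounds is what makes the packing possible: the bounds are always close to the pointer value, so they need far fewer bits than two more 48-bit addresses.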
In the present invention, an object of 128 bytes or more is referred to as a "large object", and an object of less than 128 bytes is referred to as a "small object". Bits 62 to 63 of the compression_val structure are the extended bits (i.e., the extended class indication bits), which identify the class of the sensitive pointer to which the metadata belongs: extended=1 indicates that the sensitive pointer is a code pointer; extended=2 indicates that it is a data pointer pointing to a small object (in which case both Offset1 and Offset2 can be encoded in 7 bits); extended=3 indicates that it is a data pointer pointing to a large object; and extended=0 indicates that the sensitive pointer has been released.
According to an embodiment of the present invention, pointers of different categories may have different metadata compression policies. FIGS. 3a to 3d show an exemplary use of the address space under the metadata compression policies for the 4 categories of pointers, where the gray areas are the areas actually used. As can be seen from FIG. 3a, a code pointer only needs to store the value of the pointer, which occupies only part of the bits in the lower 8 bytes, so after compression a single write with one WRSSQ instruction suffices. In FIG. 3b, a data pointer pointing to a small object stores the value of the pointer and the two offsets. Because the object is small, neither the first difference Offset1 nor the second difference Offset2 needs more than 7 bits, and because the extended bits occupy only 2 bits, bits 48-61 of the lower 8 bytes are exactly enough to hold the two offsets. A data pointer pointing to a small object therefore also needs only 8 bytes, and after compression a single write with one WRSSQ instruction suffices. In FIG. 3c, for a data pointer pointing to a large object, bits 48-61 cannot hold the two offsets, so an extra 8 bytes are needed: the upper 8 bytes of the 16 bytes store the two offsets (for example, bits 64-95 store Offset1 and bits 96-127 store Offset2). The original 24 bytes are thus compressed to 16 bytes, and two writes with WRSSQ instructions are performed after compression. When a pointer is released, only the extended bits in the lower 8 bytes need to be set to 0.
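By way of illustration only, the per-category compression policy described above may be sketched as follows. The function names compress and release are hypothetical, and the sketch assumes that the size of an object is upper minus lower and that each 8-byte word is represented as an integer; the actual writing of each word with a WRSSQ instruction is not modelled.

```python
MASK48 = (1 << 48) - 1  # bits 0-47 hold the pointer value

# Values of the 2-bit extended field (bits 62-63), as listed in the text.
RELEASED, CODE_PTR, SMALL_OBJ, LARGE_OBJ = 0, 1, 2, 3


def compress(value, lower=None, upper=None):
    """Return the list of 8-byte words (as ints) to write for one pointer.

    lower and upper are None for a code pointer, which carries no bounds.
    Assumption: the object size is taken to be upper - lower.
    """
    if value < 0 or value > MASK48:
        raise ValueError("user-space pointer must fit in 48 bits")
    if lower is None or upper is None:
        return [(CODE_PTR << 62) | value]          # one word

    offset1 = value - lower   # first difference
    offset2 = upper - value   # second difference
    if offset1 < 0 or offset2 < 0:
        raise ValueError("pointer lies outside its object")

    if upper - lower < 128:
        # Small object: both offsets fit in 7 bits (bits 48-54 and 55-61).
        low = (SMALL_OBJ << 62) | (offset2 << 55) | (offset1 << 48) | value
        return [low]                               # one word

    if offset1 >= (1 << 32) or offset2 >= (1 << 32):
        raise ValueError("offset does not fit in 32 bits")
    # Large object: second word holds Offset1 (bits 64-95), Offset2 (bits 96-127).
    return [(LARGE_OBJ << 62) | value, (offset2 << 32) | offset1]


def release(low_word):
    """Mark a pointer as released by clearing the extended bits."""
    return low_word & ~(3 << 62)


if __name__ == "__main__":
    words = compress(0x7FFF00001010, 0x7FFF00001000, 0x7FFF00001040)
    print([hex(w) for w in words])
```

The length of the returned list corresponds to the number of 8-byte writes needed: one for code pointers and small objects, two for large objects.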
Therefore, under the metadata compression strategy, only a data pointer pointing to a large object requires 16 bytes to be written; in all other cases only 8 bytes need to be written, so that the performance cost of updating the security area is reduced to the greatest extent. When a pointer is dereferenced, its metadata is decompressed according to the extended value and the corresponding compression strategy, and the original checking logic of CPI is used to check whether the pointer is legal. That is, before the corresponding pointer stored in ordinary memory is dereferenced, whether the dereference is safe is judged from the metadata of the sensitive pointer stored as a backup in the special shadow stack page; if it is not safe, an exception is thrown and execution stops, and if it is safe, the pointer is dereferenced.
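The decompression and check performed before a dereference may be sketched as follows, again for illustration only. The names decompress and check_dereference are hypothetical. The sketch assumes that the upper bound is exclusive, that a code pointer is checked by comparing it with its backed-up value, and that a data pointer is checked against its bounds; the original checking logic of CPI may differ in detail.

```python
MASK48 = (1 << 48) - 1
MASK32 = (1 << 32) - 1
MASK7 = 0x7F


def decompress(words):
    """Recover (kind, value, lower, upper) from the compressed 8-byte words."""
    low = words[0]
    extended = low >> 62          # bits 62-63 select the compression policy
    value = low & MASK48
    if extended == 0:
        return ("released", None, None, None)
    if extended == 1:
        return ("code", value, None, None)
    if extended == 2:             # small object: offsets packed in the low word
        offset1 = (low >> 48) & MASK7
        offset2 = (low >> 55) & MASK7
    else:                         # large object: offsets in the second word
        high = words[1]
        offset1 = high & MASK32
        offset2 = (high >> 32) & MASK32
    return ("data", value, value - offset1, value + offset2)


def check_dereference(pointer_in_memory, words, access_size=1):
    """Return the pointer if the dereference is judged safe, else raise."""
    kind, value, lower, upper = decompress(words)
    if kind == "released":
        raise RuntimeError("sensitive pointer has been released")
    if kind == "code":
        if pointer_in_memory != value:
            raise RuntimeError("code pointer has been tampered with")
        return pointer_in_memory
    # Data pointer: the accessed range must lie inside [lower, upper).
    if pointer_in_memory < lower or pointer_in_memory + access_size > upper:
        raise RuntimeError("access target is out of bounds")
    return pointer_in_memory


if __name__ == "__main__":
    # Small-object metadata: value 0x7FFF00001010, Offset1 0x10, Offset2 0x30.
    meta = [(2 << 62) | (0x30 << 55) | (0x10 << 48) | 0x7FFF00001010]
    print(decompress(meta))
```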
2. Protecting sensitive code
In addition to protecting sensitive data, the present invention may also protect the integrity of sensitive code. The following uses an embodiment that protects the integrity of sensitive code to describe the case in which the write overhead needs to be reduced and the amount of data to be written each time is smaller than a predetermined threshold.
The performance of the JavaScript engine is crucial to the overall performance of the browser, and JIT (Just-In-Time) compilation is an optimization intended to improve the performance of the JavaScript engine. As shown in FIG. 4, the parser in the JS engine first parses the input JS file into bytecode, which is interpreted and executed by the interpreter. When the same piece of script code is executed repeatedly, for example in a loop statement, having the interpreter repeatedly interpret the same bytecode is inefficient. The JIT compiler can instead compile the code into machine instructions, which are executed directly the next time the code runs. JIT compilation is started only when a target function or loop statement is called frequently. The machine instructions generated by JIT compilation are stored in a memory space as native code (Native Code), and the memory space storing the native code is called the code cache (Code Cache). Once JIT generation is complete, the machine code (machine instructions) is executed directly the next time the target function or loop statement is called.
Bytecode is interpreted in a restricted virtual machine environment, whereas machine code in the code cache is executed directly by the local processor. The JIT compiler therefore restricts what it emits into the code cache; for example, it does not emit potentially dangerous instructions such as system call instructions. Since the JIT compiler needs to write the generated machine code into memory, the most straightforward approach is to make the page holding the machine code readable, writable, and executable, as in the implementation of the JavaScriptCore engine on Intel processors. However, this breaks the W-X policy and makes the code cache an easy target for attackers. Some engines therefore adopt a domain isolation method based on the mprotect() system call to protect the code cache from tampering: the code cache is first set to readable and writable while code is being emitted, and its permission is then set to readable and executable after emission is finished, as in the implementation of the JavaScriptCore engine on ARM processors, Chakra, and the like. To reduce the performance overhead of frequent mprotect() system calls, the JavaScript engine stores the machine code generated by the JIT compiler in a buffer (Buffer) and, after generation is complete, copies the machine code in the buffer into the code cache in one pass using the memory copy function memcpy().
For some sensitive code, the present invention can set up a special shadow stack page for protection: the sensitive code is not stored in ordinary memory, and the page storing the sensitive code is simply set as an executable special shadow stack page, thereby achieving protection. According to one embodiment of the invention, the code cache in which the sensitive code is located is protected using Intel CET technology, and the NX bit in the page table entry corresponding to the special shadow stack page storing the sensitive code is set to 0, so that the page holding the sensitive code is a shadow stack page with execute permission. A shadow stack page in Intel CET technology is a read-only dirty page; the attributes of its page table entry are shown in FIG. 5: the dirty bit D is 1 and the read/write bit R/W is 0. Since the NX bit in the page table entry (which indicates whether the page is non-executable: 0 means executable and 1 means non-executable) is independent of the read/write bit and the dirty bit, the NX bit in the page table entry corresponding to the page where the code cache is located can be set to 0, making that page a shadow stack page with execute permission.
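The page table entry attributes described above may be modelled as follows, for illustration only. The bit positions follow the x86-64 page table entry format (present bit 0, R/W bit 1, dirty bit 6, NX bit 63); the function names are hypothetical, and the sketch only manipulates an integer, since real page table entries can be changed only by the operating system kernel.

```python
PTE_PRESENT = 1 << 0   # P: page is present
PTE_RW = 1 << 1        # R/W: 1 = writable, 0 = read-only
PTE_DIRTY = 1 << 6     # D: dirty bit
PTE_NX = 1 << 63       # NX: 1 = non-executable, 0 = executable


def is_shadow_stack_page(pte):
    """A shadow stack page is a present, read-only, dirty page (R/W=0, D=1)."""
    return bool(pte & PTE_PRESENT) and not (pte & PTE_RW) and bool(pte & PTE_DIRTY)


def is_executable(pte):
    """The page is executable when the NX bit is 0."""
    return not (pte & PTE_NX)


def make_executable_shadow_stack(pte):
    """Return a copy of pte with D=1, R/W=0 and NX=0; other bits are kept."""
    pte |= PTE_PRESENT | PTE_DIRTY
    pte &= ~(PTE_RW | PTE_NX)
    return pte


if __name__ == "__main__":
    # Hypothetical entry: frame address bits plus writable, non-executable flags.
    old = 0x1234000 | PTE_PRESENT | PTE_RW | PTE_NX
    new = make_executable_shadow_stack(old)
    print(hex(old), hex(new), is_shadow_stack_page(new), is_executable(new))
```

Because the NX bit is independent of the R/W and dirty bits, clearing it does not disturb the read-only dirty encoding that marks the page as a shadow stack page.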
In accordance with one embodiment of the invention, and taking the Chakra engine as an example, the invention may be deployed on the Chakra engine. The Chakra engine is a JavaScript engine developed by Microsoft for the Microsoft Edge browser. It compiles scripts just-in-time on a separate CPU core, in parallel with the browser. Before and after emitting code into the code cache, the Chakra engine switches the write permission of the code cache through the mprotect() system call, so that the code cache never has write permission and execute permission at the same time. Since the buffer (Buffer) holds the machine code generated by the JIT compiler and the page where the buffer is located is readable and writable, an attacker can indirectly tamper with the code cache by tampering with the buffer in the Chakra engine. To defend against attacks on the buffer, the Chakra engine protects it as shown in FIG. 4: as each IR is compiled into machine code and stored in the buffer, the JIT compiler of the Chakra engine computes a check code (Checksum) over the machine code byte by byte; after the memcpy() operation is executed, it recomputes the check code byte by byte over the machine code in the code cache and compares it with the previous check code. The machine code in the code cache is used only if the comparison passes; otherwise an error is reported.
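The buffer-and-check flow described above may be sketched as follows, for illustration only. The check code algorithm used by the Chakra engine is not specified in this description, so checksum_step is a simple stand-in; emit_with_check and the tamper hook are hypothetical names, and a bytearray slice assignment stands in for memcpy().

```python
def checksum_step(state, byte):
    """Fold one byte into a 32-bit check code (stand-in algorithm)."""
    return (state * 31 + byte) & 0xFFFFFFFF


def emit_with_check(fragments, code_cache, offset, tamper=None):
    """Buffer the fragments, copy them to the code cache, then verify.

    tamper is an optional callback that models an attacker modifying the
    writable buffer before the copy. Returns the end offset on success.
    """
    buffer = bytearray()
    expected = 0
    for fragment in fragments:
        buffer += fragment
        for byte in fragment:          # check code computed byte by byte
            expected = checksum_step(expected, byte)

    if tamper is not None:
        tamper(buffer)

    end = offset + len(buffer)
    if end > len(code_cache):
        raise ValueError("code cache too small")
    code_cache[offset:end] = buffer    # stands in for memcpy()

    actual = 0
    for byte in code_cache[offset:end]:  # recomputed over the code cache
        actual = checksum_step(actual, byte)
    if actual != expected:
        raise RuntimeError("check code mismatch: code cache rejected")
    return end


if __name__ == "__main__":
    cache = bytearray(16)
    end = emit_with_check([b"\x48\x89\xe5", b"\xc3"], cache, 0)
    print(end, cache[:end].hex())
```

The sketch makes the cost visible: every emitted byte is visited three times (check code, copy, recomputed check code).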
Referring to FIG. 6, in accordance with one embodiment of the present invention, the JIT compiler of the Chakra engine is modified so that the pages of the code cache holding sensitive code are set as executable special shadow stack pages. At the same time, the original buffer and check flow of the Chakra engine is abandoned, and the compiled machine code is written directly into the code cache using WRSSQ instructions. However, the code fragments generated by the JIT compiler differ in length from one emission to the next and are usually short, and since each WRSSQ instruction must write 8 bytes to an 8-byte aligned destination address, writing successively generated short fragments to the code cache requires the WRSSQ instruction to be executed multiple times. For example, when the JIT compiler successively generates code fragments of 4 bytes, 2 bytes, and 2 bytes, they are written into the code cache in sequence as follows (assuming that the destination address of the 4-byte fragment is 0x1000, which is 8-byte aligned):
① 4 bytes are read from memory at address 0x1004 and spliced with the 4 bytes of code to be written, and the resulting 8 bytes are written into the code cache at address 0x1000 using a WRSSQ instruction;
② 4 bytes are read from 0x1000 and 2 bytes from 0x1006 and spliced with the 2 bytes of code to be written, and the resulting 8 bytes are written into the code cache at address 0x1000 using a WRSSQ instruction;
③ 6 bytes are read from 0x1000 and spliced with the 2 bytes of code to be written, and the resulting 8 bytes are written into the code cache at address 0x1000 using a WRSSQ instruction.
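The three read-splice-write steps above may be simulated as follows, for illustration only. ShadowStackPage and write_fragment are hypothetical names; the wrssq method merely models the constraint that each write is 8 bytes to an 8-byte aligned address and counts how often it is called. The sketch assumes that a fragment does not cross an 8-byte boundary.

```python
class ShadowStackPage:
    """Toy model of a code cache page that is writable only through wrssq()."""

    def __init__(self, size, base=0x1000):
        self.base = base
        self.mem = bytearray(size)
        self.wrssq_count = 0

    def read(self, addr, length):
        start = addr - self.base
        return bytes(self.mem[start:start + length])

    def wrssq(self, addr, qword):
        """Model of one WRSSQ: exactly 8 bytes to an 8-byte aligned address."""
        if addr % 8 != 0 or len(qword) != 8:
            raise ValueError("WRSSQ needs 8 bytes and an 8-byte aligned address")
        start = addr - self.base
        self.mem[start:start + 8] = qword
        self.wrssq_count += 1


def write_fragment(page, addr, fragment):
    """Write a short fragment by splicing it into its aligned 8-byte word."""
    aligned = addr & ~7
    start = addr - aligned
    if start + len(fragment) > 8:
        raise ValueError("fragment crosses an 8-byte boundary")
    old = page.read(aligned, 8)                      # read surrounding bytes
    new = old[:start] + fragment + old[start + len(fragment):]
    page.wrssq(aligned, new)                         # one WRSSQ per fragment


if __name__ == "__main__":
    page = ShadowStackPage(64)
    write_fragment(page, 0x1000, b"\x01\x02\x03\x04")   # step 1
    write_fragment(page, 0x1004, b"\x05\x06")           # step 2
    write_fragment(page, 0x1006, b"\x07\x08")           # step 3
    print(page.wrssq_count, page.read(0x1000, 8).hex())
```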
The write operation above requires the WRSSQ instruction to be executed 3 times. To further improve performance, the present invention proposes a register-as-buffer (Register-as-buffer) technique. According to one example of the present invention, as shown in FIG. 7, a register is used as a buffer: short code fragments are saved into an XMM register (which may be reserved by CETIS for storing code fragments only), and when the register content reaches 8 bytes it is written into the code cache using a WRSSQ instruction. A first index mark index1 marks the code that has not yet been committed into the code cache, and a second index mark index2 marks the location where a code fragment can currently be written, thereby achieving orderly commit of the code fragments. In this example, using the register as a buffer reduces the number of WRSSQ executions from 3 to 1, which improves the performance of CETIS. To ensure consistency between the XMM register and memory, a refresh operation must be performed before the contents of the code cache are read, synchronizing the contents of the XMM register to memory. Furthermore, instead of an XMM register, another register may be used as the buffer register, for example a general purpose register such as %r14 or %r15. The technical scheme of this embodiment has at least the following beneficial technical effect: since the invention does not need to compute check codes, verify check codes, or perform the memcpy() operation, the efficiency of protecting the integrity of the code cache is improved.
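The register-as-buffer technique may be simulated as follows, for illustration only. All class and method names are hypothetical. The sketch assumes that index1 and index2 are kept as addresses, that 8 bytes of the buffer register are used, and that the refresh operation splices the pending bytes into the current word; the actual layout used with the XMM register may differ.

```python
class ShadowStackPage:
    """Toy model of a code cache page that is writable only through wrssq()."""

    def __init__(self, size, base=0x1000):
        self.base = base
        self.mem = bytearray(size)
        self.wrssq_count = 0

    def read(self, addr, length):
        start = addr - self.base
        return bytes(self.mem[start:start + length])

    def wrssq(self, addr, qword):
        if addr % 8 != 0 or len(qword) != 8:
            raise ValueError("WRSSQ needs 8 bytes and an 8-byte aligned address")
        start = addr - self.base
        self.mem[start:start + 8] = qword
        self.wrssq_count += 1


class RegisterBuffer:
    """Collects short fragments and commits them 8 bytes at a time."""

    def __init__(self, page, start_addr):
        if start_addr % 8 != 0:
            raise ValueError("start address must be 8-byte aligned")
        self.page = page
        self.reg = bytearray(8)      # models 8 bytes of the buffer register
        self.index1 = start_addr     # first byte not yet committed
        self.index2 = start_addr     # where the next fragment byte goes

    def emit(self, fragment):
        for byte in fragment:
            self.reg[self.index2 - self.index1] = byte
            self.index2 += 1
            if self.index2 - self.index1 == 8:       # register is full
                self.page.wrssq(self.index1, bytes(self.reg))
                self.index1 = self.index2
                self.reg = bytearray(8)

    def flush(self):
        """Refresh: synchronize pending bytes to memory before a read."""
        pending = self.index2 - self.index1
        if pending:
            old = self.page.read(self.index1, 8)
            self.page.wrssq(self.index1, bytes(self.reg[:pending]) + old[pending:])


if __name__ == "__main__":
    page = ShadowStackPage(64)
    buf = RegisterBuffer(page, 0x1000)
    for fragment in (b"\x01\x02\x03\x04", b"\x05\x06", b"\x07\x08"):
        buf.emit(fragment)
    print(page.wrssq_count, page.read(0x1000, 8).hex())
```

With the same 4-byte, 2-byte, and 2-byte fragments as in the preceding example, the simulated WRSSQ is executed once instead of three times.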
According to one embodiment of the invention, a method for protecting program security based on the Intel CET mechanism is provided, comprising: acquiring program source code; and compiling the program source code with a compiler, so that when the compiled program is executed, the integrity of sensitive data and/or sensitive code is protected according to the method for protecting general memory integrity based on the Intel CET mechanism. Preferably, the step of compiling the program source code with the compiler includes inserting corresponding protection logic code according to information on the sensitive data and/or sensitive code to be protected in the program source code, so that the compiled program, when executed, protects the integrity of the sensitive data and/or sensitive code through the corresponding protection logic code according to the method for protecting general memory integrity based on the Intel CET mechanism. Preferably, the information on the sensitive data and/or sensitive code to be protected may be a specified pointer or pointer range of the sensitive data and/or sensitive code, and the compiler inserts the corresponding protection logic code into the program according to that pointer or pointer range. Alternatively, the program source code may already contain protection logic code corresponding to an original protection mechanism, in which case the logic code corresponding to the original protection mechanism records the information on the sensitive data and/or sensitive code to be protected.
Preferably, the method for protecting program security based on the Intel CET mechanism comprises: inserting corresponding protection logic code according to the logic code in the program source code that corresponds to an original protection mechanism (such as CPI, CFIXX, and the like) for protecting sensitive data and/or sensitive code, and deleting the logic code corresponding to the original protection mechanism, so that the compiled program, when executed, protects the integrity of the sensitive data and/or sensitive code through the corresponding protection logic code according to the method for protecting general memory integrity based on the Intel CET mechanism.
It should be noted that, although the steps are described above in a specific order, it is not intended that the steps must be performed in the specific order, and in fact, some of the steps may be performed concurrently or even in a changed order as long as the desired functionality is achieved.
The present invention may be a system, method, and/or computer program product. The computer program product may include a computer readable storage medium having computer readable program instructions embodied thereon for causing a processor to implement aspects of the present invention.
The computer readable storage medium may be a tangible device that retains and stores instructions for use by an instruction execution device. The computer readable storage medium may include, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium include a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch cards or raised structures in a groove having instructions stored thereon, and any suitable combination of the foregoing.
The foregoing description of embodiments of the invention has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the various embodiments described. The terminology used herein was chosen to best explain the principles of the embodiments, their practical application, or their technical improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.