CN102741826A - Performing mode switching in an unbounded transactional memory (UTM) system

Info

Publication number: CN102741826A
Application number: CN2010800639316A
Authority: CN
Inventors: A-R.阿德尔塔巴塔拜; B.萨哈; V.巴辛; G.希菲尔; J.格雷; V.格罗弗; M.泰勒菲尔; Y.列瓦诺尼; D.德特勒夫斯; M.马格鲁德; M.托尔顿
Original assignee: Intel Corp
Current assignee: Intel Corp
Priority date: 2009-12-15
Filing date: 2010-11-10
Publication date: 2012-10-17
Anticipated expiration: 2030-11-10
Also published as: EP2513803A4; JP5635620B2; WO2011081719A2; EP2513803A2; KR20120104364A; KR101388865B1; JP5964382B2; JP2015053062A; WO2011081719A3; JP2013513888A; AU2010337319B2; AU2010337319A1; US20110145637A1; US20120079215A1; US8095824B2; CN102741826B; US8365016B2

Abstract

In one embodiment, the present invention includes a method for selecting a first transaction execution mode to begin a first transaction in an unbounded transactional memory (UTM) system having a plurality of transaction execution modes. These transaction execution modes include hardware modes to execute within a cache memory of a processor, a hardware-assisted mode to execute using transactional hardware of the processor and a software buffer, and a software transactional memory (STM) mode to execute without the transactional hardware. The first transaction execution mode can be selected to be the highest-performance hardware mode if no pending transaction is executing in the STM mode; otherwise, a lower-performance mode can be selected. Other embodiments are described and claimed.

Description

Performing Mode Switching in an Unbounded Transactional Memory (UTM) System

Background

In modern computing systems, multiple processors may be present, and each such processor may execute a different thread of code of a common application. To maintain consistency, data synchronization mechanisms may be used. One such technique involves the use of transactional memory (TM). Transactional execution often involves executing a grouping of multiple micro-operations, operations, or instructions. Each of multiple threads may execute and access common data within a memory structure. If two threads access or alter the same entry within the structure, conflict resolution may be performed to ensure data validity. One type of transactional execution is software transactional memory (STM), in which tracking of memory accesses, conflict resolution, abort tasks, and other transactional tasks are performed in software, generally without hardware support.

Another type of transactional execution is a hardware transactional memory (HTM) system, in which hardware is included to support access tracking, conflict resolution, and other transactional tasks. Previously, actual memory data arrays were extended with additional bits to hold information, such as hardware attributes to track reads, writes, and buffering, so that this information travels with the data from the processor to memory. This information is often referred to as persistent; that is, it is not lost upon a cache eviction, because it travels with the data throughout the memory hierarchy. However, this persistence imposes additional overhead on the entire memory hierarchy.

Yet another type of TM model is referred to as unbounded transactional memory (UTM), which enables transactions that are arbitrarily large in time and memory footprint to occur through a combination of hardware acceleration and software. Running and implementing UTM transactions typically requires specially compiled code to implement concurrency control mechanisms that interface with the UTM hardware acceleration. As a result, UTM transactions can be complex and may not interface correctly with existing hardware and STM transactional systems.

Brief Description of the Drawings

FIG. 1 is a block diagram of a processor in accordance with one embodiment of the present invention.

FIG. 2 is a block diagram of holding metadata for a data item in a processor in accordance with one embodiment of the present invention.

FIG. 3 is a flowchart of a method for selecting a transaction mode for executing a TM transaction in accordance with an embodiment of the present invention.

FIG. 4 is a flowchart of a method of handling a mode switch as a result of a failure of a transaction executing in a particular mode.

FIG. 5 is a flowchart of a method for handling hardware transactions and software transactions concurrently in accordance with an embodiment of the present invention.

FIG. 6 is a block diagram of a system in accordance with an embodiment of the present invention.

Detailed Description

In various embodiments, a TM implementation can run different thread transactions in different modes, and modes can be switched for various reasons, including software conflict management or the use of unsupported semantics or operations (such as nested transactions, retry, debugging, or external transactions). A UTM system in accordance with an embodiment of the present invention affords a large design space of execution modes with different performance, flexibility (semantic richness), and capacity considerations. The modes are, in general, a combination of transactional, code generation, processor, and common language runtime (CLR) modes. While this constitutes a large space, only the particular modes most relevant to this discussion are introduced.

Transactional memory code may execute in a variety of transaction modes. Different transaction modes may require, or at least benefit from, different code generation strategies. The transaction execution modes include the following. Non-transactional (NT) is the classic execution mode with no isolation or failure atomicity, and thus requires no transactional logging or locking. Cache-resident non-locking (CRNL) mode, also referred to as cache-resident implicit transaction mode (CRITM), is a mode in which the entire transaction read/write set is held in cache memory and transactional conflicts are detected in hardware. In this mode, no logging or other instrumentation is needed, and no software-compatible locks are acquired. In one embodiment, CRNL thus supports only relatively small transactions whose data sets fit entirely in the processor cache. Another mode is cache-resident (CR) mode, also referred to as cache-resident explicit transaction mode (CRETM), in which the entire transaction read/write set is stored in the cache and transactional conflicts can be detected in hardware. In this mode, no logging or other instrumentation is needed, but software-compatible locks are acquired. In various embodiments, CR (like CRNL above) supports only relatively small transactions whose data sets fit entirely in the processor cache.

Yet another mode is a software mode with hardware-assisted monitoring and filtering (HAMF), a software mode in which the UTM monitoring facilities are used both to detect transactional conflicts and for filtering. In this mode, software-compatible locks are acquired. Another mode is a software mode with hardware-assisted filtering (HAF), in which the UTM facilities are used for filtering only. Software logging is performed in this mode, and software-compatible locks are acquired. In general, these last two modes may be referred to as hardware-assisted STM (HASTM) modes. Finally, software transactional memory (STM) mode is a pure software mode that uses no UTM resources.
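The transaction execution modes described above can be summarized as a small lookup table. The following is a minimal sketch, not part of the disclosure: the `ModeTraits` record and its field names are hypothetical, and a value of `None` marks a property the text does not state for that mode.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModeTraits:
    """Hypothetical summary record; field names are illustrative only."""
    uses_utm_hardware: bool           # relies on UTM monitoring/buffering/filtering
    cache_resident: bool              # whole read/write set must fit in the cache
    acquires_locks: Optional[bool]    # software-compatible locks (None = not stated)
    software_logging: Optional[bool]  # software logging (None = not stated)


# Insertion order follows the order in which the text introduces the modes.
MODES = {
    "NT":   ModeTraits(False, False, False, False),
    "CRNL": ModeTraits(True,  True,  False, False),
    "CR":   ModeTraits(True,  True,  True,  False),
    "HAMF": ModeTraits(True,  False, True,  None),
    "HAF":  ModeTraits(True,  False, True,  True),
    "STM":  ModeTraits(False, False, None,  None),
}


def cache_bound_modes():
    """Modes limited to transactions whose data set fits in the processor cache."""
    return [name for name, traits in MODES.items() if traits.cache_resident]
```

Under these assumptions, `cache_bound_modes()` lists only CRNL and CR, which matches the statement that both support only relatively small transactions.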

To support the different transaction modes, a particular chunk of source code may be translated into different binary code sequences. Naked (NK) refers to classic code with no particular transactional instrumentation. Transactional VTable (TV) is a code generation mode that embeds indirect function calls for individual object field accesses in order to enable proper transactional logging and locking. A dispatch table (vtable) is used to dispatch to different functions, so that the generated code can be used to support the various transaction modes.
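The idea behind the TV code generation mode can be illustrated with a small sketch in which one body of "generated" code reaches object fields only through a per-transaction dispatch table. Everything here is an assumption made for illustration: the `Tx` and `Account` classes, the use of an undo log as the logging scheme, and the pairing of table entries with the CRNL and STM modes are not taken from the disclosure.

```python
class Account:
    def __init__(self):
        self.balance = 0


# Table entries for a mode that needs no software instrumentation.
def direct_read(tx, obj, field):
    return getattr(obj, field)


def direct_write(tx, obj, field, value):
    setattr(obj, field, value)


# Table entries for a mode that logs accesses in software (assumed: undo log).
def logged_read(tx, obj, field):
    tx.read_set.add((id(obj), field))
    return getattr(obj, field)


def logged_write(tx, obj, field, value):
    tx.undo_log.append((obj, field, getattr(obj, field)))
    setattr(obj, field, value)


# Hypothetical dispatch tables, one per transaction mode.
VTABLES = {
    "CRNL": (direct_read, direct_write),
    "STM": (logged_read, logged_write),
}


class Tx:
    def __init__(self, mode):
        self.read, self.write = VTABLES[mode]
        self.read_set = set()
        self.undo_log = []

    def abort(self):
        # Roll back logged writes in reverse order.
        for obj, field, old in reversed(self.undo_log):
            setattr(obj, field, old)
        self.undo_log.clear()


def deposit(tx, account, amount):
    """Stands in for TV-generated code: every field access is an indirect call."""
    current = tx.read(tx, account, "balance")
    tx.write(tx, account, "balance", current + amount)
```

The same `deposit` body runs unchanged in either mode; only the table bound to the transaction differs, which is the property the text attributes to vtable dispatch.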

The processor, in turn, may execute in one of three basic modes with respect to the UTM properties of transaction-related monitoring and buffering. A first mode, MB_ALL, may be selected, in which all loads and stores induce hardware monitoring and buffering. This is generally the simplest way to use the UTM facilities, but it may cause monitoring and buffering to be applied to memory ranges that do not need it (such as read-only state or the stack). A second mode, MB_DATA, may be selected, in which all loads and stores for which the hardware transaction makes memory accesses relative to a segment register are buffered/monitored by default. In this mode, all stack accesses have potentially unmonitored move (PUMOV) semantics: if a load reads a buffered cache line, it reads the buffered content; if a store writes to a non-buffered cache line, it behaves like a normal write; and if it writes to a buffered cache line, both the buffered copy and the main copy are updated. This mode provides fine-grained control over what the hardware buffers and monitors, and generally allows a transaction to hold more useful data than in MB_ALL mode, at the cost of more complex code generation decisions. A third mode, MB_NONE, may be selected, in which no automatic buffering and monitoring of loads and stores occurs. Instead, the UTM ISA provides special instructions to induce buffering or monitoring of particular memory locations.

Note that the execution mode only controls the instructions used to set the UTM state within the processor caches. Once state is set in the cache, it is not possible to determine which mode was used to set it.
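The three processor modes and the PUMOV rules can be restated as a short model. This is a minimal sketch under stated assumptions: the `CacheLine` record, the `segment_relative` flag, and the function names are hypothetical, and real hardware tracks this state per cache line rather than in software objects.

```python
from dataclasses import dataclass


def auto_buffered(processor_mode: str, segment_relative: bool) -> bool:
    """Whether an ordinary load/store is buffered and monitored by default."""
    if processor_mode == "MB_ALL":
        return True               # every load and store
    if processor_mode == "MB_DATA":
        return segment_relative   # only accesses made relative to a segment register
    if processor_mode == "MB_NONE":
        return False              # only explicit UTM instructions induce it
    raise ValueError(f"unknown processor mode: {processor_mode}")


@dataclass
class CacheLine:
    main: int                 # main (globally visible) copy
    buffered: int = 0         # transactionally buffered copy, valid if is_buffered
    is_buffered: bool = False


def pumov_load(line: CacheLine) -> int:
    """A PUMOV load reads the buffered content when the line is buffered."""
    return line.buffered if line.is_buffered else line.main


def pumov_store(line: CacheLine, value: int) -> None:
    """A PUMOV store is a normal write, and also updates a buffered copy if present."""
    if line.is_buffered:
        line.buffered = value
    line.main = value
```

A store through `pumov_store` never creates new buffered state, which is consistent with the text's description of stack accesses in MB_DATA mode as potentially unmonitored.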

Native code in the common language runtime (CLR) may be invoked in different modes, including: non-transactional, which is the classic way in which the CLR's native code is invoked; implicit transactional mode, which occurs when CLR code is invoked while the current thread is executing a hardware transaction and the processor is configured for MB_DATA; and explicit transactional mode, which occurs when CLR code is invoked while the current thread is executing a hardware transaction and the processor is configured for MB_NONE, or when the current thread is executing a software transaction. The way in which the CLR is invoked determines what the native code needs to do in order to access the current state of the managed environment. In the non-transactional and implicit modes, the CLR can read the managed state directly, without hindrance. In the explicit transactional mode, the CLR may employ helper functions to access the managed state.
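The classification of CLR invocation modes above amounts to a small decision rule. The sketch below restates it; the function names and string labels are hypothetical, and combinations the text does not cover (for example, a hardware transaction with the processor in MB_ALL) are rejected rather than guessed.

```python
from typing import Optional


def clr_invocation_mode(in_hw_tx: bool, in_sw_tx: bool,
                        processor_mode: Optional[str] = None) -> str:
    """Classify a call into CLR native code per the three modes in the text."""
    if in_sw_tx:
        return "explicit"
    if not in_hw_tx:
        return "non-transactional"
    if processor_mode == "MB_DATA":
        return "implicit"
    if processor_mode == "MB_NONE":
        return "explicit"
    raise ValueError("combination not covered by the description")


def needs_helper_functions(mode: str) -> bool:
    """Only the explicit mode goes through helpers to reach managed state."""
    return mode == "explicit"
```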

As background for implementations that can be used in an unbounded TM (UTM) system, it is instructive to look at example hardware that can be used for UTM transactions. In general, a UTM transaction enables the use of hardware in connection with transactions that can be fully implemented in hardware (i.e., cache-resident transactions), as well as unbounded transactions that execute using a combination of hardware and software. Referring to FIG. 1, an embodiment of a processor capable of executing multiple threads concurrently is illustrated. Note that processor 100 may include hardware support for hardware transactional execution. Either in conjunction with hardware transactional execution or separately, processor 100 may also provide hardware support for hardware acceleration of an STM, separate execution of an STM, or a combination thereof, e.g., UTM, in accordance with an embodiment of the present invention. Processor 100 may be any type of processor, such as a microprocessor, an embedded processor, a digital signal processor (DSP), a network processor, or another device that executes code. Processor 100, as illustrated, includes a plurality of processing units.

Physical processor 100, as illustrated in FIG. 1, includes two cores, cores 101 and 102, which share access to higher-level cache 110. Although processor 100 may include asymmetric cores, i.e., cores with different configurations, functional units, and/or logic, symmetric cores are illustrated. As a result, core 102, which is illustrated as identical to core 101, is not discussed in detail to avoid repetition. In addition, core 101 includes two hardware threads 101a and 101b, while core 102 includes two hardware threads 102a and 102b. Software entities, such as an operating system, therefore potentially view processor 100 as four separate processors, i.e., four logical processors or processing units capable of executing four software threads concurrently.

Here, a first thread is associated with architectural state registers 101a, a second thread is associated with architectural state registers 101b, a third thread is associated with architectural state registers 102a, and a fourth thread is associated with architectural state registers 102b. As illustrated, architectural state registers 101a are replicated in architectural state registers 101b, so individual architectural states/contexts can be stored for logical processor 101a and logical processor 101b. In one embodiment, the architectural state registers may include registers for use in implementing UTM transactions, e.g., a transaction status register (TSR), a transaction control register (TCR), and an ejection instruction pointer register that identifies the location of an ejection handler, which can be used to handle events during a transaction (such as an abort of the transaction).

Other smaller resources, such as instruction pointers and the renaming logic in rename allocator logic 130, may also be replicated for threads 101a and 101b. Some resources, such as the reorder buffers in reorder/retirement unit 135, the instruction translation lookaside buffer (ITLB) 120, load/store buffers, and queues, may be shared through partitioning. Other resources, such as general-purpose internal registers, page table base registers, low-level data cache and data TLB 115, execution unit(s) 140, and portions of out-of-order unit 135, are potentially fully shared.

As illustrated, processor 100 includes bus interface module 105 to communicate with devices external to processor 100, such as system memory 175, a chipset, a northbridge, or another integrated circuit. Memory 175 may be dedicated to processor 100 or shared with other devices in the system. Higher-level or further-out cache 110 is to cache elements recently fetched from higher-level cache 110. Note that higher-level or further-out refers to cache levels that increase in distance, or get further away, from the execution unit(s). In one embodiment, higher-level cache 110 is a second-level data cache. However, higher-level cache 110 is not so limited, as it may be associated with or include an instruction cache. A trace cache, i.e., a type of instruction cache, may instead be coupled after decoder 125 to store recently decoded traces. Module 120 also potentially includes a branch target buffer to predict branches to be executed/taken and an ITLB to store address translation entries for instructions.

Decode module 125 is coupled to fetch unit 120 to decode fetched elements. In one embodiment, processor 100 is associated with an ISA, which defines/specifies the instructions executable on processor 100. Here, machine code instructions recognized by the ISA often include a portion of the instruction referred to as an opcode, which references/specifies an instruction or operation to be performed.

In one example, allocator and renamer block 130 includes an allocator to reserve resources, such as register files, to store instruction processing results. However, threads 101a and 101b are potentially capable of out-of-order execution, in which case allocator and renamer block 130 also reserves other resources, such as reorder buffers, to track instruction results. Unit 130 may also include a register renamer to rename program/instruction reference registers to other registers internal to processor 100. Reorder/retirement unit 135 includes components, such as the reorder buffers mentioned above, load buffers, and store buffers, to support out-of-order execution and the later in-order retirement of instructions executed out of order.

In one embodiment, scheduler and execution unit(s) block 140 includes a scheduler unit to schedule instructions/operations on the execution units. For example, a floating-point instruction is scheduled on a port of an execution unit that has an available floating-point execution unit. Register files associated with the execution units are also included to store instruction processing results. Exemplary execution units include a floating-point execution unit, an integer execution unit, a jump execution unit, a load execution unit, a store execution unit, and other known execution units.

Lower-level data cache and data translation buffer (D-TLB) 150 are coupled to execution unit(s) 140. The data cache is to store recently used/operated-on elements, such as data operands, which are potentially held in memory coherency states. The D-TLB is to store recent virtual/linear-to-physical address translations. As a specific example, a processor may include a page table structure to break physical memory into a plurality of virtual pages.

In one embodiment, processor 100 is capable of hardware transactional execution, software transactional execution, or a combination or hybrid thereof. A transaction, which may also be referred to as a critical or atomic section of code, includes a grouping of instructions, operations, or micro-operations to be executed as an atomic group. For example, instructions or operations may be used to demarcate a transaction or a critical section. In one embodiment, these instructions are part of a set of instructions, such as an ISA, that is recognizable by hardware of processor 100, such as the decoders described above. Often, once compiled from a high-level language into hardware-recognizable assembly language, these instructions include operation codes (opcodes) or other portions of the instructions that the decoders recognize during the decode stage.

Typically, during execution of a transaction, updates to memory are not made globally visible until the transaction is committed. As an example, a transactional write to a location is potentially visible to the local thread; yet, in response to a read from another thread, the write data is not forwarded until the transaction containing the transactional write is committed. While the transaction is still pending, data items/elements loaded from and written to memory are tracked, as discussed in more detail below. Once the transaction reaches a commit point, if no conflicts have been detected for the transaction, the transaction is committed and the updates made during the transaction are made globally visible.

However, if the transaction is invalidated while it is pending, the transaction is aborted and potentially restarted without its updates being made globally visible. As used herein, a pending transaction therefore refers to a transaction that has begun execution and has not yet been committed or aborted.
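The visibility rule described above (updates visible to the local thread while pending, globally visible only on commit, and discarded on abort) can be modeled in a few lines. This is a minimal software sketch, not the hardware mechanism: the `Transaction` class, its write buffer, and the dictionary standing in for shared memory are all assumptions made for illustration.

```python
class Transaction:
    """Toy model of a pending transaction with deferred global visibility."""

    def __init__(self, memory: dict):
        self.memory = memory      # stands in for globally visible memory
        self.writes = {}          # transactional updates, local until commit
        self.state = "pending"

    def write(self, addr, value):
        self.writes[addr] = value

    def read(self, addr):
        # The local thread sees its own transactional writes first.
        return self.writes.get(addr, self.memory.get(addr))

    def commit(self, conflict_detected: bool = False) -> bool:
        if conflict_detected:
            self.abort()
            return False
        self.memory.update(self.writes)   # updates become globally visible
        self.state = "committed"
        return True

    def abort(self):
        self.writes.clear()               # nothing was ever made visible
        self.state = "aborted"
```

A reader that consults `memory` directly plays the role of another thread: it observes a transactional write only after `commit()` returns `True`.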

In one embodiment, processor 100 is capable of executing transactions using hardware/logic, i.e., within a hardware transactional memory (HTM) system. Numerous specific implementation details exist, from both an architectural and a microarchitectural perspective, when implementing an HTM; most of them are not discussed herein to avoid unnecessarily obscuring embodiments of the invention. Some structures and implementations are nevertheless disclosed for illustrative purposes. It should be noted that these structures and implementations are not required, and may be augmented and/or replaced with other structures having different implementation details.

In general, processor 100 may be capable of executing transactions within a UTM system, which attempts to take advantage of the benefits of both STM and HTM systems. For example, an HTM is often fast and efficient for executing small transactions, because it does not rely on software to perform all of the access tracking, conflict detection, validation, and commit for transactions. However, HTMs are usually able to handle only smaller transactions, while STMs are able to handle transactions of unbounded size. Therefore, in one embodiment, a UTM system uses hardware to execute smaller transactions and software to execute transactions that are too big for the hardware. As can be seen from the discussion below, hardware may be used to assist and accelerate the software even when software is handling transactions. The same hardware may also be used to support and accelerate a pure STM system.

As stated above, transactions include transactional memory accesses to data items, both by local processing units within processor 100 and potentially by other processing units. Without safety mechanisms in a transactional memory system, some of these accesses would potentially result in invalid data and execution, i.e., a write to data that invalidates a read, or a read of invalid data. As a result, processor 100 may include logic to track or monitor memory accesses to and from data items in order to identify potential conflicts, such as read monitors and write monitors, as discussed below.

In one embodiment, processor 100 includes monitors to detect or track accesses, and potential subsequent conflicts, associated with data items. As one example, hardware of processor 100 includes read monitors and write monitors to track loads and stores, respectively, that are determined to be monitored. As an example, hardware read monitors and write monitors are to monitor data items at the granularity of the data items, regardless of the granularity of the underlying storage structures. In one embodiment, a data item is bounded by tracking mechanisms associated at the granularity of the storage structures, to ensure that at least the entire data item is monitored appropriately.

As a specific illustrative example, read and write monitors include attributes associated with cache locations, such as locations within lower-level data cache 150, to monitor loads from and stores to addresses associated with those locations. Here, a read attribute for a cache location of data cache 150 is set upon a read event to an address associated with the cache location, to monitor for potentially conflicting writes to the same address. The write attribute operates in a similar manner for write events, to monitor for potentially conflicting reads and writes to the same address. To further this example, hardware can detect conflicts based on snoops for reads and writes to cache locations whose read and/or write attributes are set to indicate that the locations are monitored. Inversely, in one embodiment, setting read and write monitors, or updating a cache location to a buffered state, results in snoops, such as read requests or read-for-ownership requests, which allow conflicts with addresses monitored in other caches to be detected.

Therefore, based on the design, different combinations of cache coherency requests and monitored coherency states of cache lines result in potential conflicts, such as a cache line holding a data item in a shared read-monitored state and a snoop indicating a write request to the data item. Inversely, a cache line holding a data item in a buffered write state and an external snoop indicating a read request to the data item may be considered potentially conflicting. In one embodiment, to detect such combinations of access requests and attribute states, snoop logic is coupled to conflict detection/reporting logic, such as monitors and/or logic for conflict detection/reporting, as well as to status registers to report the conflicts.
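The conflict combinations described above can be captured in a small model: local loads set a read attribute, local stores set a write attribute, and an external snoop is checked against both. This is a sketch under stated assumptions; the `MonitoredCache` class, the use of address sets in place of per-line attribute bits, and the `conflicts` list standing in for a status register are all hypothetical.

```python
class MonitoredCache:
    """Toy model of read/write monitor attributes on cache locations."""

    def __init__(self):
        self.read_monitored = set()    # addresses with the read attribute set
        self.write_monitored = set()   # addresses with the write attribute set
        self.conflicts = []            # stands in for conflict status reporting

    def local_load(self, addr):
        self.read_monitored.add(addr)

    def local_store(self, addr):
        self.write_monitored.add(addr)

    def snoop(self, addr, is_write: bool) -> bool:
        """Check an external access; return True if it is a potential conflict."""
        # A write-monitored location conflicts with external reads and writes;
        # a read-monitored location conflicts only with external writes.
        hit = addr in self.write_monitored or (
            is_write and addr in self.read_monitored)
        if hit:
            self.conflicts.append((addr, "write" if is_write else "read"))
        return hit
```

Two readers of the same address do not conflict in this model, which reflects the text: a read-monitored line is only sensitive to external writes.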

However, any combination of conditions and scenarios may be considered invalidating for a transaction, which may be defined by an instruction, such as a commit instruction. Examples of factors that may be considered for non-commit of a transaction include detecting a conflict to a memory location accessed by the transaction, losing monitor information, losing buffered data, losing metadata associated with a data item accessed by the transaction, and detecting another invalidating event, such as an interrupt, a ring transition, or an explicit user instruction (assuming that a resumed transaction cannot be continued).

In one embodiment, hardware of processor 100 is to hold transactional updates in a buffered manner. As stated above, transactional writes are not made globally visible until the transaction is committed. However, the local software thread associated with the transactional writes is able to access the transactional updates for subsequent transactional accesses. As a first example, a separate buffer structure is provided in processor 100 to hold the buffered updates, which can be provided to the local thread but not to other, external threads. Including a separate buffer structure, however, is potentially expensive and complex.

In contrast, as another example, a cache memory, such as data cache 150, is used to buffer the updates while providing the same transactional functionality. Here, cache 150 is capable of holding data items in a buffered coherency state; in one case, a new buffered coherency state is added to a cache coherency protocol, such as the Modified Exclusive Shared Invalid (MESI) protocol, to form a MESIB protocol. In response to local requests for a buffered data item, i.e., a data item held in the buffered coherency state, cache 150 provides the data item to the local processing unit to ensure internal transactional sequential ordering. In response to external access requests, however, a miss response is provided to ensure that the transactionally updated data item is not made globally visible until commit. Furthermore, when a line of cache 150 is held in the buffered coherency state and is selected for eviction, the buffered update is not written back to higher-level cache memories; that is, the buffered update is not proliferated through the memory system and is not made globally visible until after commit. Upon commit, the buffered lines are transitioned to the modified state to make the data items globally visible.
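The behavior of the buffered coherency state can be sketched as a toy cache in which each line carries a state letter. This is an illustrative model only: the `BufferedCache` class, the dictionary used as the next memory level, and the method names are assumptions, and only the B and M states are exercised.

```python
class BufferedCache:
    """Toy cache whose lines are [state, value], with state in 'MESIB'."""

    def __init__(self, next_level: dict):
        self.next_level = next_level   # stands in for higher-level memory
        self.lines = {}                # addr -> [state, value]

    def tx_store(self, addr, value):
        self.lines[addr] = ["B", value]        # hold the update as buffered

    def local_load(self, addr):
        # Local requests hit buffered lines, preserving program order.
        if addr in self.lines:
            return self.lines[addr][1]
        return self.next_level.get(addr)

    def external_load(self, addr):
        # External requests see a miss (None) for absent or buffered lines.
        line = self.lines.get(addr)
        if line is None or line[0] == "B":
            return None
        return line[1]

    def evict(self, addr):
        state, value = self.lines.pop(addr)
        if state == "M":
            self.next_level[addr] = value      # ordinary write-back
        # A buffered line is dropped without write-back; the buffered data
        # is lost, which the text lists as a reason not to commit.

    def commit(self):
        for line in self.lines.values():
            if line[0] == "B":
                line[0] = "M"                  # now globally visible
```

Before `commit()`, an external load of a transactionally written address misses and the requester falls back to the unmodified copy in the next level; after `commit()`, the same load returns the new value.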

Note that the terms internal and external are often relative to the perspective of a thread associated with execution of a transaction, or of the processing elements that share a cache. For example, a first processing element that executes a software thread associated with execution of a transaction is referred to as a local thread. Therefore, in the discussion above, if a store to or a load from an address previously written by the first thread is received (the earlier write having caused the cache line for that address to be held in the buffered coherency state), the buffered version of the cache line is provided to the first thread because it is the local thread. In contrast, a second thread may be executing on another processing element within the same processor but is not associated with the execution of the transaction responsible for the cache line being held in the buffered state; it is an external thread. A load or store from the second thread to the address therefore misses the buffered version of the cache line, and normal cache replacement is used to retrieve the unbuffered version of the cache line from higher-level memory.

Here, the internal/local and external/remote threads execute on the same processor and, in some embodiments, on separate processing elements within the same core of the processor that share access to the cache. However, the use of these terms is not so limited. As stated above, local may refer to multiple threads that share access to a cache, rather than to a single thread associated with execution of the transaction, while external or remote may refer to threads that do not share access to the cache.

As stated above in the initial reference to FIG. 1, the architecture of processor 100 is purely illustrative and is presented for purposes of discussion. For example, in other embodiments UBT hardware can be implemented for a processor with a much simpler in-order execution design, which may not include complex rename/allocator and reorder/retirement units. Similarly, the specific example of translating data addresses to reference metadata is also exemplary, because any method of associating data with metadata in separate entries of the same memory may be utilized.

Turning to FIG. 2, an embodiment of holding metadata for a data item in a processor is illustrated. As depicted, metadata 217 for data item 216 is held locally in memory 215. Metadata includes any property or attribute associated with data item 216, such as transactional information relating to data item 216. Some illustrative examples of metadata are given below; the disclosed examples are purely illustrative. As such, metadata location 217 may hold any combination of information and other attributes for data item 216.

As a first example, metadata 217 includes a reference to a backup or buffer location for transactionally written data item 216, if data item 216 has previously been accessed, buffered, and/or backed up within a transaction. Here, in some implementations, a backup copy of a previous version of data item 216 is held in a different location, so metadata 217 includes an address of, or other reference to, the backup location. Alternatively, metadata 217 itself may act as a backup or buffer location for data item 216.

As another example, metadata 217 includes a filter value to accelerate repeated transactional accesses to data item 216. Often, during execution of a transaction in software, access barriers are performed at transactional memory accesses to ensure consistency and data validity. For example, before a transactional load operation, a read barrier is executed to perform read barrier operations, such as testing whether data item 216 is unlocked, determining whether the current read set of the transaction is still valid, updating a filter value, and logging version values in the read set of the transaction to enable later validation. However, if a read of that location has already been performed during execution of the transaction, the same read barrier operations are potentially unnecessary.

Therefore, one solution uses a read filter that holds a first default value to indicate that data item 216, or therefore its address, has not been read during execution of the transaction, and a second accessed value to indicate that data item 216, or therefore its address, has already been accessed during the pendency of the transaction. Essentially, the second accessed value indicates whether the read barrier should be accelerated. In this case, if a transactional load operation is received and the read filter value in metadata location 217 indicates that data item 216 has already been read, then in one embodiment the read barrier is elided (not executed), which accelerates transactional execution by skipping unnecessary, redundant read barrier operations. Note that a write filter value may operate in the same manner for write operations. However, individual filter values are purely illustrative; in one embodiment a single filter value indicates whether an address has already been accessed, whether by a write or a read. Here, the metadata access operations that check metadata 217 for data item 216 on both loads and stores use the single filter value, unlike the example above in which metadata 217 includes a separate read filter value and write filter value. As a specific illustrative embodiment, four bits of metadata 217 are allocated: a read filter to indicate whether a read barrier is to be accelerated for the associated data item, a write filter to indicate whether a write barrier is to be accelerated for the associated data item, an undo filter to indicate that undo operations are to be accelerated, and a miscellaneous filter to be used in any manner by software as a filter value.
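The read-filter fast path can be sketched in software as follows. The bit positions, the `Transaction` class, and the dictionary standing in for metadata 217 are all assumptions made for illustration; the description only states that four filter bits exist, not how they are laid out.

```python
# Hypothetical layout of the four filter bits held in the metadata.
READ_FILTER = 0b0001
WRITE_FILTER = 0b0010
UNDO_FILTER = 0b0100
MISC_FILTER = 0b1000


class Transaction:
    def __init__(self):
        self.metadata = {}     # address -> filter bits (stand-in for metadata 217)
        self.read_set = []     # (address, version) pairs logged for validation
        self.barriers_run = 0  # counts full read barriers, for demonstration

    def read_barrier(self, address, versions):
        # Full barrier: log the version and mark the address as read.
        self.barriers_run += 1
        self.read_set.append((address, versions[address]))
        self.metadata[address] = self.metadata.get(address, 0) | READ_FILTER

    def load(self, memory, versions, address):
        # Elide the barrier when the read filter says the address was
        # already read during this transaction.
        if not (self.metadata.get(address, 0) & READ_FILTER):
            self.read_barrier(address, versions)
        return memory[address]
```

With this sketch, repeated loads of the same address inside one transaction run the full barrier only once; later loads take the filtered path.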

A few other examples of metadata include: an indication of, representation of, or reference to an address of a handler (either generic or specific to a transaction associated with data item 216), an irrevocable/obstinate nature of a transaction associated with data item 216, a loss of data item 216, a loss of monitoring information for data item 216, a conflict detected for data item 216, an address of a read set or of a read entry within a read set associated with data item 216, a previously logged version of data item 216, a current version of data item 216, a lock for allowing access to data item 216, a version value for data item 216, a transaction descriptor for the transaction associated with data item 216, and other known transaction-related descriptive information. Furthermore, as described above, use of metadata is not limited to transactional information. As a corollary, metadata 217 may also include information, properties, attributes, or states associated with data item 216 that do not involve a transaction.

With this background on a UTM system, consider next how a transaction is initiated. When threads enter a transaction, they transition to one of the TM execution modes. If no thread is in any type of STM mode (in general, any STM mode is referred to as a *STM mode), the current thread may use the implicit mode, CRITM. Many threads may thus be in CRITM mode. If a thread overflows the bounded capacity of the hardware or performs some semantic action that cannot be accomplished in the current mode, the CRITM transaction rolls back and re-executes in some *STM mode. Once any thread is in a *STM mode, all other threads must leave CRITM mode (roll back) and re-execute in an STM lock-respecting mode such as CRESTM. There are several possible combinations of execution variants, for example CRITM and CRESTM. For purposes of discussion, that combination of modes is used herein.

Table 1 compares these two example transaction execution modes with each other and with the ordinary non-transactional mode.

Table 1

Execution variant | Description | Transaction mode | Compatible transaction modes | Code generation mode | CPU mode | CLR mode
Ordinary | Non-transactional | NT | N/A | NK | MB_NONE | Non-transactional
CRITM | Cache resident, implicit mode, no software locks | CRNL | CRNL | NK | MB_DATA | Transactional, implicit
CRESTM | Cache resident, explicit mode, uses software locks and transactional vtable | CR | CR, HAMF, HAF, STM | TV | MB_NONE | Transactional, explicit

Inevitably, some transactions fail, for example due to loss of buffered data or a conflict, and such a transaction is aborted. In some cases, the mode of a transaction may change at the time of re-execution. A transaction may "fall back" to a lower-performance mode or be "upgraded" to a higher-performance mode. That is, from a performance perspective not all modes are equal. In general, CRITM is the best-performing execution mode because it avoids the overhead of dealing with software locks. The next best-performing mode is CRESTM, followed by HASTM, and then STM. The STM and HASTM modes are equivalent in the functionality they provide, so STM is used to represent both modes in the following discussion.

Not all transactions can run in CRITM mode, however, because it operates only on cache-resident transactions. Because CRESTM mode is also limited to cache-resident transactions, any transaction that is not cache resident needs to run in STM mode. CRITM mode is not compatible with STM mode, so as soon as one transaction begins operating in STM mode, no transaction can run in CRITM mode. At that point, all cache-resident transactions shift to CRESTM mode.

The broad constraints on the mode in which a transaction executes can be summarized as follows: all transactions begin in CRITM mode unless an STM transaction is running, in which case all transactions begin in CRESTM mode. If a transaction overflows the cache, it rolls back and restarts execution in STM mode. If a transaction is executing in STM mode, all CRITM transactions are doomed and restart execution in CRESTM mode.

In one embodiment, there are some additional constraints around support for the "retry" primitive: if a transaction uses the "retry" primitive, it can execute only in STM mode, because CRITM and CRESTM do not support waiting for a retry. If any transaction in the system is waiting on a "retry," all other transactions need to execute in CRESTM or STM mode, because CRITM does not support notification.
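The start-mode constraints above, including the "retry" rules, can be condensed into a small decision function. This is a minimal sketch: the function name and its parameters are hypothetical, and the modes are represented as plain strings.

```python
def select_start_mode(stm_running, retry_waiters=0, uses_retry=False):
    """Pick the mode in which a new transaction starts.

    stm_running   -- number of transactions currently executing in STM mode
    retry_waiters -- number of transactions currently waiting on "retry"
    uses_retry    -- whether the new transaction itself uses "retry"
    """
    if uses_retry:
        # CRITM and CRESTM cannot wait for a retry.
        return "STM"
    if stm_running > 0 or retry_waiters > 0:
        # CRITM is incompatible with STM and cannot deliver notifications,
        # so the best remaining cache-resident mode is CRESTM.
        return "CRESTM"
    return "CRITM"
```

For example, `select_start_mode(0)` yields `"CRITM"`, while the presence of a single STM transaction or a single "retry" waiter moves new transactions to `"CRESTM"`.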

Referring now to FIG. 3, shown is a flow diagram of a method for selecting a transaction execution mode for performing a TM transaction in accordance with an embodiment of the present invention. In one embodiment, method 300 may be implemented by a runtime of the UTM system. As seen, FIG. 3 may begin by determining whether other transactions are active in the system (diamond 310). This is done because certain hardware transaction modes are incompatible with STM transactions. If no other such transactions are active, control passes to block 320, where the transaction may begin in the highest-performance mode available. In the context described herein, the highest-performance mode may be a hardware implicit transaction mode (e.g., CRITM). Of course, in different implementations, different or modified modes may be available.

If instead it is determined at diamond 310 that other transactions are active, control passes to diamond 325, where it may be determined whether any of these transactions are in STM mode. If so, the new transaction may begin in the highest-performance mode that is consistent with STM mode (block 330). For example, in the implementation discussed herein, this highest compatible mode may be a hardware explicit mode (e.g., CRESTM), in which hardware assists a transaction that can be fully resident within the processor cache but that respects software locks.

Accordingly, the transaction begins and operation continues. It may then be determined whether an overflow occurs (diamond 335). That is, because all transactions may begin in some type of cache-resident, hardware-assisted mode, it is possible that the cache space is insufficient to handle the complete transaction. Accordingly, if it is determined at diamond 335 that an overflow has occurred, control may pass to block 375, discussed further below. If instead the transaction does not overflow, it may next be determined whether the transaction has completed (diamond 340). If not, execution continues. If the transaction has completed, control passes to diamond 350, where it may be determined whether the hardware properties of the transaction have been maintained. That is, before the transaction commits, various hardware properties (e.g., the UTM properties of buffering, monitoring, and metadata) may be checked to determine that they are still active, without loss. If not, a loss of some hardware property has occurred, and control passes to block 360, where the transaction is aborted. Otherwise, if the transaction completed successfully and the hardware properties remain intact, control may pass to block 355, where the transaction is committed.

Still referring to FIG. 3, if instead the transaction overflows the cache during execution of a cache-resident transaction (as determined at diamond 335), control passes to block 375. There, the transaction may be rolled back and re-executed in STM mode. During execution in STM mode, it may be determined whether a semantic has been violated (diamond 380). If so, control passes to block 382, where the transaction may be rolled back and re-executed in a lower-performance mode (e.g., a pure STM mode). Similar operations for determining whether the transaction has completed and whether it can successfully commit or must be aborted may occur (blocks 385, 390, 392, 395), as discussed with regard to hardware-assisted transactions. Although described with this particular implementation in the embodiment of FIG. 3, understand that control of the given mode in which a transaction executes may vary in different implementations.
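The main path through FIG. 3 can be traced with a short sketch. The function below is hypothetical and heavily simplified: the transaction is reduced to the list of cache-line addresses it touches, overflow means touching more distinct lines than the cache can hold, and the STM re-execution (blocks 380 to 395) is simply assumed to commit.

```python
def run_cache_resident(addresses, cache_lines, stm_active, hw_intact=True):
    """Trace one transaction through the FIG. 3 decisions.

    addresses   -- cache-line addresses the transaction touches
    cache_lines -- number of lines the cache can hold for the transaction
    stm_active  -- whether another transaction is currently in STM mode
    hw_intact   -- whether buffering/monitoring/metadata survived to commit
    Returns (mode the transaction finished in, outcome).
    """
    # Diamonds 310/325 and blocks 320/330: choose the starting mode.
    mode = "CRESTM" if stm_active else "CRITM"

    touched = set()
    for address in addresses:
        touched.add(address)
        if len(touched) > cache_lines:
            # Diamond 335 -> block 375: roll back, re-execute in STM.
            # Assumption: the STM re-execution succeeds.
            return "STM", "committed"

    # Diamond 350: a lost hardware property forces an abort (block 360).
    if not hw_intact:
        return mode, "aborted"
    return mode, "committed"  # block 355
```

The sketch makes one point of the flow explicit: an overflow changes the execution mode, whereas a lost hardware property only aborts the attempt and leaves the mode decision to the failure handling discussed later.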

Thus the method of FIG. 3 sets forth in general how an appropriate mode for beginning a transaction is determined. During a transaction, however, failures may occur for reasons other than cache overflow or STM failure. In various embodiments, fallback and upgrade mechanisms may be implemented to determine in which mode a new (or re-executing) transaction should execute, so that the constraints described above are met and the system achieves optimal performance. Table 2 below shows a set of reasons for dooming (terminating) a transaction. The first column of Table 2 describes the various reasons that can doom a transaction, and the second and third columns describe the new mode in which a previously pending CRITM or CRESTM transaction, respectively, will be re-executed. A cell left blank indicates that the transaction will not be doomed for the given reason.

Table 2

No. | Reason | CRITM | CRESTM
1 | Another transaction contains "retry" | CRESTM |
2 | Current transaction contains "retry" | STM | STM
3 | Open nested transaction/full suppression is needed | STM | STM
4 | Cache capacity is exceeded | STM | STM
5 | Closed nested, flattened transaction throws an exception | STM | STM
6 | Code needs to be JIT'ed | CRITM | CRITM, CRESTM
7 | Transaction dooms itself (e.g., when modifying an object header while pass-through CAS is not available) | CRITM, STM | CRITM, CRESTM, STM
8 | GC suspends the thread executing the current transaction | CRITM | CRITM, CRESTM
9 | One or more STM transactions start | CRESTM |
10 | All STM transactions terminate | | CRITM
11 | Loss of monitoring | CRITM | CRITM, CRESTM
12 | Loss of buffering | CRITM | CRITM, CRESTM

Note with respect to Table 2 that the first priority is to re-execute in CRITM mode. However, if the transaction requires functionality not available in CRITM mode, or an STM transaction is in progress, the transaction is re-executed in CRESTM mode. The decision to doom a CRESTM transaction for this reason is based on heuristics. Also, no CRITM transactions should be running at this point.
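Table 2 and the priority rule just stated can be expressed as a lookup. This is a sketch of one possible reading, not the patented method: the row numbers 1 to 12 assume the table rows are numbered in order, and the tie-break (prefer CRITM, but use CRESTM when an STM transaction is in progress) is an interpretation of the preceding paragraph. All names are hypothetical.

```python
# reason number -> (modes for a doomed CRITM tx, modes for a doomed CRESTM tx)
TABLE_2 = {
    1: (["CRESTM"], []),                         # another tx contains "retry"
    2: (["STM"], ["STM"]),                       # current tx contains "retry"
    3: (["STM"], ["STM"]),                       # open nesting / suppression
    4: (["STM"], ["STM"]),                       # cache capacity exceeded
    5: (["STM"], ["STM"]),                       # flattened nested tx throws
    6: (["CRITM"], ["CRITM", "CRESTM"]),         # code needs JIT compilation
    7: (["CRITM", "STM"], ["CRITM", "CRESTM", "STM"]),  # tx dooms itself
    8: (["CRITM"], ["CRITM", "CRESTM"]),         # GC suspends the thread
    9: (["CRESTM"], []),                         # STM transaction(s) start
    10: ([], ["CRITM"]),                         # all STM transactions end
    11: (["CRITM"], ["CRITM", "CRESTM"]),        # loss of monitoring
    12: (["CRITM"], ["CRITM", "CRESTM"]),        # loss of buffering
}

PRIORITY = ["CRITM", "CRESTM", "STM"]  # best-performing mode first


def reexecution_mode(reason, current_mode, stm_active):
    """Return the mode for re-execution, or None for a blank table cell."""
    critm_modes, crestm_modes = TABLE_2[reason]
    candidates = critm_modes if current_mode == "CRITM" else crestm_modes
    if not candidates:
        return None  # the transaction is not doomed for this reason
    best = min(candidates, key=PRIORITY.index)
    if best == "CRITM" and stm_active:
        return "CRESTM"  # CRITM cannot coexist with STM transactions
    return best
```

Under these assumptions, a CRESTM transaction doomed for reason 6 re-executes in CRITM when no STM transaction is running and stays in CRESTM otherwise.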

Note that there may be some leeway in the design choices shown in Table 2. For example, it is possible to design a mode that is cache resident but still provides software-based failure atomicity. Such a mode could be used to deal with nested transaction failures.

Referring now to FIG. 4, shown is a flow diagram of a method for handling a mode switch as a result of the failure of a transaction executing in a particular mode. In one embodiment, method 400 of FIG. 4 may be implemented by an ejection handler that receives control when a transaction in a first mode fails. As seen, method 400 may begin by determining the reason for the failure of the transaction (block 410). This determination may be made on the basis of various information received. As an example, information in the TCR and/or TSR may indicate the reason for the failure. Similarly, a transaction control block may also indicate the reason for the failure. Still further, other implementations may obtain this information in other ways.

Still referring to FIG. 4, in general different recovery paths, for example different paths for re-executing the transaction, may be selected based on the reason the transaction was doomed. Furthermore, although the determinations are described and shown in a particular order in the embodiment of FIG. 4, this is for ease of discussion; in various implementations they may occur in a different order (and in a different manner). As seen, at diamond 415 it may be determined whether the current transaction mode does not support required functionality. Examples of such unsupported functionality are discussed below. If this is determined to be the cause of the transaction failure, control may pass to block 420, where another mode that supports the functionality is selected. Accordingly, the transaction may be re-executed after the switch to this new mode.

Another reason for dooming a transaction may be that the transaction doomed itself, as determined at diamond 425. If so, the number of times the transaction has doomed itself may also be determined. This number may be compared with a threshold (diamond 430). If the number is above the threshold, indicating that the transaction keeps dooming itself, the transaction may be switched to a different mode (block 435). If the threshold is not met, re-execution may occur in the same mode (block 440).

Yet another reason for dooming a transaction may be that external system activity caused it. If this is determined (at diamond 450), it may be determined whether the external activity was an increase in the number of pending STM transactions (diamond 455). If so (and the current transaction is a hardware implicit mode transaction), the transaction may be re-executed in a hardware explicit mode (block 460). If instead of an increase there was in fact a decrease in the number of pending STM transactions (as determined at diamond 462), a determination may be made as to whether to restart a pending hardware explicit transaction in hardware implicit mode, since that mode performs better (block 465). Different considerations in making this determination are discussed further below. If the external system activity was not a change in the number of STM transactions, the transaction may be re-executed in its current mode (block 470). Similarly, if the transaction failed for some other reason, for example due to a conflict, the transaction may be re-executed in the same mode (block 480). While shown with this particular implementation in the embodiment of FIG. 4, understand that the scope of the present invention is not limited in this regard.
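The recovery paths of FIG. 4 can be summarized in one dispatch function. This is a sketch under assumptions: the cause labels, parameter names, and the default threshold of 3 are hypothetical, and switching to STM at block 435 follows the rule given elsewhere in this description rather than anything FIG. 4 itself specifies.

```python
def next_mode(cause, mode, *, supporting_mode="STM", self_dooms=0,
              threshold=3, stm_delta=0, upgrade_worthwhile=False):
    """Choose the mode in which a failed transaction re-executes.

    cause              -- "unsupported", "self", "external", or anything else
    mode               -- mode in which the transaction was running
    supporting_mode    -- mode that offers the missing functionality
    self_dooms         -- how often the transaction has doomed itself
    stm_delta          -- change in the number of pending STM transactions
    upgrade_worthwhile -- outcome of the heuristic applied at block 465
    """
    if cause == "unsupported":                 # diamond 415 -> block 420
        return supporting_mode
    if cause == "self":                        # diamond 425
        if self_dooms > threshold:             # diamond 430 -> block 435
            return "STM"
        return mode                            # block 440
    if cause == "external":                    # diamond 450
        if stm_delta > 0 and mode == "CRITM":  # diamond 455 -> block 460
            return "CRESTM"
        if stm_delta < 0 and mode == "CRESTM" and upgrade_worthwhile:
            return "CRITM"                     # diamond 462 -> block 465
        return mode                            # block 470
    return mode                                # block 480, e.g. a conflict
```

Only the first three causes ever change the mode; an ordinary conflict always retries in the mode the transaction was already using.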

The reasons described in Table 2 above and discussed in connection with FIG. 4 can be classified into four broad categories. For each category, the fallback and upgrade mechanisms are described. The first category of failure reasons is functionality not supported by a given execution mode. Reasons 1-5 fall into this bucket for CRITM. For CRESTM, reasons 2-5 fall into this bucket. These reasons are integral to the transaction and expose a limitation of the current execution mode. The re-execution of the transaction should therefore not be in the mode in which it executed earlier; instead, a switch to a mode with the required support may occur. To support this, a durable write to the transaction context may be performed (when the transaction is doomed) that specifies the mode to be used when the transaction is re-executed.

The second category of failure reasons is where a transaction dooms itself. Reasons 6 and 7 fall into this category. For reason 6, the transaction may be rolled back, compilation of the required block may be performed (e.g., just in time (JIT)), and the transaction may then be re-executed in the same mode. This is because JIT'ing a function is quite expensive, so the overhead of the rollback and re-execution will not be noticeable. For reason 7, the transaction may be re-executed in the same mode. This is done because, first, the monitored/buffered line may not contain the object header the next time around, and second, there is no way of knowing that the loss of monitoring (or buffering) occurred because of a write to an object header. In some implementations, a safeguard may be provided for the case where a transaction keeps dooming itself because of writes to an object header. As one example, a rule may be set that any CRITM/CRESTM transaction that has been re-executed N times (where N is greater than some predetermined threshold) will be re-executed in STM mode.

The third category of failure reasons is where system activity external to the current transaction dooms it. Reasons 8-10 fall into this category. For reason 8, even though the transaction was rolled back because of a garbage collection (GC) suspension, there is no reason not to retry in the same mode, so the transaction may be re-executed in the mode in which it executed earlier. For reason 9, a global counter of currently running STM transactions may be maintained in memory. Whenever a new STM transaction begins, this counter may be incremented (e.g., using InterlockedIncrement), and when an STM transaction rolls back, aborts, or commits, a corresponding decrement of the counter may occur (e.g., using InterlockedDecrement). CRITM transactions may also perform a monitored read of this global counter, so that whenever an STM transaction starts, all CRITM transactions are doomed and re-executed in CRESTM mode.
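The global counter for reason 9 can be modeled as follows. This sketch is only an analogy: a lock replaces the interlocked increment and decrement, and registered callbacks stand in for hardware monitoring, which would doom a CRITM transaction automatically when the monitored counter is written. The class and method names are hypothetical.

```python
import threading


class StmCounter:
    """Global count of running STM transactions (software analogy)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._count = 0
        self._watchers = []  # doom callbacks of CRITM transactions

    @property
    def count(self):
        with self._lock:
            return self._count

    def monitored_read(self, doom_callback):
        # A CRITM transaction reads the counter and asks to be doomed
        # if the counter is written later.
        with self._lock:
            self._watchers.append(doom_callback)
            return self._count

    def stm_begin(self):
        # Atomic increment; every watching CRITM transaction is doomed.
        with self._lock:
            self._count += 1
            watchers, self._watchers = self._watchers, []
        for doom in watchers:
            doom()

    def stm_end(self):
        # Called on rollback, abort, or commit of an STM transaction.
        with self._lock:
            self._count -= 1
```

A doomed CRITM transaction would then re-execute in CRESTM mode, as the description states; the sketch only shows how the start of an STM transaction reaches every watcher.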

CRITM is the best-performing mode, so aggressively dooming CRITM transactions should be avoided where possible. One solution is for an STM transaction, whenever it is about to start, to first check whether the system currently contains a running CRITM transaction. If it does, the STM transaction may be made to wait for a bounded amount of time before starting. This wait may allow currently running CRITM transactions to finish execution without delaying the STM transaction too much.

For reason 10, one implementation is to doom all CRESTM transactions and restart them in CRITM mode whenever all STM transactions in the system have terminated. However, if a CRESTM transaction is about to complete, a spin mechanism may be used to let it finish before it is doomed. The final decision here is based on the overhead of CRESTM compared with CRITM: if, on average, a CRESTM transaction runs at less than half the speed of a CRITM transaction, it performs better to doom the CRESTM transactions and restart them in CRITM mode; otherwise, it performs better to continue in CRESTM mode. In still other implementations, it may be possible to transition a running transaction from CRESTM mode to CRITM mode.

The fourth category is a valid read-write (r-w) or write-write (w-w) conflict on a buffered/monitored block. Reasons 11 and 12 of Table 2, loss of monitoring and loss of buffering, belong to this category. If a transaction is doomed because it lost monitoring or buffering of a cache line, it may be retried in the same mode as before. One concern here is that if a new transaction accessing a cache line dooms an older transaction, the older transaction may in turn doom the new transaction when it restarts. This can lead to a ping-pong effect in which neither transaction completes. Contention management logic may be used to handle this situation.

In some implementations, an optimization applies when a transaction is about to start or restart execution: if the only reason the transaction needs to start in CRESTM mode is that "the system contains one or more STM transactions", a spin mechanism can be used to wait before retrying. If an STM transaction is still running after the wait, the current transaction can be started in CRESTM mode; otherwise it can be started in CRITM mode, avoiding the CRESTM overhead. Similar logic can be applied to any CRESTM transaction that restarts. Thus, wherever the discussion above says a transaction should be restarted in the same mode, there is a caveat: if that mode is CRESTM, it can first be determined whether the transaction can run in CRITM mode.
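The spin-then-select optimization can be sketched as follows. This is an illustrative model only: `choose_start_mode`, the `stm_active` callback, and the timing parameters are hypothetical names, and a real runtime would spin on shared state rather than sleep.

```python
import time
from typing import Callable


def choose_start_mode(stm_active: Callable[[], bool],
                      other_reason_for_crestm: bool = False,
                      max_wait_s: float = 0.001,
                      poll_s: float = 0.0001) -> str:
    """Hypothetical mode selector for a transaction that is starting or restarting."""
    if other_reason_for_crestm:
        # Waiting only helps when running STM transactions are the sole
        # reason for choosing CRESTM.
        return "CRESTM"
    deadline = time.monotonic() + max_wait_s
    # Bounded wait: give running STM transactions a chance to finish.
    while stm_active() and time.monotonic() < deadline:
        time.sleep(poll_s)
    # Still an STM transaction in the system: pay the CRESTM overhead.
    return "CRESTM" if stm_active() else "CRITM"
```

The wait is bounded so that a long-running STM transaction delays the new transaction by at most `max_wait_s` before it falls back to CRESTM.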

For purposes of discussion, CRESTM uses the TV code generation style with exception-based rollback, while CRITM uses the NK code generation style with longjmp-based rollback.

Now consider how a lexical atomic block (generically called "s") should be translated. (For purposes of this discussion, assume that all state about a transaction is kept in the current transaction object, ignoring the transaction context.)

The primitive takes [inline image], a unique, small, dense integer ID that identifies the lexical transaction. This ID is used to index into a global data structure that holds contention management information about lexical transactions. The structure can also store persistent information indicating the execution mode in which to start the transaction. The primitive can set this information as an attribute of the transaction.

Tables 3-5 below provide three translations of a code block into TM-supporting code. The pseudocode of Table 3 assumes that CRESTM and STM are the only execution modes, the pseudocode of Table 4 assumes that CRITM is the only execution mode, and the pseudocode of Table 5 attempts to account for all three possibilities.

If CRESTM and STM are the only execution modes, the translation is set forth in the pseudocode of Table 3.

Table 3

[Figure 2010800639316100002DEST_PATH_IMAGE003: Table 3 pseudocode]

As seen in Table 3, a transaction can be created using the [inline image] primitive. Its SiteID determines a set of initial attributes, including the transaction vtable currently in use. In all modes, the live local variables (or just those that may be modified within the transaction) can be saved to stack locations. After that, if the current execution mode uses hardware acceleration, a hardware transaction is started. The transaction executes. If it rolls back, the catch clause is reached, because an exception-based rollback was raised. The local variable values can be restored first. This is necessary regardless of whether the handler (HandleEx) decides to re-execute (by returning) or to clean up and rethrow an aborting user exception, since the local variables may be live in the catch clause that catches the rethrown exception. If the handler decides to re-execute, it may change attributes of the transaction "curtx". For example, it can change the transaction vtable to use STM instead of CRESTM.

If CRITM is the only execution mode, the translation is set forth in the pseudocode of Table 4.

Table 4

[Figure 2010800639316100002DEST_PATH_IMAGE005: Table 4 pseudocode]

As seen in Table 4, for the reasons discussed above, assume that the [inline image] operation saves not only the stack pointer, base pointer and instruction pointer (ESP, EBP and IP), but also all callee-save registers. The IP it saves can be immediately after the call to the operation, so that, as with setjmp/longjmp, execution can resume as if returning from that call. The ejector will restore the saved register values and jump to the saved IP. Note that the "naked" translation of S is not exactly S, because there may be some explicit action to commit the transaction when control flow leaves S. Because rollback is longjmp-based, only user-level exceptions that are being thrown reach the catch clause. As in CRESTM, the saved local variables are restored (for the same reason). HandleEx will deep-clone the exception, abort the hardware transaction, and then rethrow the cloned exception.

On the first execution, curtx.IsRexec() is false, so local variables are not restored. On the second and subsequent executions of a given transaction instance, this condition is true, so the local variables are restored each time. This is in addition to restoring local variables in the catch clause, because re-execution via longjmp does not pass through the catch handler. When the ejector enters re-execution, a decision about the mode in which the re-execution should run can be recorded in the transaction data structure. Although this can be done in the ejector, a stack overflow can occur if a large amount of code executes there. An alternative is to have the ejector record in the transaction data structure the data on which the decision will be based, and to decide on and install the new execution mode after the IsRexec() test; Table 4 shows this possibility in a comment.

The pseudocode of Table 5 sets forth a combined translation that allows for the CRITM, CRESTM and STM modes.

Table 5

With respect to the pseudocode of Table 5, consider a transaction execution that starts in CRITM mode, encounters contention or a resource limit, makes a contention management decision to re-execute in CRESTM, again encounters contention or a resource limit, and therefore re-executes once more in STM, this time successfully.

The information associated with [inline image] determines that the transaction can first execute in CRITM mode. In all modes, the live and modified local variables are first saved to shadow variables in the stack frame (durably so if this is a top-level transaction). The transaction can be set up so that LongjmpRollback returns true, so it will perform the setjmp equivalent. As discussed above, if this is a re-execution (which it is not at this point in the example), the saved local variables can be restored. A hardware transaction is then started, and the CGSTYLE_NK version of the STM translation of S is executed. The CRITM execution loses monitoring or buffering and enters the ejector, and the current hardware transaction is thereby aborted.

The transaction can make a contention management decision and decide to re-execute in CRESTM mode. It changes some attributes of the transaction, including the transaction vtable. It then restores the saved register values and jumps to the saved IP, thereby resuming as if SaveSetjmpState had just returned. (As discussed above, it could do less contention management work in the ejector, if desired, and set up the new execution mode after the "IsRexec()" test.)

A new hardware transaction is started, and the CGSTYLE_TV translation of the code is executed. At some point, a loss of monitoring or buffering may be detected and an internal ReExecuteException raised, so that the exception handler is reached. The saved local variable values are restored from their shadow copies, and HandleEx is called, which determines that the raised exception is a ReExecuteException. At some point, either earlier, before the exception was raised, or here, a contention management decision is made that determines the next execution mode, and the attributes of the current transaction are adjusted accordingly. In this case, assume the decision is to re-execute in STM mode. Because re-execution is to occur, HandleEx returns rather than rethrowing, and control thus returns to label L. On this execution, StartHWTx is a no-op because hardware acceleration is not in use, and the CGSTYLE_TV translation of the body is executed with STM barriers. This time the transaction succeeds and commits.
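The CRITM, CRESTM, STM walkthrough above can be mimicked by a small re-execution loop. This is a simplified sketch under stated assumptions: a single Python exception stands in for both the longjmp-based and the exception-based rollback, the fixed `FALLBACK` table replaces a real contention management decision, and `run_atomic` is a hypothetical name. Only `ReExecuteException` is a name taken from the text.

```python
class ReExecuteException(Exception):
    """Internal rollback signal; the name is taken from the text."""


# Assumed fixed fallback order; a real contention manager decides dynamically.
FALLBACK = {"CRITM": "CRESTM", "CRESTM": "STM"}


def run_atomic(body, local_vars, start_mode="CRITM"):
    """Run body(mode, local_vars) until it commits; return the modes tried."""
    shadow = dict(local_vars)           # save live locals to shadow copies
    mode, history = start_mode, []
    while True:
        history.append(mode)
        try:
            body(mode, local_vars)      # stands in for the translated block S
            return history              # reaching here models a commit
        except ReExecuteException:
            local_vars.clear()
            local_vars.update(shadow)   # restore locals before re-executing
            if mode not in FALLBACK:
                raise                   # no slower mode left to fall back to
            mode = FALLBACK[mode]       # install the next execution mode
```

A body that fails in both hardware modes ends up committing in STM, and because the locals are restored from their shadow copies before each retry, only the effects of the final execution remain.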

Table 6 below compares various properties of TM execution according to an embodiment of the present invention.

Table 6

Property | CRITM | CRESTM | HASTM
Uses hw metadata for filtering | N | N | Y
Buffers writes | Y | Y | N
Monitors reads | Y | Y | Y
Keeps a write log | N | N | Y
Requires data to be cache resident | Y | Y | N
Compatibility | CRITM | CRESTM, HASTM, STM | CRESTM, HASTM, STM
Supports retry | N | N | Y
Supports notification | N | Y | Y
Rollback mechanism | longjmp-based | exception-based | exception-based
Requires TV code generation | N | Y | Y

Thus, in various embodiments, a switching state machine can be used to execute transactions in multiple modes, including implicit cache-resident and explicit cache-resident modes, HASTM and STM. For example, a transaction may start in CRITM and then switch on overflow. In addition, when some thread enters an STM lock mode, the transactions of other threads can be switched. Mode switches can occur on rich semantics or features, such as deterministic commit order or retry, and a switch back to CRITM mode can occur when no STM threads remain.

Embodiments may use UTM support for explicit monitoring and buffering control instructions to efficiently execute smaller, simple transactions without logging and shadow-copy buffering, while correctly coexisting with unbounded, publication- and privatization-correct STM/HASTM transactions that use software locking and logging disciplines. Thus, a UTM system can allow fast cache-resident transactions to execute alongside STM transactions (even those on software threads that are not scheduled on a core). In the hardware implicit mode, particularly for managed code, accesses to internal runtime data structures and the stack may be added unnecessarily to the cache-managed transactional read and write sets. In the non-implicit mode of CRESTM, using the cache monitoring and buffering instruction facilities (through software), a transaction can monitor and buffer only the user data that requires transactional semantics. The stack, data accesses that occur during a sojourn in CLR helper code, and the CLR runtime itself do not use cache monitoring and buffering, and so do not by themselves contribute to eviction-based (capacity miss) aborts of cache-resident transactions.

As described above, transactions execute in various hardware-accelerated modes, such as CRITM and CRESTM, before falling back to HASTM or STM because of cache capacity or the use of semantics not implemented in hardware. With CRESTM, a cache-resident, STM-respecting explicit transactional memory mode is provided that can interoperate with both STM transactions and CRITM transactions. Then, when a fallback to STM occurs for one transaction, the others can switch to CRESTM, but not all transactions have to go all the way to the least efficient STM mode. Similarly, the upgrade can occur gradually: first the STM transactions complete while the rest of the system works in CRESTM mode, and then the CRESTM transactions complete while the rest of the system is already in the most efficient CRITM mode.

Using accelerated memory barriers according to embodiments of the present invention, execution characteristics can be improved by eliminating the overhead of a write log, eliminating the need for hardware transactions to allocate timestamps from a global pool, increasing concurrency among CRESTM transactions and between CRESTM and STM transactions, and reacting adaptively to contention between CRESTM and STM transactions.

Object headers (OH) can be used within the CRITM and CRESTM transaction modes. Because not all uses of the OH can be switched to TM, and because the hardware cannot support open-nested transactions, these modes may interact with the compare-and-swap (CAS) protocol on the OH used by other parts of the system. Certain changes to the OH must be durable. The hash code (HC) is the most notable in this respect. The requirement for a stable HC for an object further implies that changes to the SyncBlockIndex (SBI) are also durable. For CRESTM, suppression and re-entry are not needed, because the transaction does not use transactional reads or writes to access the SyncBlock management data structures. Objects created inside a transaction are not globally visible, so modifications to their headers can also be buffered.

In a global version clock system, interoperability of CRESTM with STM provides a lock-respecting hardware mode. The following main assumptions are made. A global version clock scheme is used to provide publication correctness; some form of commit ticket protocol and buffered writes are used to provide privatization correctness; write locks are acquired at encounter time (e.g., via an OpenForWrite() function); and optimistic reads are validated at commit time, after the commit ticket has been obtained.

Global version clock integration can be achieved by having the hardware transaction keep a write log and update the metadata (e.g., a transaction record or transaction metadata word (TMW)) with the write variable (WV) during the proper phase of commit. The hardware algorithm is as follows. A hardware transaction is started, and a write barrier is executed before each write. In one embodiment, this barrier can be a buffered write of o.tmw = "locked by me"; the address of the object is logged in a transaction-local write log, and the TMW is monitored. A read barrier is executed before each read, in which the lock bit is checked and the transaction is aborted if a lock is present (unless it is "locked by me"), and the TMW can be monitored. After the body completes, the WV for this transaction can be obtained using logic in a suppress region. The list of written addresses can then be used to update each o.tmw to WV with buffered writes, and the hardware transaction is committed. Thus, the WV is obtained after all write locks have been acquired. In hardware mode, "acquiring a write lock" means having monitoring on the proper TMW.

This scheme may perform poorly because of the need to keep a write log. In some embodiments, it is possible to be lock-respecting without a write log, which eliminates the need to keep one. An optimization of the global version number implementation can be made under two assumptions: first, that there will be far more CRESTM transactions than STM transactions; and second, that actual data conflicts are rare. The first assumption is motivated by the fact that one transaction falling back to STM does not force other transactions to move to STM. Falling back to STM is expected to be rare, so the "victim" will be a single transaction while the other transactions continue to execute in CRESTM. In a sufficiently parallel system, this means there will be many more CRESTM transactions than STM transactions. The second assumption is workload-dependent, but rarity of conflicts is generally a hallmark of good design and thus prevails in successful programs.

A CRESTM transaction uses a common version number, denoted the hardware global version number (HGV), to stamp any object it modifies. STM transactions guarantee that HGV is strictly greater than the software-based global version number (GV), so that any write made by a concurrent CRESTM transaction correctly appears as a conflict. HGV can be increased in batches, which guarantees maximum concurrency as long as no data conflicts occur. Data conflicts are handled by degrading to the most basic policy and then gradually resuming a more aggressive path.

To be lock-respecting without a write log, the following operations can occur in a hardware transaction. Assume that GV and HGV both start at 0. Each hardware transaction first sets a stamp value (SV) = HGV. HGV is read with monitoring, so any write to it destroys all hardware transactions. A write barrier can be executed before each write, e.g., using a buffered write of o.tmw = SV and monitoring the TMW. A read barrier can be executed before each read, in which the lock bit is checked, the transaction is aborted if a lock is present, and the TMW is monitored. The transaction commits using the ticket protocol. Thus, a hardware transaction keeps no log of the objects it has tentatively changed; instead, tentatively changed objects are stamped with HGV, and if the transaction commits, the timestamps become permanent together with the data changes.

Each software transaction sets a read variable (RV) = GV. If (HGV < RV + 1), it performs a compare-and-set: CAS(HGV, RV /* expected */, GV + 1 /* new */). Hardware is now stamping into the future, and all current hardware transactions are destroyed. The transaction then executes as is conventional for an STM, e.g., with locks acquired at encounter time. When the transaction is ready to commit, the write variable (WV) is set so that it equals GV after an increment. The increment of GV ensures that, if this transaction rolls back and then re-executes, any outstanding hardware transactions are destroyed and stamping moves into the future. The read set is validated using RV, and all write locks are then released using WV. Write locks are not held, but the downside is that whenever a software transaction starts after another software transaction has finished (by committing or rolling back), all hardware transactions are destroyed. This behavior can be mitigated by advancing HGV by more than 1 at a time. If it advances by 10, for example, then 10 more software transactions can start, after some other software transaction is seen to complete, before all hardware transactions are destroyed.

Thus, when a software transaction starts, it samples GV and stores the retrieved value in the local variable RV. It proceeds to perform reads and writes as the program dictates. When the transaction is ready to commit, GV is first checked to determine whether incrementing it would make GV reach the value of HGV. If so, HGV is incremented by an amount B (whose value is discussed below).

These rules provide the safety needed to ensure that conflicts are always detected. In general, conflict detection can occur as follows: CRESTM-versus-CRESTM conflicts are detected at the hardware level as conflicts on raw data accesses; CRESTM-versus-CRITM conflicts are likewise detected at the hardware level as conflicts on raw data accesses; an STM transaction that happens to make a conflicting access to an object currently monitored and/or buffered by a CRESTM transaction causes the CRESTM transaction to roll back; and a CRESTM transaction that invalidates data accessed by an STM transaction is detected by the STM transaction no later than its validation phase, because the HGV stamped on the object will necessarily be greater than the GV sampled when the STM transaction started.
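The interplay of GV, HGV, RV, WV and SV can be illustrated with a toy, single-threaded model. It is a sketch under several assumptions: the class and method names are hypothetical, plain assignments replace the atomic CAS and increment instructions, an update counter stands in for hardware monitoring of HGV, lock bits are omitted, and HGV is advanced by the batch size both when a software transaction starts and when it commits.

```python
class Clocks:
    """Shared state: GV, HGV and one version stamp (TMW stand-in) per object."""

    def __init__(self, object_names, batch=4):
        self.gv = 0
        self.hgv = 0
        self.batch = batch
        self.hgv_writes = 0                      # bumped on every write to HGV
        self.version = {name: 0 for name in object_names}

    def advance_hgv(self):
        self.hgv += self.batch
        self.hgv_writes += 1                     # destroys monitoring hw txns


class HardwareTx:
    def __init__(self, clocks):
        self.c = clocks
        self.sv = clocks.hgv                     # stamp value SV = HGV
        self.seen = clocks.hgv_writes            # models the monitor on HGV
        self.buffered = set()                    # models buffered cache lines

    def write(self, name):
        self.buffered.add(name)

    def commit(self):
        if self.seen != self.c.hgv_writes:       # HGV was written: destroyed
            return False
        for name in self.buffered:
            self.c.version[name] = self.sv       # stamp becomes permanent
        return True


class SoftwareTx:
    def __init__(self, clocks):
        self.c = clocks
        self.rv = clocks.gv                      # RV = GV sampled at start
        if clocks.hgv < self.rv + 1:             # keep hw stamping in the future
            clocks.advance_hgv()
        self.reads, self.writes = set(), set()

    def read(self, name):
        self.reads.add(name)

    def write(self, name):
        self.writes.add(name)

    def commit(self):
        c = self.c
        if c.gv + 1 >= c.hgv:                    # increment would reach HGV
            c.advance_hgv()
        c.gv += 1
        wv = c.gv                                # WV = GV after the increment
        if any(c.version[n] > self.rv for n in self.reads):
            return False                         # read set fails validation
        for n in self.writes:
            c.version[n] = wv                    # release write locks with WV
        return True
```

In this model, a software transaction that reads an object before a hardware transaction commits a write to it fails validation, because the object then carries a stamp greater than the RV sampled at the start. A hardware transaction that is still running when HGV advances fails to commit.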

As described above, the value of B, or the "batch size", is the amount by which HGV is allowed to run ahead of GV. As mentioned above, whenever GV reaches the value of HGV, HGV is incremented by B. Whenever this happens, all currently executing CRESTM transactions roll back, because they are monitoring the location of HGV. Thus, the larger B is, the less often such invalidations occur. On the other hand, once an STM transaction observes an object whose version number is higher than the current GV, it must advance GV to that higher number to ensure that it can read the object successfully on its next re-execution. If B is large, such "jumps" through the version space can cause the version space to be consumed faster, which may be a concern for systems in which the version space is limited and wrap-around is costly (e.g., it may require renumbering all objects in the heap).

Embodiments may tune the value of B so that B is allowed to be large as long as the data accessed by different transactions is disjoint, but is reduced whenever sharing is detected. The first ingredient of this mechanism is an efficient detection method. That is, an STM transaction needs to be able to discern, with high probability, that it has in fact read a value produced by a hardware transaction with a "high" HGV number. To do so, the transaction compares the version contained in the object with GV. If the object's version is higher, then the object was stamped with HGV. In any case in which a transaction observes an object with a version number higher than the current GV, the transaction advances GV at least to the version number it saw.

Once the conflict situation has been handled, the value of B is reduced to ensure that a recurrence of the situation is less costly in terms of version space consumption (although this may matter little for systems with a very large version space). Any policy that provides "fast shrink/slow growth" is acceptable. For example, whenever a conflict situation is detected, the value of B is halved, but never made smaller than 1; and whenever HGV is increased by B, the value of B is also incremented, but by a fixed amount, e.g., 1, and only up to a predetermined cap value.
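The example "fast shrink/slow growth" policy can be sketched as a small class. The class name, method names, and default cap are hypothetical; only the halving, the floor of 1, the fixed increment, and the cap come from the text.

```python
class BatchSize:
    """Adaptive batch size B: halve on conflict, grow by a fixed step up to a cap."""

    def __init__(self, initial=1, cap=64, step=1):
        if cap < 1 or step < 1:
            raise ValueError("cap and step must be at least 1")
        self.cap = cap
        self.step = step
        self.value = max(1, min(initial, cap))

    def on_conflict(self):
        # Fast shrink: halve B, but never let it drop below 1.
        self.value = max(1, self.value // 2)

    def on_hgv_advance(self):
        # Slow growth: each time HGV is increased by B, grow B by a fixed
        # step, only up to the predetermined cap.
        self.value = min(self.cap, self.value + self.step)
```

Starting from B = 8, three detected conflicts bring B down to 1, after which each HGV advance raises it by one until the cap is reached.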

Referring now to FIG. 5, shown is a flow diagram of a method for handling hardware and software transactions in accordance with an embodiment of the present invention. As shown in FIG. 5, method 500 may include code paths for both hardware transactions and software transactions. First, for a hardware transaction, a stamp value can be set equal to HGV (block 510). In addition, monitoring can be set on the location of this stamp value so that the hardware transaction is notified if HGV is updated (block 515). During execution of the hardware transaction, various read and write operations may be performed, each of which can be implemented using a corresponding read or write barrier (block 520). For each such barrier, it can be determined whether the barrier operation failed, e.g., because a lock is present on the location to be read or written (diamond 523). If so, the transaction can be aborted (block 535). If a given barrier operation for a write succeeds, the write data can be updated in a buffer and associated with the current HGV (block 525). At the end of the transaction, it can be determined whether HGV has changed (diamond 530). If so, this indicates that a conflict has occurred, e.g., between this hardware transaction and a software transaction, and the hardware transaction can accordingly be aborted (block 535). Otherwise, control passes to block 540, where the updates can be committed so that each updated value is stored to memory and associated with HGV to indicate its version number at the time of the update.

For a software transaction, at initiation a read value can be set that corresponds to the current GVN (block 550). It can then be determined whether incrementing this read value would yield a result greater than the current value of HGV (diamond 555). If so, HGV can be updated, which causes all pending hardware transactions to be destroyed. More specifically, control passes from diamond 555 to block 560, where the HGV value can be updated by the adaptive batch size B. Note that the operations of diamond 555 and block 560 can be performed atomically in hardware using an atomic compare-and-exchange instruction. From either diamond 555 or block 560, control passes to block 565, where the STM transaction can be executed. In general, various data can be read, operations performed and values updated, using software locks to obtain ownership of any values written. At the conclusion of these operations, it can be determined whether the transaction is ready to commit (diamond 570). If so, control passes to block 575, where GVN is incremented. In one embodiment, this increment is by 1. The updated GVN can be stored in a write value associated with the software transaction (block 580). Note that the operations of blocks 575 and 580 can be performed atomically in hardware, e.g., using an atomic increment instruction that returns the new value of GVN into the write value. It can then be determined whether all objects read have version numbers less than or equal to the read value (diamond 585). If not, the transaction can be aborted (block 595). If the validation at diamond 585 instead succeeds, control passes to block 590, where the transaction is committed and the write value is used as the new version number for all objects in the write set. In other words, the write locks on the objects in the write set are released by giving those objects a new version number equal to WV. While shown with this particular implementation in the embodiment of FIG. 5, the scope of the present invention is not limited in this regard.

Code generation resolves to two largely independent decisions that together yield a transaction execution mode. First, the code generation style can be either naked (NK) mode or transaction vtable (TV) mode. Second, for the rollback mechanism, when a decision is made to re-execute a transaction, it must be determined how the modifications made are rolled back and how control transfers to the beginning of the transaction.

For the code generation style, the transaction context structure (possibly shared by the members of the same sequential nest) may be augmented with a substructure called a transaction vtable. This is a structure whose elements are function pointers, one for each kind of STM JIT helper of the STM mode. Other modes can be created so that the same TV-generated code can be used for multiple modes by dynamically changing the transaction vtable.
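The idea of a transaction vtable can be illustrated with a short sketch. All names here (`TxVTable`, `TxContext`, the `stm_*` and `hw_*` helpers, and `increment`) are hypothetical and chosen only for illustration; the actual set of STM JIT helpers is not enumerated in this description. The point shown is that the same generated code runs in a different mode once the vtable in the context is swapped.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class TxVTable:
    """Hypothetical table of function pointers, one per helper kind."""
    read: Callable
    write: Callable


@dataclass
class TxContext:
    vtable: TxVTable
    memory: dict
    log: list = field(default_factory=list)


def stm_read(ctx, addr):
    ctx.log.append(("read", addr))  # software mode records its reads
    return ctx.memory[addr]


def stm_write(ctx, addr, value):
    ctx.log.append(("write", addr, ctx.memory.get(addr)))  # keep the old value
    ctx.memory[addr] = value


def hw_read(ctx, addr):
    return ctx.memory[addr]  # hardware mode: no software logging in this sketch


def hw_write(ctx, addr, value):
    ctx.memory[addr] = value


STM_VTABLE = TxVTable(read=stm_read, write=stm_write)
HW_VTABLE = TxVTable(read=hw_read, write=hw_write)


def increment(ctx, addr):
    """Stands in for TV-generated code: every access goes through the vtable."""
    value = ctx.vtable.read(ctx, addr)
    ctx.vtable.write(ctx, addr, value + 1)
```

Assigning `ctx.vtable = HW_VTABLE` switches `increment` to the unlogged helpers without regenerating it, which is the property the paragraph above attributes to TV-style code.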

When a transaction detects an inconsistency or is explicitly aborted, all state changes are rolled back and control returns to the beginning of the transaction. CRESTM and the pure software exception-based mechanism raise an internal exception to accomplish the rollback. This exception cannot be caught by any handler other than the handlers inserted as part of the transaction translation during code generation.
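An exception-based rollback of this kind might look like the following sketch. The names `TxRollback` and `run_atomic` are hypothetical, and the undo list is an assumed bookkeeping scheme. Deriving the internal exception from `BaseException` only approximates the property that user handlers cannot catch it: an `except Exception` clause in user code will not intercept it, although a bare `except:` in Python still would.

```python
_MISSING = object()  # marks keys that did not exist before the transaction


class TxRollback(BaseException):
    """Hypothetical internal exception used only to drive rollback."""


def run_atomic(body, memory, max_attempts=3):
    """Run body(write, attempt); on TxRollback undo its writes and retry."""
    for attempt in range(1, max_attempts + 1):
        undo = []

        def write(key, value):
            undo.append((key, memory.get(key, _MISSING)))  # remember old value
            memory[key] = value

        try:
            body(write, attempt)
            return attempt  # committed on this attempt
        except TxRollback:  # the handler inserted around the transaction
            for key, old in reversed(undo):
                if old is _MISSING:
                    memory.pop(key, None)
                else:
                    memory[key] = old
            # control returns to the beginning of the transaction
    raise RuntimeError("transaction did not commit")
```

A body that raises `TxRollback()` on its first attempt is undone and re-run from the start, and the caller only ever observes the state of the attempt that committed.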

Transaction nesting can occur. A discussion of closed nested transactions is provided first, and suppression behavior is described in the context of open nested transactions, because these concepts are related. A given hardware architecture may not support any form of nesting. Instead, a flat model may be supported, in which touched cache lines are buffered and monitored, and can either be committed atomically to memory or rolled back, with their tentative effects vanishing without a trace. However, the failure atomicity of nested transactions dictates that if a nested transaction rolls back, only its effects are undone, and the effects of the parent transaction are preserved (though still only tentatively).

Flattening is an advantageous technique that assumes a transaction is unlikely to roll back, so that no nested undo information is collected. The general algorithm is as follows. When a nested atomic block is entered, a nested closed transaction is set up, and a try/catch block is placed around its execution. If the nested transaction commits (which is the common case), execution of the parent resumes; the effects of the child are now subsumed into the parent and will never need to be selectively undone. If, on the other hand, the nested transaction code propagates an exception, then in a system with true nesting support only the nested transaction would roll back, and the exception would resurface in the context of the parent. In an implementation in which a child transaction cannot be rolled back independently, the entire transaction nest may be rolled back and re-executed in a mode that supports true nesting.
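The flattening algorithm above can be sketched as follows. The class `Tx`, the signal `NeedTrueNesting`, the driver `run_nest`, and the mode labels `"flat"` and `"nested"` are all hypothetical names for this illustration; a dictionary copy stands in for whatever undo information a mode with true nesting would collect.

```python
class NeedTrueNesting(BaseException):
    """Hypothetical internal signal: flattening failed, redo the whole nest."""


class Tx:
    def __init__(self, mode):
        self.mode = mode   # "flat" or "nested" (labels are assumptions)
        self.buffer = {}   # tentative writes of the whole nest

    def nested(self, body):
        if self.mode == "flat":
            try:
                body(self.buffer)  # child writes land directly in the parent
            except Exception:
                # The child cannot be undone on its own in the flat model.
                raise NeedTrueNesting() from None
        else:
            child = dict(self.buffer)  # child works on a private copy
            body(child)                # an exception here discards only `child`
            self.buffer = child        # child commit: effects join the parent


def run_nest(body, memory):
    """Try the flattened mode first; fall back to a mode with true nesting."""
    for mode in ("flat", "nested"):
        tx = Tx(mode)
        try:
            body(tx)
        except NeedTrueNesting:
            continue  # roll back the entire nest and re-execute
        memory.update(tx.buffer)  # top-level commit
        return mode
```

In the common case the nest commits in the flat mode at no extra cost. Only when a child raises does the nest pay for a second execution, in which the exception resurfaces in the parent while the parent's own tentative writes are preserved.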

Similar to other cases in which a cache-resident transaction is rolled back, the following mechanism may be employed. At the point where the nest is determined to be doomed, a persistent write may be made to the transaction context, in essence stating why the transaction rolled back and what kind of re-execution mode is needed next. The execution stack is then unwound, and an enclosing exception handler surrounding the entire nest may be executed, e.g., using a normal exception. Recovery from a flattening failure can occur by re-executing in HASTM mode.
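The recovery mechanism described above can be sketched as follows. The mode names CRESTM and HASTM come from the description; everything else (`Doomed`, `TxContext`, its `buffered` and `persistent` fields, `doom`, and `execute`) is hypothetical. The sketch assumes the persistent record is simply state that the rollback does not clear.

```python
class Doomed(BaseException):
    """Hypothetical exception that unwinds the stack to the enclosing handler."""


class TxContext:
    def __init__(self):
        self.buffered = {}    # speculative state, discarded on rollback
        self.persistent = {}  # survives rollback (models the persistent write)

    def doom(self, reason, next_mode):
        # Record why the nest failed and which mode to use next, then unwind.
        self.persistent["reason"] = reason
        self.persistent["next_mode"] = next_mode
        raise Doomed()


def execute(body, ctx, mode="CRESTM", max_attempts=4):
    """Enclosing handler around the whole nest; returns the modes tried."""
    history = []
    for _ in range(max_attempts):
        history.append(mode)
        ctx.buffered.clear()
        try:
            body(ctx, mode)
            return history  # committed
        except Doomed:
            ctx.buffered.clear()                 # speculative effects vanish
            mode = ctx.persistent["next_mode"]   # read back the recorded mode
    raise RuntimeError("transaction did not commit")
```

Because the reason and the next mode are written outside the speculative state, the handler can still read them after the rollback has erased everything else the transaction did.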

In summary, CRESTM allows small and simple transactions to run without locking or logging, even in the presence of other unbounded STM transactions, thereby providing overall a fast and rich, full-featured, limited-weakly-atomic-correct TM programming model. Using explicit mode transactions allows software to optimize its use of the limited state of the private-cache UTM transaction facility, and thereby run longer and larger transactions before overflowing to STM. For example, stack accesses and newly allocated objects do not need monitoring or buffering. Embodiments provide an efficient cache-resident mode to accelerate the largest overheads of a limited-weakly-atomic-correct implementation (buffering, logging, locking). In various embodiments, software instructions may explicitly transact only certain user program data accesses.

Embodiments may be implemented in many different system types. Referring now to FIG. 6, shown is a block diagram of a system in accordance with one embodiment of the present invention. As shown in FIG. 6, multiprocessor system 1000 is a point-to-point interconnect system, and includes a first processor 1070 and a second processor 1080 coupled via a point-to-point interconnect 1050. Each of processors 1070 and 1080 may be a multicore processor including first and second processor cores (i.e., processor cores 1074a and 1074b and processor cores 1084a and 1084b), although potentially many more cores may be present in the processors. The processor cores may execute TM transactions using hardware, software, or combinations thereof to enable efficient unbounded transactions.

Still referring to FIG. 6, first processor 1070 further includes a memory controller hub (MCH) 1072 and point-to-point (P-P) interfaces 1076 and 1078. Similarly, second processor 1080 includes an MCH 1082 and P-P interfaces 1086 and 1088. As shown in FIG. 6, MCHs 1072 and 1082 couple the processors to respective memories, namely a memory 1032 and a memory 1034, which may be portions of main memory (e.g., dynamic random access memory (DRAM)) locally attached to the respective processors. First processor 1070 and second processor 1080 may be coupled to a chipset 1090 via P-P interconnects 1052 and 1054, respectively. As shown in FIG. 6, chipset 1090 includes P-P interfaces 1094 and 1098.

Furthermore, chipset 1090 includes an interface 1092 to couple chipset 1090 with a high-performance graphics engine 1038 via a P-P interconnect 1039. In turn, chipset 1090 may be coupled to a first bus 1016 via an interface 1096. As shown in FIG. 6, various input/output (I/O) devices 1014 may be coupled to first bus 1016, along with a bus bridge 1018 that couples first bus 1016 to a second bus 1020. Various devices may be coupled to second bus 1020 including, for example, a keyboard/mouse 1022, communication devices 1026, and a data storage unit 1028, such as a disk drive or other mass storage device that may include code 1030, in one embodiment. Further, an audio I/O 1024 may be coupled to second bus 1020.

Embodiments may be implemented in code and may be stored on a storage medium having stored thereon instructions that can be used to program a system to perform the instructions. The storage medium may include, but is not limited to, any type of disk, including floppy disks, optical disks, solid state drives (SSDs), compact disk read-only memories (CD-ROMs), compact disk rewritables (CD-RWs), and magneto-optical disks; semiconductor devices such as read-only memories (ROMs), random access memories (RAMs) such as dynamic random access memories (DRAMs) and static random access memories (SRAMs), erasable programmable read-only memories (EPROMs), flash memories, and electrically erasable programmable read-only memories (EEPROMs); magnetic or optical cards; or any other type of media suitable for storing electronic instructions.

While the present invention has been described with respect to a limited number of embodiments, those skilled in the art will appreciate numerous modifications and variations therefrom. It is intended that the appended claims cover all such modifications and variations as fall within the true spirit and scope of the present invention.

Claims

1. A method comprising:

selecting a first transaction execution mode to begin a first transaction in an unbounded transactional memory (UTM) system having a plurality of transaction execution modes, the plurality of transaction execution modes including a plurality of hardware modes to execute within a cache memory of a processor, at least one hardware-assisted mode to execute using transactional hardware of the processor and a software buffer, and at least one software transactional memory (STM) mode to execute without the transactional hardware, wherein the first transaction execution mode is selected to be a highest performance mode of the plurality of hardware modes if no pending transaction is executing in the at least one STM mode, and otherwise the first transaction execution mode is selected to be a lower performance mode of the plurality of hardware modes;

beginning execution of the first transaction in the first transaction execution mode;

committing the first transaction if the first transaction completes without overflowing the cache memory or violating transactional semantics; and

otherwise aborting the first transaction and selecting a new transaction execution mode in which to execute the first transaction.

2. The method of claim 1, wherein the highest performance mode is an implicit mode in which no locking or versioning operations occur.

3. The method of claim 2, wherein the lower performance mode is an explicit mode in which locking operations occur.

4. The method of claim 1, further comprising selecting the new transaction execution mode to be one of the at least one hardware-assisted mode if the first transaction overflows the cache memory.

5. The method of claim 1, further comprising, while executing the first transaction in the highest performance mode, determining that a second transaction has begun in the at least one STM mode, and rolling back the first transaction to re-execute the first transaction in the lower performance mode of the plurality of hardware modes.

6. The method of claim 1, further comprising aborting the first transaction in the first transaction execution mode if a loss of hardware properties of the first transaction occurs.

7. An article of manufacture comprising a machine-accessible storage medium containing instructions that, when executed, cause a system to:

receive an indication of a failure of a first transaction executing in a first transaction execution mode, the first transaction execution mode being one of a plurality of transaction execution modes, the plurality of transaction execution modes including a plurality of hardware modes to execute within a processor, at least one hardware-assisted mode to execute using transactional hardware of the processor and a software buffer, and at least one software transactional memory (STM) mode to execute without the transactional hardware;

determine a reason for the failure of the first transaction; and

determine, based at least in part on the reason for the failure, a new transaction execution mode in which to re-execute the first transaction, including determining whether the first transaction should be re-executed in the first transaction execution mode, in a second transaction execution mode having a lower performance level than the first transaction execution mode, or in a third transaction execution mode having a higher performance level than the first transaction execution mode.

8. The article of claim 7, wherein determining the reason for the failure comprises accessing a transaction record received with the indication, the transaction record including a value of a transaction status register (TSR) at the point of the failure of the first transaction.

9. The article of claim 8, further comprising instructions that enable the system to perform a persistent write to the transaction record to indicate the new transaction execution mode in which the first transaction is to be executed.

10. The article of claim 7, further comprising instructions that enable the system to perform just-in-time (JIT) compilation of a code block of the first transaction in response to the failure of the first transaction, and to re-execute the first transaction in the first transaction execution mode after the JIT compilation.

11. The article of claim 7, further comprising instructions that enable the system to re-execute the first transaction in the first transaction execution mode if a counter of the number of times the first transaction has failed is less than a threshold.

12. The article of claim 7, further comprising instructions that enable the system to re-execute the first transaction in a second transaction mode if the first transaction failed due to a change in the number of pending transactions operating in the STM mode.

13. The article of claim 12, further comprising instructions that enable the system to continue executing the first transaction in the first transaction execution mode and to hold off, for a predetermined period, at least one other transaction from beginning execution in the at least one second transaction mode.

14. A method comprising:

concurrently executing a first transaction using a hardware transaction execution mode and a second transaction using a software transaction execution mode; and

determining, within the second transaction, whether incrementing a read value of the second transaction, which corresponds to a global version counter used in the second transaction, would cause the read value to exceed a hardware global version counter used in the first transaction, and if so, updating the hardware global version counter by an adaptive batch size.

15. The method of claim 14, further comprising monitoring, in the first transaction, a stamp value corresponding to the hardware global version counter, and aborting the first transaction if the monitoring detects a change to the value of the hardware global version counter.

16. The method of claim 14, further comprising performing the first transaction without maintaining a write log.

17. The method of claim 14, wherein the hardware global version counter is strictly greater than the global version counter.

18. The method of claim 17, further comprising reducing the adaptive batch size in response to a conflict between the first transaction and the second transaction.

19. The method of claim 18, further comprising adjusting the adaptive batch size in response to incrementing the hardware global version counter by the adaptive batch size.

20. The method of claim 19, wherein the adaptive batch size is increased by a first amount and decreased by a second amount, the second amount being greater than the first amount.