Appendix A: Historical Rationale for Extensions
This appendix contains the rationale for RISC-V ISA extensions at the time they were ratified. Unlike the ISA specification, this appendix is ordered chronologically, so as to convey the motivation and architectural reasoning underpinning each extension at the time of ratification. For extensions ratified prior to the conception of this appendix (ca. 2025), the rationale will be added over time. In cases where the rationale was not recorded, the authors and editors will synthesize it from the historical record.
"Zihintpause" Extension for Pause Hint
The PAUSE instruction hints to a hart that it should temporarily reduce its rate of execution. It is normally used to save energy and execution resources while polling, e.g. while waiting for a spinlock to become free.
Much of the debate surrounding this extension centered on whether a facility
similar to x86’s MONITOR/MWAIT should instead be provided.
We concluded that, even if such a facility were to be defined for RISC-V,
it would not supplant PAUSE.
PAUSE is more appropriate when polling for non-memory events, when polling for
multiple events, or when software does not know precisely what events it is
polling for.
(Perhaps surprisingly, the last case is ubiquitous, in part because it is
the mechanism expected by the Linux kernel’s cpu_relax API.)
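The polling pattern described above can be sketched in C as follows. This is a minimal illustration, not code from the Linux kernel: the names `relax_hint` and `spin_until_set` are hypothetical, the inline-assembly syntax assumes a GCC- or Clang-compatible compiler, and the raw instruction word assumes the ratified PAUSE encoding (a FENCE with pred=W and succ=0). On non-RISC-V targets the helper degrades to a compiler barrier.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical helper playing the role that cpu_relax plays in a kernel. */
static inline void relax_hint(void) {
#if defined(__riscv)
    /* PAUSE lives in the FENCE hint space; emitting the raw word avoids
       depending on assembler support for the mnemonic. */
    __asm__ volatile(".4byte 0x0100000F" ::: "memory");
#else
    /* Assumption: on other targets, fall back to a compiler barrier only. */
    __asm__ volatile("" ::: "memory");
#endif
}

/* Poll `flag` up to `max_spins` times, hinting the hart between polls.
   Returns true if the flag was observed nonzero, false on giving up. */
static bool spin_until_set(atomic_int *flag, unsigned max_spins) {
    for (unsigned i = 0; i < max_spins; ++i) {
        if (atomic_load_explicit(flag, memory_order_acquire) != 0)
            return true;
        relax_hint();
    }
    return false;
}
```

Note that the loop body does not tell the hint what it is waiting for, which is why a MONITOR/MWAIT-style facility could not replace it here.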
"Zicond" Extension for Integer Conditional Operations
Replacing unpredictable branches with conditional-select or conditional-move instructions can mitigate a class of costly branch mispredictions. Unfortunately, conditional-select instructions require three source operands. These instructions are a logical addition to ISAs that include three-source integer instructions for other reasons, but are too costly otherwise.
Some ISAs have instead furnished conditional-move instructions, which consume less encoding space and avoid the extra register read in simple microarchitectures. Unfortunately, in register-renamed microarchitectures, these instructions incur costs similar to conditional select, or require additional microarchitectural structures and micro-op-issue constraints.
The Zicond extension was defined to solve the same problem as conditional select and conditional move, but with very little incremental cost for complex microarchitectures. It provides conditional-zero instructions, which read two source operands and, depending on whether the second operand is zero, produce either the first operand or zero. These instructions can be used as part of a three-instruction sequence to synthesize conditional select. Several common conditional-execution idioms, including conditional addition, subtraction, and bitwise AND, OR, and XOR, require only two instructions, as they would with conditional select or move.
Two conditional-zero instructions are included: one that writes zero if the comparand is zero, and one that does so if the comparand is nonzero. Variants that perform magnitude comparisons with zero were considered but ultimately excluded for insufficient quantitative justification.
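The two conditional-zero operations and the idioms built from them can be modeled in C as below. This is a semantic sketch only: the function names are hypothetical, the models mirror the `czero.eqz` and `czero.nez` mnemonics from the ratified Zicond specification (not named in this appendix), and whether a compiler actually emits those instructions for this code depends on the toolchain and target flags.

```c
#include <stdint.h>

/* Model of czero.eqz: produce 0 if cond is zero, otherwise value. */
static inline uint64_t czero_eqz(uint64_t value, uint64_t cond) {
    return cond == 0 ? 0 : value;
}

/* Model of czero.nez: produce 0 if cond is nonzero, otherwise value. */
static inline uint64_t czero_nez(uint64_t value, uint64_t cond) {
    return cond != 0 ? 0 : value;
}

/* Conditional select (cond ? a : b) in three operations:
   two conditional zeros followed by an OR. Exactly one of the
   two intermediate values is zero, so the OR picks the other. */
static inline uint64_t select_via_czero(uint64_t cond, uint64_t a, uint64_t b) {
    return czero_eqz(a, cond) | czero_nez(b, cond);
}

/* Conditional add (cond ? x + y : x) in two operations:
   zero the addend when cond is zero, then add. */
static inline uint64_t cond_add(uint64_t cond, uint64_t x, uint64_t y) {
    return x + czero_eqz(y, cond);
}
```

The same two-operation shape applies to subtraction, OR, and XOR, because zero is the identity for those operations on the second operand.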
"Zacas" Extension for Atomic Compare-and-Swap (CAS) Instructions
While compare-and-swap for XLEN-wide data can be accomplished using LR/SC, the CAS atomic instructions scale better to highly parallel systems than LR/SC. Many lock-free algorithms, such as a lock-free queue, require manipulation of pointer variables. A simple CAS operation may not be sufficient to guard against what is commonly referred to as the ABA problem in such algorithms. To avoid the ABA problem, the algorithms associate a reference counter with the pointer variable and update both the pointer and the counter with a single quadword compare-and-swap. The doubleword and quadword CAS instructions support the implementation of such ABA-avoidance algorithms.
The CAS instruction supports the C++11 atomic compare-and-exchange operation.
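The counter-plus-reference technique can be sketched with C11 atomics. To keep the example portable and lock-free on common 64-bit hosts, this sketch makes a simplifying assumption: the "pointer" is a 32-bit index and the counter is a 32-bit tag, so the pair fits in one 64-bit word. A real pointer-plus-counter pair on RV64 would need the quadword form instead. All names here are hypothetical, and whether the compare-and-exchange is lowered to an AMOCAS instruction or an LR/SC loop depends on the toolchain and target.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Head of a lock-free structure: low 32 bits = index, high 32 bits = tag. */
typedef _Atomic uint64_t tagged_head_t;

static inline uint64_t pack_head(uint32_t index, uint32_t tag) {
    return ((uint64_t)tag << 32) | index;
}
static inline uint32_t head_index(uint64_t v) { return (uint32_t)v; }
static inline uint32_t head_tag(uint64_t v)   { return (uint32_t)(v >> 32); }

/* Replace the head index, bumping the tag on every successful update.
   Succeeds only if BOTH the index and the tag still equal `expected`,
   so a head that changed A -> B -> A is detected via its changed tag. */
static bool try_update_head(tagged_head_t *head, uint64_t expected,
                            uint32_t new_index) {
    uint64_t desired = pack_head(new_index, head_tag(expected) + 1u);
    return atomic_compare_exchange_strong(head, &expected, desired);
}
```

A plain CAS on the index alone would accept a stale snapshot once the index returned to its old value; comparing the tag in the same atomic operation is what rules that out.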
"Zabha" Extension for Byte and Halfword Atomic Memory Operations, Version 1.0
The A-extension offers atomic memory operation (AMO) instructions for words,
doublewords, and quadwords (only for AMOCAS). The absence of atomic
operations for subword data types necessitates emulation strategies. For bitwise
operations, this emulation can be performed via word-sized bitwise AMO*
instructions. For non-bitwise operations, emulation is achievable using
word-sized LR/SC instructions.
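The two emulation strategies can be sketched in C11 as below. The function names are hypothetical. Byte lanes are numbered by position in the word's value (lane 0 is the least significant byte); mapping a byte address to a lane depends on endianness and is omitted. The non-bitwise case is written as a portable compare-and-exchange loop, which a compiler targeting only the base A extension would typically lower to an LR/SC loop; that lowering is an assumption about the toolchain, not something this code guarantees.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Bitwise case: a byte-wide fetch-or emulated with one word-sized AMO.
   ORing zeros into the other lanes leaves them unchanged. */
static uint8_t emulated_fetch_or_u8(_Atomic uint32_t *word,
                                    unsigned lane, uint8_t mask) {
    unsigned shift = 8u * (lane & 3u);
    uint32_t old = atomic_fetch_or(word, (uint32_t)mask << shift);
    return (uint8_t)(old >> shift);
}

/* Non-bitwise case: a byte-wide fetch-add needs a retry loop, because the
   carry out of the target lane must not leak into the neighboring lane. */
static uint8_t emulated_fetch_add_u8(_Atomic uint32_t *word,
                                     unsigned lane, uint8_t addend) {
    unsigned shift = 8u * (lane & 3u);
    uint32_t lane_mask = (uint32_t)0xFFu << shift;
    uint32_t old = atomic_load(word);
    uint32_t desired;
    do {
        /* Add within the lane and discard any carry out of it. */
        uint32_t sum = (old + ((uint32_t)addend << shift)) & lane_mask;
        desired = (old & ~lane_mask) | sum;
    } while (!atomic_compare_exchange_weak(word, &old, desired));
    return (uint8_t)(old >> shift);
}
```

Both functions read and write the full containing word even though only one byte is of interest.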
Several limitations arise from this emulation approach:
- In systems with large-scale or Non-Uniform Memory Access (NUMA) configurations, emulation based on LR/SC introduces issues related to scalability and fairness, particularly under conditions of high contention.
- Emulation of narrower AMOs through wider AMO* instructions on non-idempotent IO memory regions may result in unintended side effects.
- Utilizing wider AMO* instructions for emulating narrower AMOs risks activating extraneous breakpoints or watchpoints.
- In the absence of native support for subword atomics, compilers often resort to inlining code sequences to provide the required emulation. This practice contributes to an increase in code size, with consequent impacts on system performance and memory utilization.
The Zabha extension addresses these limitations by adding support for byte and halfword atomic memory operations to the RISC-V Unprivileged ISA.