IOMMU Extensions
This chapter specifies the following standard extensions to the IOMMU Base Architecture:
| Specification | Version | Status |
|---|---|---|
1.0 |
Ratified |
|
1.0 |
Ratified |
|
1.0 |
Ratified |
|
1.0 |
Ratified |
Quality-of-Service (QoS) Identifiers Extension, Version 1.0
Quality of Service (QoS) is defined as the minimal end-to-end performance guaranteed in advance by a service level agreement (SLA) to a workload. Performance metrics might include measures such as instructions per cycle (IPC), latency of service, etc.
When multiple workloads execute concurrently on modern processors — equipped with large core counts, multiple cache hierarchies, and multiple memory controllers — the performance of any given workload becomes less deterministic, or even non-deterministic, due to shared resource contention [10].
To manage performance variability, system software needs resource allocation and monitoring capabilities. These capabilities allow for the reservation of resources like cache and bandwidth, thus meeting individual performance targets while minimizing interference [11]. For resource management, hardware should provide monitoring features that allow system software to profile workload resource consumption and allocate resources accordingly.
To facilitate this, the QoS Identifiers ISA extension (Ssqosid) [12]
introduces the srmcfg register, which configures a hart with two identifiers:
a Resource Control ID (RCID) and a Monitoring Counter ID (MCID). These
identifiers accompany each request issued by the hart to shared resource
controllers.
These identifiers are crucial for the RISC-V Capacity and Bandwidth Controller
QoS Register Interface [13], which provides methods for setting resource
usage limits and monitoring resource consumption. The RCID controls resource
allocations, while the MCID is used for tracking resource usage.
The IOMMU QoS ID extension provides a method to associate QoS IDs with requests to access resources by the IOMMU, as well as with devices governed by it. This complements the Ssqosid extension that provides a method to associate QoS IDs with requests originated by the RISC-V harts. Assocating QoS IDs with device and IOMMU originated requests is required for effective monitoring and allocation of shared resources.
The IOMMU capabilities register (iommu_registers.adoc#CAP) is extended with a QOSID field
which enumerates support for associating QoS IDs with requests made through the
IOMMU. When capabilities.QOSID is 1, the memory-mapped register layout is
extended to add a register named iommu_qosid (iommu_registers.adoc#IOQOSID). This register is
used to configure the Quality of Service (QoS) IDs associated with
IOMMU-originated requests. The ta field of the device context (iommu_data_structures.adoc#DC_TA) is
extended with two fields, RCID and MCID, to configure the QoS IDs to
associate with requests originated by the devices.
Reset Behavior
If the reset value for ddtp.iommu_mode field is Bare, then the
iommu_qosid.RCID field must have a reset value of 0.
|
At reset, it is required that the |
Sizing QoS Identifiers
The size (or width) of RCID and MCID, as fields in registers or in data
structures, supported by the IOMMU must be at least as large as that supported
by any RISC-V application processor hart in the system.
IOMMU ATC Capacity Allocation and Monitoring
Some IOMMUs might support capacity allocation and usage monitoring in the IOMMU address translation cache (IOATC) by implementing the capacity controller register interface.
Additionally, some IOMMUs might support multiple IOATCs, each potentially having different capacities. In scenarios where multiple IOATCs are implemented, such as an IOATC for each supported page size, the IOMMU can implement a capacity controller register interface for each IOATC to facilitate individual capacity allocation.
Non-leaf PTE Invalidation Extension, Version 1.0
The RISC-V IOMMU Version 1.0 specification provides commands to invalidate leaf page table entries from address translation caches when performing an address-specific invalidation operation. The non-leaf PTE invalidation extension provides commands to optionally also invalidate non-leaf PTE entries from the address translation caches when performing an address-specific invalidation operation.
The non-leaf PTE invalidation extension is implemented if the capabilities.NL
(bit 42) is 1. When the capabilities.NL bit is 1, a non-leaf (NL) field is
defined at bit 34 in the IOTINVAL.VMA and IOTINVAL.GVMA commands by this
extension. When the capabilities.NL bit is 0, bit 34 remains reserved.
|
The non-leaf PTE invalidation extension enables optimizations in shared virtual addressing use cases by providing the ability to invalidate non-leaf PTEs corresponding to the IOVA being invalidated from the IOMMU address translation caches. |
If the address range invalidation extension is also implemented, the NL
operand applies to the address range determined by the ADDR and S operands.
Non-leaf PTE Invalidation by IOTINVAL.VMA
-
When the
AVoperand is 0, theNLoperand is ignored and theIOTINVAL.VMAcommand operations are as specified in RISC-V IOMMU Version 1.0 specification. -
When the
AVoperand is 1 and theNLoperand is 0, theIOTINVAL.VMAcommand operations are as specified in RISC-V IOMMU Version 1.0 specification. -
When both the
AVandNLoperands are 1, theIOTINVAL.VMAcommand performs the following operations:-
When
GV=0andPSCV=0: Invalidates information cached from all levels of first-stage page table entries corresponding to the IOVA in theADDRoperand for all host address spaces, including entries containing global mappings. -
When
GV=0andPSCV=1: Invalidates information cached from all levels of first-stage page table entries corresponding to the IOVA in theADDRoperand and the host address space identified by thePSCIDoperand, except for entries containing global mappings. -
When
GV=1andPSCV=0: Invalidates information cached from all levels of first-stage page table entries corresponding to the IOVA in theADDRoperand for all VM address spaces associated with theGSCIDoperand, including entries that contain global mappings. -
When
GV=1andPSCV=1: Invalidates information cached from all levels of first-stage page table entries corresponding to the IOVA in theADDRoperand and the VM address space identified by thePSCIDandGSCIDoperands, except for entries containing global mappings.
-
Non-leaf PTE Invalidation by IOTINVAL.GVMA
-
When the
GVoperand is 0, both theAVandNLoperands are ignored and theIOTINVAL.GVMAcommand operations are as specified in RISC-V IOMMU Version 1.0 specification. -
When the
GVoperand is 1 and theAVoperand is 0, theNLoperand is ignored and theIOTINVAL.GVMAcommand operations are as specified in RISC-V IOMMU Version 1.0 specification. -
When the
GVandAVoperands are 1 and theNLoperand is 0, theIOTINVAL.GVMAcommand operations are as specified in RISC-V IOMMU Version 1.0 specification. -
When
GV,AV, andNLare all 1, theIOTINVAL.GVMAcommand performs the following operations:-
Invalidates information cached from all levels of second-stage page table entries corresponding to the guest-physical address in the
ADDRoperand and the VM address spaces identified by theGSCIDoperand.
-
Address Range Invalidation Extension, Version 1.0
The address range invalidation extension enables specifying a range of addresses in an IOMMU ATC invalidation command, reducing the number of commands queued to the IOMMU. This facility is especially useful when superpages are employed in page tables.
The address range invalidation extension is implemented if capabilities.S (bit
43) is 1. When capabilities.S is 1, a range-size (S) operand is defined at
bit 73 in the IOTINVAL.VMA and IOTINVAL.GVMA commands by this extension.
When the capabilities.S bit is 0, bit 73 remains reserved.
When the GV operand is 0, both the AV and S operands are ignored by the
IOTINVAL.GVMA command. When the AV operand is 0, the S operand is ignored
in both the IOTINVAL.VMA and IOTINVAL.GVMA commands. When the S operand is
ignored or set to 0, the operations of the IOTINVAL.VMA and IOTINVAL.GVMA
commands are as specified in the RISC-V IOMMU Version 1.0 specification.
When the S operand is not ignored and is 1, the ADDR operand represents a
NAPOT range encoded in the operand itself. Starting from bit position 0
of the ADDR operand, if the first 0 bit is at position X, the range size is
2(X+1) * 4 KiB. When X is 0, the size of the range is 8 KiB.
If the S operand is not ignored and is 1 and all bits of the ADDR operand
are 1, the behavior is UNSPECIFIED.
If the S operand is not ignored and is 1 and the most significant bit of the
ADDR operand is 0 while all other bits are 1, the specified address range
covers the entire address space.
|
The NAPOT range encoding used by this extension follows the convention used by
PCIe ATS Invalidation Requests to denote address ranges. This convention is also
used to encode the translation range size in Simpler implementations may invalidate all address-translation cache entries
when the |