IOMMU Extensions

This chapter specifies the following standard extensions to the IOMMU Base Architecture:

Specification                                     Version  Status
Quality-of-Service (QoS) Identifiers Extension    1.0      Ratified
Non-leaf PTE Invalidation Extension               1.0      Ratified
Address Range Invalidation Extension              1.0      Ratified
PTE Reserved-for-Software Bits 60-59              1.0      Ratified

Quality-of-Service (QoS) Identifiers Extension, Version 1.0

Quality of Service (QoS) is defined as the minimal end-to-end performance guaranteed in advance by a service level agreement (SLA) to a workload. Performance metrics might include measures such as instructions per cycle (IPC), latency of service, etc.

When multiple workloads execute concurrently on modern processors (equipped with large core counts, multiple cache hierarchies, and multiple memory controllers), the performance of any given workload becomes less deterministic, or even non-deterministic, due to shared resource contention [10].

To manage performance variability, system software needs resource allocation and monitoring capabilities. These capabilities allow for the reservation of resources like cache and bandwidth, thus meeting individual performance targets while minimizing interference [11]. For resource management, hardware should provide monitoring features that allow system software to profile workload resource consumption and allocate resources accordingly.

To facilitate this, the QoS Identifiers ISA extension (Ssqosid) [12] introduces the srmcfg register, which configures a hart with two identifiers: a Resource Control ID (RCID) and a Monitoring Counter ID (MCID). These identifiers accompany each request issued by the hart to shared resource controllers.

These identifiers are crucial for the RISC-V Capacity and Bandwidth Controller QoS Register Interface [13], which provides methods for setting resource usage limits and monitoring resource consumption. The RCID controls resource allocations, while the MCID is used for tracking resource usage.

The IOMMU QoS ID extension provides a method to associate QoS IDs with requests that the IOMMU itself makes to access resources, as well as with requests from the devices it governs. This complements the Ssqosid extension, which provides a method to associate QoS IDs with requests originated by RISC-V harts. Associating QoS IDs with device-originated and IOMMU-originated requests is required for effective monitoring and allocation of shared resources.

The IOMMU capabilities register is extended with a QOSID field that enumerates support for associating QoS IDs with requests made through the IOMMU. When capabilities.QOSID is 1, the memory-mapped register layout is extended to add a register named iommu_qosid. This register is used to configure the Quality of Service (QoS) IDs associated with IOMMU-originated requests. The ta field of the device context is extended with two fields, RCID and MCID, to configure the QoS IDs to associate with requests originated by the devices.
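As a rough illustration of the split described above, the following Python sketch models which pair of QoS IDs accompanies a request. It is a simplified software model, not a register-accurate layout: the class names, the function name, and the choice to return None when capabilities.QOSID is 0 are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class QosIds:
    rcid: int  # Resource Control ID
    mcid: int  # Monitoring Counter ID


@dataclass(frozen=True)
class IommuModel:
    """Hypothetical model of the QoS-related IOMMU state."""
    qosid_supported: bool  # models capabilities.QOSID
    iommu_qosid: QosIds    # models the iommu_qosid register


def qos_ids_for_request(iommu: IommuModel,
                        device_ta_qos: QosIds,
                        device_originated: bool) -> Optional[QosIds]:
    """Pick the QoS IDs that accompany a request.

    device_ta_qos models the RCID/MCID fields of the device context ta field.
    Returns None when the extension is not implemented (an assumption of
    this sketch; the text only says the fields exist when QOSID is 1).
    """
    if not iommu.qosid_supported:
        return None
    if device_originated:
        # Requests originated by the device use the device context IDs.
        return device_ta_qos
    # Requests originated by the IOMMU itself use iommu_qosid.
    return iommu.iommu_qosid
```

For example, with iommu_qosid holding (RCID=1, MCID=2) and a device context holding (RCID=5, MCID=6), a device-originated request in this model carries (5, 6) and an IOMMU-originated request carries (1, 2).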

Reset Behavior

If the reset value for ddtp.iommu_mode field is Bare, then the iommu_qosid.RCID field must have a reset value of 0.

At reset, it is required that the RCID field of iommu_qosid is set to 0 if the IOMMU is in Bare mode, as typically the resource controllers in the SoC default to a reset behavior of associating all capacity or bandwidth to the RCID value of 0. When the reset value of the ddtp.iommu_mode is not Bare, the iommu_qosid register should be initialized by software before changing the mode to allow DMA.

Sizing QoS Identifiers

The size (or width) of RCID and MCID, as fields in registers or in data structures, supported by the IOMMU must be at least as large as that supported by any RISC-V application processor hart in the system.
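The sizing rule can be checked mechanically when the widths are known, for example from a platform description. The sketch below is one possible check; the function name and the representation of hart widths as (RCID bits, MCID bits) pairs are assumptions for illustration.

```python
from typing import Iterable, Tuple


def iommu_qos_widths_ok(iommu_rcid_bits: int,
                        iommu_mcid_bits: int,
                        hart_widths: Iterable[Tuple[int, int]]) -> bool:
    """Return True if the IOMMU RCID/MCID widths are at least as large as
    those supported by every hart.

    hart_widths holds one (rcid_bits, mcid_bits) pair per application
    processor hart.
    """
    widths = list(hart_widths)
    max_rcid = max((r for r, _ in widths), default=0)
    max_mcid = max((m for _, m in widths), default=0)
    return iommu_rcid_bits >= max_rcid and iommu_mcid_bits >= max_mcid
```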

IOMMU ATC Capacity Allocation and Monitoring

Some IOMMUs might support capacity allocation and usage monitoring in the IOMMU address translation cache (IOATC) by implementing the capacity controller register interface.

Additionally, some IOMMUs might support multiple IOATCs, each potentially having different capacities. In scenarios where multiple IOATCs are implemented, such as an IOATC for each supported page size, the IOMMU can implement a capacity controller register interface for each IOATC to facilitate individual capacity allocation.

Non-leaf PTE Invalidation Extension, Version 1.0

The RISC-V IOMMU Version 1.0 specification provides commands to invalidate leaf page table entries from address translation caches when performing an address-specific invalidation operation. The non-leaf PTE invalidation extension provides commands to optionally also invalidate non-leaf PTEs from the address translation caches when performing an address-specific invalidation operation.

The non-leaf PTE invalidation extension is implemented if capabilities.NL (bit 42) is 1. When capabilities.NL is 1, this extension defines a non-leaf (NL) field at bit 34 in the IOTINVAL.VMA and IOTINVAL.GVMA commands. When capabilities.NL is 0, bit 34 remains reserved.
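A driver would typically gate use of the NL field on the capability bit. The following minimal sketch uses the two bit positions stated above; treating the command as a single Python integer and raising an error when the extension is absent are choices made for this example, and the function name is hypothetical.

```python
CAP_NL_BIT = 42  # capabilities.NL
CMD_NL_BIT = 34  # NL field in IOTINVAL.VMA and IOTINVAL.GVMA


def set_nl(command: int, capabilities: int) -> int:
    """Return the command with the NL field set.

    command is the IOTINVAL command held as one integer, bit 0 being the
    least significant bit of the command. Raises ValueError when
    capabilities.NL is 0, because bit 34 is then reserved.
    """
    if not (capabilities >> CAP_NL_BIT) & 1:
        raise ValueError("capabilities.NL is 0; command bit 34 is reserved")
    return command | (1 << CMD_NL_BIT)
```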

The non-leaf PTE invalidation extension enables optimizations in shared virtual addressing use cases by providing the ability to invalidate non-leaf PTEs corresponding to the IOVA being invalidated from the IOMMU address translation caches.

If the address range invalidation extension is also implemented, the NL operand applies to the address range determined by the ADDR and S operands.

Non-leaf PTE Invalidation by IOTINVAL.VMA

  • When the AV operand is 0, the NL operand is ignored and the IOTINVAL.VMA command operations are as specified in RISC-V IOMMU Version 1.0 specification.

  • When the AV operand is 1 and the NL operand is 0, the IOTINVAL.VMA command operations are as specified in RISC-V IOMMU Version 1.0 specification.

  • When both the AV and NL operands are 1, the IOTINVAL.VMA command performs the following operations:

    • When GV=0 and PSCV=0: Invalidates information cached from all levels of first-stage page table entries corresponding to the IOVA in the ADDR operand for all host address spaces, including entries containing global mappings.

    • When GV=0 and PSCV=1: Invalidates information cached from all levels of first-stage page table entries corresponding to the IOVA in the ADDR operand and the host address space identified by the PSCID operand, except for entries containing global mappings.

    • When GV=1 and PSCV=0: Invalidates information cached from all levels of first-stage page table entries corresponding to the IOVA in the ADDR operand for all VM address spaces associated with the GSCID operand, including entries that contain global mappings.

    • When GV=1 and PSCV=1: Invalidates information cached from all levels of first-stage page table entries corresponding to the IOVA in the ADDR operand and the VM address space identified by the PSCID and GSCID operands, except for entries containing global mappings.
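The cases above can be summarized as a small decision function. This Python sketch only restates the list; the type and function names are hypothetical, and returning None stands for "behaves as in the RISC-V IOMMU Version 1.0 specification".

```python
from typing import NamedTuple, Optional


class VmaNlScope(NamedTuple):
    address_spaces: str    # which address spaces are affected
    includes_global: bool  # whether global mappings are also invalidated


def iotinval_vma_nl_scope(av: int, nl: int, gv: int,
                          pscv: int) -> Optional[VmaNlScope]:
    """Describe the all-levels first-stage invalidation scope.

    Returns None when NL has no effect (AV=0, or AV=1 with NL=0).
    """
    if not (av and nl):
        return None
    if not gv and not pscv:
        return VmaNlScope("all host address spaces", True)
    if not gv and pscv:
        return VmaNlScope("host address space identified by PSCID", False)
    if gv and not pscv:
        return VmaNlScope("all VM address spaces associated with GSCID", True)
    return VmaNlScope("VM address space identified by PSCID and GSCID", False)
```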

Non-leaf PTE Invalidation by IOTINVAL.GVMA

  • When the GV operand is 0, both the AV and NL operands are ignored and the IOTINVAL.GVMA command operations are as specified in RISC-V IOMMU Version 1.0 specification.

  • When the GV operand is 1 and the AV operand is 0, the NL operand is ignored and the IOTINVAL.GVMA command operations are as specified in RISC-V IOMMU Version 1.0 specification.

  • When the GV and AV operands are 1 and the NL operand is 0, the IOTINVAL.GVMA command operations are as specified in RISC-V IOMMU Version 1.0 specification.

  • When GV, AV, and NL are all 1, the IOTINVAL.GVMA command performs the following operations:

    • Invalidates information cached from all levels of second-stage page table entries corresponding to the guest-physical address in the ADDR operand and the VM address spaces identified by the GSCID operand.
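The operand precedence for IOTINVAL.GVMA can be sketched the same way. The function below is illustrative only (its name and return shape are assumptions); it reports which operands are ignored and whether non-leaf second-stage entries are invalidated.

```python
from typing import Set, Tuple


def iotinval_gvma_nl_effect(gv: int, av: int, nl: int) -> Tuple[Set[str], bool]:
    """Return (ignored_operands, invalidates_non_leaf) for IOTINVAL.GVMA."""
    if not gv:
        # GV=0: both AV and NL are ignored.
        return {"AV", "NL"}, False
    if not av:
        # GV=1, AV=0: NL is ignored.
        return {"NL"}, False
    # GV=1, AV=1: NL selects invalidation of all levels of second-stage
    # page table entries for the guest-physical address in ADDR.
    return set(), bool(nl)
```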

Address Range Invalidation Extension, Version 1.0

The address range invalidation extension enables specifying a range of addresses in an IOMMU ATC invalidation command, reducing the number of commands queued to the IOMMU. This facility is especially useful when superpages are employed in page tables.

The address range invalidation extension is implemented if capabilities.S (bit 43) is 1. When capabilities.S is 1, a range-size (S) operand is defined at bit 73 in the IOTINVAL.VMA and IOTINVAL.GVMA commands by this extension. When the capabilities.S bit is 0, bit 73 remains reserved.

When the GV operand is 0, both the AV and S operands are ignored by the IOTINVAL.GVMA command. When the AV operand is 0, the S operand is ignored in both the IOTINVAL.VMA and IOTINVAL.GVMA commands. When the S operand is ignored or set to 0, the operations of the IOTINVAL.VMA and IOTINVAL.GVMA commands are as specified in the RISC-V IOMMU Version 1.0 specification.
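The rules for when the S operand takes effect can be expressed as a short predicate. This is a sketch of the paragraph above with a hypothetical function name; it does not model the capabilities.S check, which is assumed to have been done already.

```python
def s_operand_effective(command: str, gv: int, av: int, s: int) -> bool:
    """Return True if the S operand selects range invalidation.

    command is "IOTINVAL.VMA" or "IOTINVAL.GVMA". When this returns False,
    the command behaves as in the RISC-V IOMMU Version 1.0 specification.
    """
    if command not in ("IOTINVAL.VMA", "IOTINVAL.GVMA"):
        raise ValueError("S is defined only for IOTINVAL.VMA and IOTINVAL.GVMA")
    if command == "IOTINVAL.GVMA" and not gv:
        return False  # GV=0: AV and S are both ignored by IOTINVAL.GVMA
    if not av:
        return False  # AV=0: S is ignored by both commands
    return bool(s)
```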

When the S operand is not ignored and is 1, the ADDR operand represents a NAPOT range encoded in the operand itself. Starting from bit position 0 of the ADDR operand, if the first 0 bit is at position X, the range size is 2^(X+1) * 4 KiB. When X is 0, the size of the range is 8 KiB.

If the S operand is not ignored and is 1 and all bits of the ADDR operand are 1, the behavior is UNSPECIFIED.

If the S operand is not ignored and is 1 and the most significant bit of the ADDR operand is 0 while all other bits are 1, the specified address range covers the entire address space.
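To make the NAPOT encoding concrete, the sketch below encodes and decodes the ADDR operand in Python. It assumes the ADDR operand is 52 bits wide and holds address bits 63:12, so that bit 0 of the operand corresponds to address bit 12; that width and the function names are assumptions of this example. The all-ones encoding is rejected because its behavior is UNSPECIFIED.

```python
ADDR_OPERAND_BITS = 52  # assumption: operand carries address bits 63:12
PAGE_SHIFT = 12         # 4 KiB base page


def decode_napot(addr_operand: int) -> tuple:
    """Return (base_address, size_in_bytes) for an ADDR operand with S=1."""
    all_ones = (1 << ADDR_OPERAND_BITS) - 1
    if not 0 <= addr_operand <= all_ones:
        raise ValueError("ADDR operand out of range")
    if addr_operand == all_ones:
        raise ValueError("all-ones ADDR operand: behavior is UNSPECIFIED")
    x = 0
    while (addr_operand >> x) & 1:  # find the first 0 bit, at position X
        x += 1
    pages = 1 << (x + 1)            # range size is 2^(X+1) * 4 KiB
    base_pfn = addr_operand & ~(pages - 1)
    return base_pfn << PAGE_SHIFT, pages << PAGE_SHIFT


def encode_napot(base_address: int, size_bytes: int) -> int:
    """Return the ADDR operand for a naturally aligned power-of-two range."""
    pages = size_bytes >> PAGE_SHIFT
    if size_bytes % (1 << PAGE_SHIFT) or pages < 2 or pages & (pages - 1):
        raise ValueError("size must be a power of two and at least 8 KiB")
    if pages > (1 << ADDR_OPERAND_BITS):
        raise ValueError("range exceeds the address space")
    if base_address % size_bytes:
        raise ValueError("base must be aligned to the range size")
    # Set the low bits below position X to 1; bit X stays 0 by alignment.
    return (base_address >> PAGE_SHIFT) | ((pages >> 1) - 1)
```

Under these assumptions, a 2 MiB range at address 0x40200000 encodes to 0x402FF (first 0 bit at position 8), and an operand of 0 decodes to the 8 KiB range starting at address 0. A single 4 KiB page cannot be expressed in this encoding; such an invalidation would use S=0.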

The NAPOT range encoding used by this extension follows the convention used by PCIe ATS Invalidation Requests to denote address ranges. This convention is also used to encode the translation range size in the tr_response register.

Simpler implementations may invalidate all address-translation cache entries when the S bit is set to 1.

PTE Reserved-for-Software Bits 60-59, Version 1.0

The Svrsw60t59b extension is implemented if capabilities.Svrsw60t59b (bit 14) is set to 1.