2.1. Server SoC Requirements
2.1.1. Clocks and Timers
| ID# | Rule |
|---|---|
| CTI_010 | The … <br> The Zicntr extension [7] requires the real-time clocks of all harts to be synchronized to within one tick of the real-time clock. |
| CTI_020 | The … <br> This rule does not apply to system power states such as G3 (power off), S3 (Suspend to RAM), or S4 (Hibernate). |
2.1.2. Interrupt Controllers
This section specifies the requirements on the interrupt controllers used to deliver external interrupts to the RISC-V application processor harts.
| ID# | Rule |
|---|---|
| IIC_010 | The RISC-V Advanced Interrupt Architecture [8] MUST be supported. |
| IIC_020 | External interrupts MUST be signaled to a hart as message-signaled interrupts (MSI). <br> Since message-signaled interrupts are implemented as memory writes, they facilitate simplified enforcement of producer-consumer ordering rules. Specifically, interrupts issued by a device following a write operation must be processed only after the previous write operations have completed and been observed. Similarly, interrupts issued by a device must be observed before any subsequent read completions generated by the device. |
| IIC_030 | The Incoming Message-signaled Interrupt Controller (IMSIC) MUST implement an interrupt file for S-mode. |
| IIC_040 | The IMSIC MUST support at least 5 VS-mode interrupt files. <br> Supporting 5 VS-mode interrupt files per hart allows context switching among up to 5 virtual CPUs (vCPUs) on a hart without needing to swap the contents of an interrupt file out to memory. This is particularly beneficial when devices are directly assigned to virtual machines (VMs), as swapping out the context of an IMSIC interrupt file may result in longer latencies due to the need to redirect device interrupts to a memory-resident interrupt file. |
| IIC_050 | The S-mode interrupt file MUST support at least 255 interrupt identities. |
| IIC_060 | The VS-mode interrupt files MUST support at least 63 interrupt identities. |
| IIC_070 | The memory regions designated for IMSIC interrupt files MUST have the following PMAs: … |
| IIC_080 | If the SoC implements devices that use wire-signaled interrupts, then the SoC MUST implement an APLIC as specified by the RISC-V AIA specification and MUST use the APLIC to convert the wire-signaled interrupts into MSIs. <br> SoC devices using wire-signaled interrupts must implement the rules related to ordering of interrupts vs. older reads/writes from devices, as specified by the device and/or bus interface specifications that such devices conform to. See also SID_010. |
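The ordering property behind IIC_020 can be illustrated with a toy model. The addresses and the single-queue fabric below are illustrative assumptions, not part of this specification; the point is that an MSI is itself a posted memory write, so observing the interrupt implies the device's earlier data writes are already observed.

```python
# Toy model of MSI-as-memory-write ordering. The addresses and the
# fabric model are hypothetical; real interconnects are more complex,
# but posted writes from one device remain ordered.

DATA_BUF = 0x8000_0000      # hypothetical DMA buffer address
IMSIC_FILE = 0x2800_0000    # hypothetical IMSIC interrupt-file address

observed = []               # global observation order of posted writes
memory = {}

def post_write(addr, value):
    """A posted write becomes globally observed; later posted writes
    from the same device are observed only after earlier ones."""
    memory[addr] = value
    observed.append(addr)

post_write(DATA_BUF, 0xCAFE)   # 1. device DMA-writes its payload
post_write(IMSIC_FILE, 37)     # 2. device signals completion via MSI

# Interrupt handler's view: the MSI is observed only after the payload.
assert observed.index(IMSIC_FILE) > observed.index(DATA_BUF)
assert memory[DATA_BUF] == 0xCAFE
```

This is why no extra barrier is needed between a device's payload write and its interrupt: the ordering rules for posted writes already cover the MSI.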
2.1.3. Input-Output Memory Management Unit (IOMMU)
| ID# | Rule |
|---|---|
| IOM_010 | All IOMMUs in the SoC MUST support the RISC-V IOMMU specification [9]. |
| IOM_020 | All DMA-capable peripherals (RCiEP and non-PCIe devices) and all PCIe root ports accessible by software on the RISC-V application processor harts MUST be governed by an IOMMU. <br> Governing DMA-capable peripherals with an IOMMU allows OSes/hypervisors to restrict DMA originating from such devices to a subset of memory, enhancing security and software fault tolerance. The address translation capability provided by the IOMMU enables usages such as passthrough of such devices to virtual machines, shared virtual addressing, etc. |
| IOM_030 | The IOMMU governing a PCIe root port MUST support at least 16-bit wide device IDs. |
| IOM_040 | An IOMMU that does not govern a PCIe root port MUST support a device ID width sufficient to represent all requester IDs originated by the devices governed by that IOMMU. |
| IOM_050 | The IOMMU MUST implement all the page-based virtual memory system modes and extensions that are implemented by the RISC-V application processor harts in the SoC. <br> The page-based virtual memory system modes supported by the IOMMU are enumerated in the IOMMU capabilities register. |
| IOM_070 | The IOMMU SHOULD support pass-through mode and MRIF mode MSI address translation. |
| IOM_080 | When MRIF mode MSI address translation is supported, the IOMMU MUST support atomic updates to the MRIF (enumerated by a 1 setting of …). |
| IOM_090 | An IOMMU governing PCIe root ports SHOULD support PCIe address translation services (ATS). <br> High-performance devices such as DPUs/SmartNICs, GPUs, and FPGAs used in server platforms rely on ATS and Page Request services to achieve high-throughput, low-latency I/O. Supporting ATS is also required to efficiently accommodate usage models such as shared virtual addressing and direct work submission from user mode. |
| IOM_100 | An IOMMU governing PCIe root ports SHOULD support the T2GPA mode of operation with ATS if ATS is supported. <br> The T2GPA control enables a hypervisor to prevent DMA from a device even if the device misuses the ATS capability and attempts to access memory that is not explicitly authorized by the page tables governing that device's memory accesses. The threat model may also include a man-in-the-middle on the PCIe link inserting ATS-translated requests to access memory that was not previously authorized. As an alternative to setting T2GPA to 1, the hypervisor might establish a trust relationship with the device if authentication protocols such as SPDM are supported by the device. For PCIe, for example, the PCIe Component Measurement and Authentication (CMA) capability provides a mechanism to verify the device's configuration and firmware/executable (Measurement) and hardware identities (Authentication). This mechanism establishes such a trust relationship, and the PCIe link may be integrity-protected using PCIe Integrity and Data Encryption (IDE) to defend against a man-in-the-middle adversary. |
| IOM_110 | An IOMMU governing RCiEPs MUST support PCIe address translation services (ATS) if any of the RCiEPs governed by the IOMMU support the ATS capability. |
| IOM_120 | An IOMMU governing RCiEPs MAY support the T2GPA mode of operation with ATS if ATS is supported. <br> The threats associated with misuse of ATS, or with malicious insertion of ATS-translated requests by a man-in-the-middle, may not be present for RCiEPs integrated in the SoC. |
| IOM_130 | The IOMMU MUST support MSI, and MAY support wire-signaled interrupts, for external interrupts originated by the IOMMU itself. |
| IOM_140 | The IOMMU MUST support little-endian memory access to its in-memory data structures. |
| IOM_150 | The IOMMU MAY support big-endian memory access to its in-memory data structures. <br> The IOMMU memory-mapped registers always have a little-endian byte order. |
| IOM_160 | The IOMMU MAY support the PCIe PASID capability. |
| IOM_170 | An IOMMU that supports the PASID capability MUST support a 20-bit PASID width and MAY support 8-bit and 17-bit PASID widths. <br> The PCIe specification strongly recommends that hardware implement the maximum width of 20 bits to ensure interoperability with system software. See also the implementation note on PASID width homogeneity in PCIe specification 6.0 section 6.20.2.2. |
| IOM_180 | The IOMMU SHOULD support a hardware performance monitor (HPM). <br> The HPM is a valuable tool for system integrators for performance monitoring and optimization. An IOMMU is highly recommended to provide an HPM. |
| IOM_190 | An IOMMU that supports an HPM MUST support the cycles counter. |
| IOM_200 | An IOMMU that supports an HPM MUST incorporate at least 4 event counters. <br> A typical performance analysis operation may involve simultaneously counting the number of translation requests, IOATC misses, and page table walks. An HPM with a sufficient number of event counters ensures accurate and comprehensive data collection, enabling detailed performance analysis and optimization. |
| IOM_210 | The cycles counter and the event counters MUST be at least 40 bits wide. |
| IOM_220 | The IOMMU SHOULD support the software debug capabilities enumerated by … |
| IOM_230 | The physical address width supported by the IOMMU MUST be greater than or equal to the physical address width supported by the RISC-V application processor harts in the SoC. <br> A physical address width greater than or equal to that of the harts in the SoC enables use of all addressable memory for I/O and facilitates sharing of page tables between the hart MMU and the IOMMU. |
| IOM_240 | The reset default of the … <br> The IOMMU unconditionally disallowing DMA following reset, due to the mode being Off, allows SoC firmware and software to enable DMA once the required security protections have been established. The IOMMU mode being Off at reset does not pose a significant issue for SoC firmware that needs to employ DMA (e.g., for firmware loading), as that firmware may program the mode in the appropriate IOMMU prior to programming the peripheral governed by that IOMMU to perform DMA. |
| IOM_250 | An IOMMU that is implemented as an RCiEP MUST use base class 08H and subclass 06H [10]. <br> Base class 08H and subclass 06H are designated by PCIe for use by an IOMMU. Implementing the IOMMU as a PCIe device allows an operating system to determine a driver for the IOMMU and to assign resources such as interrupt vectors to the IOMMU in a PCIe-compatible manner. |
| IOM_260 | The host bridge MUST enforce the physical memory attribute checks and physical memory protection checks on memory accesses originated by the IOMMU and signal detected access violations to the IOMMU. <br> These checks are analogous to the PMA and PMP checks performed by a RISC-V hart. The host bridge (also known as the IO bridge) invokes the IOMMU for address translations. To perform the operations requested by the host bridge, the IOMMU may need to access in-memory data structures such as the device directory table and page tables. |
| IOM_270 | An IOMMU MUST support 24-bit device IDs if the IOMMU governs multiple PCIe root ports that may be part of different PCIe hierarchies. <br> An IOMMU governing PCIe root ports uses the requester ID (RID) - the tuple of bus/device/function numbers (or just bus/function numbers, if the PCIe ARI option is used) - to locate a device context to use for address translation and protection. The 16-bit RID uniquely identifies a requester within a hierarchy. This RID needs to be augmented with the hierarchy ID (also known as the segment ID) - an 8-bit number - to uniquely identify a requester across PCIe hierarchies. |
| IOM_280 | The host bridge MUST provide the PCIe RID as bits 15:0 of the … |
| IOM_290 | When the IOMMU supports 24-bit device IDs, the host bridge MUST specify the segment number associated with the PCIe hierarchy from which requests were received as bits 23:16 of the … |
| IOM_300 | The determination of … |
| IOM_310 | The host bridge MUST provide the 20-bit PASID from the PCIe PASID TLP Prefix as the … <br> The host bridge providing the full 20-bit value from the PASID TLP Prefix to the IOMMU, without truncation, enables the IOMMU to determine whether the PASID value is wider than supported by the current configuration of the process directory table for that device and, if so, to generate a fault notification. |
| IOM_320 | The determination of … |
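As a concrete illustration of the device ID composition described by IOM_270 through IOM_290, the following sketch (the helper names are ours, not from the specification) builds a 24-bit device ID from an 8-bit segment number and a 16-bit RID:

```python
def requester_id(bus: int, device: int, function: int) -> int:
    """Traditional 16-bit PCIe RID: bus in bits 15:8, device in
    bits 7:3, function in bits 2:0."""
    assert 0 <= bus < 256 and 0 <= device < 32 and 0 <= function < 8
    return (bus << 8) | (device << 3) | function

def iommu_device_id(segment: int, rid: int) -> int:
    """24-bit device ID: segment (hierarchy) number in bits 23:16
    (per IOM_290), RID in bits 15:0 (per IOM_280)."""
    assert 0 <= segment < 256 and 0 <= rid < (1 << 16)
    return (segment << 16) | rid

# Example: segment 1, bus 0x3A, device 2, function 0
rid = requester_id(0x3A, 2, 0)
assert rid == 0x3A10
assert iommu_device_id(1, rid) == 0x0001_3A10
```

With ARI, the device/function split collapses into a single 8-bit function number, but the 16-bit RID and the segment augmentation are unchanged.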
2.1.4. PCIe Subsystem
A PCIe subsystem consists of a root complex with a collection of root ports, root complex event collectors (RCECs), root complex register blocks (RCRBs), and root complex integrated endpoints (RCiEPs). The root complex implements a host bridge to connect the PCIe root ports, RCECs, RCRBs, and RCiEPs to the CPU and system memory in the SoC through an interconnect.
One or more root ports in a root complex may be part of a hierarchy, where a hierarchy is a PCI Express I/O interconnect topology in which the Configuration Space addresses - the tuple of Bus/Device/Function Numbers (or just Bus/Function Numbers, for PCIe ARI cases) - are unique. These addresses are used for Configuration Request routing, Completion routing, some Message routing, and other purposes. In some contexts a hierarchy is also called a segment, and in Flit Mode the segment number is sometimes also included in the ID of a function. Each root port in a hierarchy originates a hierarchy domain, i.e., the part of a hierarchy originating from a single root port. The root ports are PCI-PCI bridges that bridge a primary PCIe bus to a range of secondary and subordinate buses.
In some SoCs, PCIe devices may be integrated in the same package/die as the root complex. Examples of such devices are network controllers, USB host controllers, NVMe controllers, AHCI controllers, etc. Such SoC integrated devices may be presented to software using one of the following options:
- Presented to software as a PCIe endpoint (EP; see section 1.3.2.2 of the PCIe 6.0 specification) connected to a PCIe root port (see the example of such an endpoint connected to root port 3 in Figure 2). Such PCIe endpoints must comply with the PCIe-specified rules for endpoints.
- Presented to software as a root complex integrated endpoint (RCiEP; see section 1.3.2.3 of the PCIe 6.0 specification). Such endpoints must comply with the PCIe-specified rules for RCiEPs.
Implementing integrated devices that perform as RCiEP or EP allows the use of standardized PCIe frameworks for memory and interrupt resource allocation, virtualization (SR-IOV), ATS/PRI for shared virtual addressing, trusted IO using SPDM/TDISP, RAS frameworks like data poisoning and AER, power management, etc.
The host bridge is placed between the device(s) and the system interconnect to process DMA transactions. Devices perform DMA transactions using I/O virtual addresses (IOVA; a VA, GVA, or GPA). The host bridge invokes the associated IOMMU to translate the IOVA to a supervisor physical address (SPA).
| ID# | Rule |
|---|---|
| RCI_010 | The PCIe root ports, host bridges, RCRBs, and RCECs in the root complex MUST implement all software-visible rules defined by the PCIe specification 6.0 for the root complex, as applicable. |
2.1.4.1. Enhanced Configuration Access Method (ECAM)
Each PCIe endpoint and the PCIe root port itself implement a set of memory mapped configuration registers that are accessed using the PCIe enhanced configuration access method (ECAM). The memory mapped ECAM address range for a hierarchy is up to 256 MiB in size and the base address of the range is naturally aligned to the size. Each PCIe function is associated with a 4 KiB page in this range such that the address bits (20+b):20 where b=0 to 7 identify the bus number of that function (see also recommendations in the PCIe specification 6.0 section 7.2.2), the address bits 19:15 identify the device number, and the address bits 14:12 identify the function number. The host bridge in conjunction with the SoC boot firmware maps the ECAM address range to the hierarchy domain originating at each PCIe root port.
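The bit layout above translates directly into an offset computation. A short sketch for the common 8-bit bus-number case (the helper name is ours, not from the specification):

```python
def ecam_offset(bus: int, device: int, function: int, reg: int = 0) -> int:
    """Byte offset into an ECAM region: bus number in bits 27:20,
    device number in bits 19:15, function number in bits 14:12, and
    register offset in bits 11:0 (each function gets a 4 KiB page)."""
    assert 0 <= bus < 256 and 0 <= device < 32 and 0 <= function < 8
    assert 0 <= reg < 4096
    return (bus << 20) | (device << 15) | (function << 12) | reg

# Vendor ID register (offset 0) of bus 2, device 3, function 1:
assert ecam_offset(2, 3, 1) == 0x219000
# A full 256-bus ECAM region spans exactly 256 MiB:
assert ecam_offset(255, 31, 7, 4095) == 256 * 1024 * 1024 - 1
```

The physical address of a configuration register is then the naturally aligned ECAM base of the hierarchy plus this offset.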
| ID# | Rule |
|---|---|
| ECM_010 | The ECAM address ranges MUST have the following physical memory attributes (PMAs): … <br> See also the implementation note on root complex requirements for generating configuration requests in section 7.2.2 of PCIe specification 6.0. |
| ECM_020 | Writes to the ECAM address range from a RISC-V hart MUST be non-posted, and the write MUST complete at the hart only after a completion is received from the function hosting the accessed configuration register. <br> Beyond performing the write itself, software executing on a hart must not require any additional actions to achieve this property. |
| ECM_030 | The ECAM address range for a hierarchy MUST be contiguous, and the base address of the range MUST be naturally aligned to the size of the ECAM address range associated with the hierarchy. |
| ECM_040 | A SoC MAY support multiple hierarchies. When multiple hierarchies are supported, the ECAM address ranges of the hierarchies MUST NOT overlap, but they are not required to be contiguous. |
| ECM_050 | The configuration space of the PCIe root ports MUST be associated with the primary bus number of the hierarchy associated with the root port. <br> PCIe root ports are PCI-PCI bridges that bridge the primary bus to the secondary/subordinate buses. The root port itself enumerates as a PCI-PCI bridge device on the primary bus. The collection of primary, secondary, and subordinate buses is part of a single hierarchy domain that originates at that PCIe root port. |
| ECM_060 | The configuration space of functions on the primary bus MUST be accessible irrespective of the state of the corresponding PCIe link. <br> Discovery and activation of the PCIe link requires accessing the configuration space registers of the PCIe root port itself, and the PCIe root port is a PCI-PCI bridge device on the primary bus. |
| ECM_070 | The PCIe root port MUST support the PCIe Configuration RRS (CRS) Software Visibility enable control. <br> The number of times a configuration request is retried on an RRS response is … |
| ECM_080 | Reads and/or writes to the ECAM range of the hierarchy domain originating at a root port MUST generate PCIe configuration transactions as type 0 or type 1 configuration transactions, following the rules specified for ECAM in PCIe specification 6.0. <br> Determining the type of configuration transaction based on whether the access targets the primary, secondary, or subordinate buses may involve logic in the host bridge working in conjunction with the root port PCIe controller. See also Alternative Routing-ID Interpretation in PCIe specification 6.0 section 6.13 for rules on converting type 1 configuration requests into type 0 configuration requests based on the traditional Device Number field being 0. Specifically, when ARI forwarding is disabled, write accesses to the configuration space of a Device Number greater than 0 must be silently dropped, and read accesses must be responded to with all 1s data. |
| ECM_090 | Read access to the ECAM address range from a RISC-V hart MUST be responded to with all 1s data if any of the following conditions are TRUE: … <br> The data response for the Vendor ID register on receipt of an RRS response MUST follow the PCIe-defined rules. See also the recommendations in PCIe specification 6.0 section 2.3.2. |
| ECM_100 | Write access from a RISC-V hart to configuration registers of a non-existent function on the primary bus MUST be dropped (silently ignored or discarded) and the write completed. Such accesses MUST NOT lead to any other behavior (e.g., hangs, deadlocks, etc.). |
| ECM_110 | Poisoned data received from completers (EP=1) MUST be forwarded to the requesting RISC-V hart as poisoned data unless such forwarding is disallowed (e.g., the SoC does not support data poisoning, or forwarding of poisoned data is disabled through implementation-defined means). If forwarding of poisoned data is disallowed, then the poisoned data MUST be replaced with all 1s data. |
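The type 0 vs. type 1 selection referenced by ECM_080 can be sketched as follows. This is a deliberately simplified model of the standard PCIe bridging rules (real root ports also account for ARI forwarding, RRS handling, and error cases):

```python
def config_tlp_type(target_bus, secondary_bus, subordinate_bus):
    """Simplified root-port decision for a configuration access.
    Returns 0 (type 0: consumed by the device on the link itself),
    1 (type 1: forwarded further down the hierarchy), or None
    (the target bus is not behind this port)."""
    if target_bus == secondary_bus:
        return 0
    if secondary_bus < target_bus <= subordinate_bus:
        return 1
    return None

# Root port with secondary bus 1 and subordinate bus 4:
assert config_tlp_type(1, 1, 4) == 0    # device attached to the link
assert config_tlp_type(3, 1, 4) == 1    # below a switch in the domain
assert config_tlp_type(9, 1, 4) is None # outside this hierarchy domain
```

A downstream bridge applies the same rule recursively, converting a type 1 request into type 0 once the target bus equals its own secondary bus.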
2.1.4.2. PCIe Memory Space
| ID# | Rule |
|---|---|
| MMS_010 | The SoC MUST support designating, for each hierarchy domain, one or more ranges of system physical addresses that may be used for mapping memory space of endpoints in that hierarchy domain using the 64-bit wide base address registers (BARs) of the endpoints. |
| MMS_020 | The SoC MUST support designating, for each hierarchy domain, at least one system physical address range for mapping memory space of endpoints in that hierarchy domain using 32-bit wide BARs of the endpoints. <br> The ranges suitable for mapping using 32-bit BARs are sometimes termed the low MMIO ranges, and those suitable for use with 64-bit BARs the high MMIO ranges. |
| MMS_030 | The system physical address ranges designated for mapping endpoint memory spaces MUST have the following physical memory attributes (PMAs): … <br> Software may use the Svpbmt extension to override the PMA to NC if such an override is compatible with the restricted programming model of the device. |
| MMS_040 | A load from a RISC-V application processor hart to memory ranges designated for mapping memory spaces of endpoints or RCiEPs MUST complete with an all 1s response and MUST NOT lead to any abnormal behavior (e.g., hangs, deadlocks, etc.) if any of the following are TRUE: … <br> The 64-bit memory base/limit register was previously called Prefetchable Memory Base/Limit. The concept of "Prefetchable" MMIO was originally needed to control PCI-PCI bridges, which were allowed/encouraged to prefetch Memory Read data in prefetchable regions. The original intent of the Prefetchable/Non-Prefetchable distinction was focused on PCI behaviors and was not intended for software use in determining memory attributes and/or coding techniques. The Removing Prefetchable Terminology ECN reworks the PCIe Base Specification to remove Prefetchable terminology. |
| MMS_050 | A store from a RISC-V application processor hart to memory ranges designated for mapping memory space of endpoints or RCiEPs MUST be dropped (silently ignored or discarded) and MUST NOT lead to any abnormal behavior (e.g., hangs, deadlocks, etc.) if any of the following are TRUE: … |
| MMS_060 | Poisoned data received from completers (EP=1) MUST be forwarded to the requesting PCIe device (an RCiEP or an endpoint) as poisoned data unless such forwarding is disallowed (e.g., poisoned TLP egress blocking). |
| MMS_070 | Poisoned data received from completers (EP=1) MUST be forwarded to a requesting RISC-V hart as poisoned data unless such forwarding is disallowed through implementation-defined means. When such forwarding is disallowed, the poisoned data MUST be replaced with all 1s data. |
| MMS_080 | The SoC MUST NOT use the Enhanced Allocation (EA) capability to indicate memory resources for allocation to endpoints downstream of a PCIe root port. |
2.1.4.3. Access Control Services (ACS)
The PCIe ACS provides controls on routing of PCIe TLPs. ACS controls may be used to determine whether the TLP should be routed normally, blocked, or redirected. These controls may be applicable to the root complex, switches, multi-function devices, and SR-IOV capable devices.
| ID# | Rule |
|---|---|
| ACS_010 | PCIe root ports and SoC-integrated downstream switch ports MUST support the following PCIe access control services (ACS) controls: … |
| ACS_020 | If a PCIe root port or a SoC-integrated downstream switch port implements a memory BAR, then it SHOULD support the PCIe ACS DSP memory target access control. <br> The ACS DSP memory target access control can be used to prevent unauthorized accesses to protected memory spaces such as the PCIe root port's BAR-mapped registers. |
| ACS_030 | Root ports and SoC-integrated downstream switch ports that support direct routing between root ports, or direct routing from the ingress to the egress port of a root port, MUST support the following PCIe ACS controls: … |
| ACS_040 | Root ports and SoC-integrated downstream switch ports that support direct routing between root ports, or direct routing from the ingress to the egress port of a root port, SHOULD also support ACS P2P egress control. <br> More commonly, P2P routing is accomplished by forwarding the TLP to the host bridge for routing. For further information, refer to the application note accompanying Fig 2-14 and section 1.3.1 of the PCIe specification 6.0. |
2.1.4.4. Address Routed Transactions
The rules in this section apply to treatment in the root complex of TLPs that are routed by address. An address carried in such transactions may be the address of a host memory location or the address of a location in the memory space of an endpoint or RCiEP.
| ID# | Rule |
|---|---|
| ADR_010 | The host bridge MUST request IOMMU translations for addresses (Translated, Untranslated, or a PCIe ATS address translation request) used in requests by endpoints and RCiEPs. <br> The IOMMU must be invoked even for Translated requests to allow determination of whether the requester is configured by software to use Translated requests. |
| ADR_020 | The host bridge MUST enforce physical memory attribute checks and physical memory protection checks on the translated address provided by the IOMMU and MUST treat violating requests as Unsupported Requests. <br> These checks are analogous to the PMA and PMP checks performed by a RISC-V hart. |
| ADR_030 | For Translated and Untranslated requests, the host bridge MUST use the translated addresses provided by the IOMMU to determine whether the transaction is targeting host memory or peer device memory. |
| ADR_040 | The host bridge MAY support devices accessing peer devices' memory. If peer device memory access is not enabled (either by design or by configuration), then such accesses MUST be responded to with a UR/CA response. The host bridge MUST NOT cause any other errors (e.g., hang, deadlock, etc.) when rejecting access by a device to a peer device's memory. <br> A virtual machine may violate the peer-to-peer access policies and/or configurations enforced by the hypervisor and/or SoC firmware that prohibit peer device memory accesses. Where a VM configures devices passed through to it to perform peer memory accesses, such attempts must not result in system instabilities (e.g., hangs, deadlocks, etc.) or errors. Compliance with this directive ensures system resilience against unauthorized access attempts, maintaining operational integrity. |
| ADR_050 | When a posted or non-posted-with-data request from a device is allowed to access peer device memory, any poisoned data (EP=1) MUST be forwarded as poisoned data unless such forwarding is disallowed (e.g., due to poisoned TLP egress blocking or lack of support for data poisoning in the SoC). |
| ADR_060 | Host memory writes resulting from posted or non-posted-with-data requests with poisoned data (EP=1) MUST mark such data as poisoned in host memory. |
| ADR_070 | Host memory reads that encounter uncorrectable data errors detected within the SoC MUST result in a response with poisoned data (EP=1) if transmission of poisoned TLPs is not blocked (see also section 2.7.2.1 of PCIe specification 6.0). |
2.1.4.5. ID Routed Transactions
The rules in this section apply to treatment in the root complex of TLPs that are routed by ID. Such requests may be Configuration requests, ID routed messages or completions.
| ID# | Rule |
|---|---|
| IDR_010 | Configuration requests from endpoints and RCiEPs MUST be treated as Unsupported Requests. |
| IDR_020 | P2P routing of PCIe VDMs between root ports, within or across hierarchies, SHOULD be supported. <br> MCTP transport protocols using PCIe VDMs are used by the BMC to manage PCIe/CXL devices. These messages support manageability protocols such as PLDM, NVMe-MI, Redfish, etc. Supporting P2P routing of VDMs, such as those carrying MCTP protocol messages, enables greater system design flexibility in supporting these management protocols. |
| IDR_030 | P2P routing of PCIe VDMs to/from RCiEPs MAY be supported. |
2.1.4.6. Cacheability and Coherence
| ID# | Rule |
|---|---|
| CCS_010 | The host bridge MUST enforce PCIe memory ordering rules and SHOULD support relaxed ordering (RO) and ID-based ordering (IDO). <br> An implementation may occasionally, or never, permit the relaxations allowed by the RO and/or IDO attributes. Such implementations result in a more conservative interpretation of the ordering rules but do not violate them. |
| CCS_020 | Writes to host or device memory with the RO attribute set to 0 MUST be observed by other harts and bus-mastering devices in the order in which the writes were received by the PCIe root port or the host bridge, ensuring that all previous writes are globally observed before the RO=0 write is globally observed. |
| CCS_030 | The host bridge MUST enforce the idempotency, coherence, cacheability, and access type physical memory attributes of the accessed memory and perform any reordering or combining of PCIe transactions only if the combination of physical memory attributes and TLP-specified memory ordering attributes allows it. |
| CCS_040 | The host bridge SHOULD implement hardware-enforced cache coherency, irrespective of the "No Snoop" attribute in the TLP, unless it has been configured through … <br> A PCIe requester is permitted to set "No Snoop" in transactions it initiates that do not require hardware-enforced cache coherency. Host bridges that do not support isochronous VCs, or that can meet deadlines with hardware-enforced coherency, may always enforce coherency. Enforcing cache coherency is always conservative and will not lead to data corruption. |
| CCS_050 | The host bridge MUST NOT violate the coherence physical memory attribute if the "No Snoop" attribute in the TLP is 0. |
| CCS_060 | The interpretation of the TLP processing hints (TPH) by the SoC is … <br> A future extension of the RISC-V IOMMU specification may define a standard interpretation of the TPH, including the use of ATS memory attributes (AMA) for performing cache management. |
2.1.4.7. Message signaled interrupts
A message signaled interrupt (MSI or MSI-X) is the preferred interrupt signaling mechanism in PCIe.
| ID# | Rule |
|---|---|
| MSI_010 | Message-signaled interrupts MUST be supported. |
| MSI_020 | The SoC MUST NOT support INTx virtual-wire based interrupt signaling. <br> PCIe supports INTx emulation to support legacy PCI interrupt mechanisms. Modern SoCs and devices are not expected to be limited by the lack of this emulation mode. |
2.1.4.8. Precision Time Measurement (PTM)
| ID# | Rule |
|---|---|
| PTM_010 | PCIe root ports MAY support the PCIe PTM capability. <br> Several applications, such as instrumentation, media servers, and telecom servers, require high-precision monitoring and tracking of time. The PCIe PTM protocol supports synchronization of multiple devices/functions to a common shared PTM master time provided by the PTM root. |
| PTM_020 | When the PCIe PTM capability is supported, the SoC MUST make the PTM master time available to the operating system. <br> The mechanism used to make the master time available to the operating system is implementation-specific. |
| PTM_030 | When the PCIe PTM capability is supported, the PTM master time MUST be 64 bits wide. |
| PTM_040 | When the PCIe PTM capability is supported, the PTM master time MUST use the same or a higher resolution clock than the clock used to increment … |
2.1.4.9. Error/Event Reporting
| ID# | Rule |
|---|---|
| AER_010 | PCIe root ports MUST support the advanced error reporting (AER) capability for reporting errors from connected devices or errors detected by the root port itself. <br> The AER capability defines more robust error reporting than the baseline error reporting capability. |
| AER_020 | PCIe root ports MUST support the downstream port containment (DPC) capability. |
| AER_030 | PCIe root ports MUST support the RP PIO controls. <br> The root port programmed I/O (PIO) controls enable fine-grained control over the handling of non-posted requests that encounter errors and allow such errors to be handled as either uncorrectable or advisory, based on policies established by the operating system. |
| AER_040 | An RCiEP in the SoC SHOULD support the AER capability if it detects any of the errors defined by PCIe specification 6.0 (see section 6.2.7). |
| AER_050 | An RCiEP in the SoC MUST support the AER capability if it supports the ACS capability. |
| AER_060 | The SoC MUST implement one or more PCIe RCECs in the root complex if any of the RCiEPs implement the AER capability or implement PME signaling. |
| AER_070 | The PCIe RCECs implemented in a SoC MUST implement the RCEC endpoint association extended capability. |
| AER_080 | PCIe root port configuration registers MUST NOT be affected by the link transitioning to the DL_Down state, except as required to update status associated with the transition to DL_Down (see also section 2.9.1 of PCIe specification 6.0). <br> Retaining port configuration on a transition to the DL_Down state is important to support hot-plug. |
2.1.4.10. Vendor Specific Registers
| ID# | Rule |
|---|---|
VSR_010 |
Vendor specific registers in the root ports, host bridge, RCiEP, and RCRB MUST be implemented using one or more of the following capabilities:
|
VSR_020 |
SoC MUST NOT require hypervisor and/or operating system interaction with PCIe configuration space registers that are not defined by an industry standard. Non-standard vendor specific registers, if implemented in the PCIe configuration space, must only be used by the SoC firmware. |
Some industry standards, such as CXL, may define standard DVSEC structures in
the PCIe configuration space. |
|
2.1.4.11. SoC-Integrated PCIe Devices
| ID# | Rule |
|---|---|
SID_010 |
SoC-integrated PCIe devices MUST implement all software-visible rules defined by the PCIe specification 6.0 for an EP or RCiEP, as applicable. |
Implementing integrated devices as an RCiEP or EP allows the use of standardized frameworks for memory and interrupt resource allocation, virtualization (SR-IOV), ATS/PRI, shared virtual addressing, trusted I/O using SPDM/TDISP, participation in RAS frameworks such as data poisoning and AER, power management, etc. |
|
SID_020 |
SoC-integrated PCIe devices MUST NOT require the use of I/O space or I/O transactions. |
SID_030 |
SoC-integrated PCIe devices that cache address translations MUST implement the PCIe ATS capability if the address translation cache requires management by the operating system or hypervisors. |
SID_040 |
SoC-integrated PCIe devices that support PCIe SR-IOV capability SHOULD support the MSI-X capability. |
MSI-X capability enables virtual machines to assign interrupt resources to virtual functions without needing access to the configuration space of the function. Access to the configuration space of the virtual function is usually mediated by the hypervisor. |
|
SID_050 |
SoC-integrated PCIe devices MAY support the PASID capability. When PASID capability is supported, the devices SHOULD support a 20-bit wide PASID. |
Endpoints are recommended to support a 20-bit wide PASID to ensure interoperability with system software. See also the implementation note on PASID width homogeneity in the PCIe specification 6.0 section 6.20.2.2. |
|
SID_060 |
SoC-integrated PCIe devices (a multi-function device or an SR-IOV capable device) that support P2P traffic among functions (including among SR-IOV virtual functions) of the device MUST support the following PCIe ACS controls:
|
SID_070 |
If the BAR registers are implemented by SoC-integrated PCIe devices then they MUST be programmable. The Memory Space Indicator (bit 0) of such BAR registers MUST be 0 (indicating a memory space BAR), and they SHOULD support being mapped anywhere in the 64-bit memory space. |
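A quick way to check a BAR against this rule is to decode its low bits as laid out by the PCIe specification: bit 0 selects memory versus I/O space, bits 2:1 select the 32-bit or 64-bit memory type, and bit 3 indicates prefetchability. A sketch, with an illustrative `decode_bar` helper:

```python
def decode_bar(bar_lo):
    """Decode the low 32 bits of a PCIe BAR (PCIe spec 6.0, sec 7.5.1.2.1)."""
    if bar_lo & 0x1:
        return {"space": "io"}         # bit 0 = 1: I/O space BAR
    bar_type = (bar_lo >> 1) & 0x3     # bits 2:1: 00b = 32-bit, 10b = 64-bit
    return {
        "space": "memory",             # bit 0 = 0: memory space BAR
        "is_64bit": bar_type == 0x2,
        "prefetchable": bool(bar_lo & 0x8),
        "base_lo": bar_lo & ~0xF,      # address bits, low dword
    }
```

A BAR conforming to SID_070 decodes as memory space, and one that can be mapped anywhere in the 64-bit memory space decodes with `is_64bit` set.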
SID_080 |
An RCiEP MAY support the PCIe enhanced allocation (EA) capability for fixed allocation of memory resources. If the EA capability is used, then the BEI of the entries MUST be one of 0 through 5 or 9 through 14, and their primary/secondary properties MUST be one of 0 through 4 or 0xFF. |
SID_090 |
SoC-integrated PCIe devices MUST support the PCIe defined baseline error reporting capability and MAY support PCIe Advanced Error Reporting capability. If PCIe ACS controls are supported then the PCIe Advanced Error Reporting capability MUST be supported. |
See PCIe specification 6.0 section 7.5.1.1.14. |
|
SID_100 |
An RCiEP that supports PCIe Advanced Error Reporting MUST be associated with a Root Complex Event Collector. |
2.1.5. Reliability, Availability, and Serviceability (RAS)
| ID# | Rule |
|---|---|
RAS_010 |
The level of RAS implemented by the SoC is UNSPECIFIED. |
The level of RAS implemented by an SoC depends on the reliability goals
established for the SoC, which are commonly measured using metrics such as
failure-in-time (FIT) and defects-per-million (DPM). Achieving these goals
requires a combination of fault prevention, error detection, and error
correction techniques. |
|
RAS_020 |
SoC SHOULD support the generation, storage, and forwarding of
poisoned data. The granularity at which data is poisoned is
UNSPECIFIED. |
When an uncorrected data error is detected by a component, it might allow
potentially corrupted data to reach the data requester, but with an
associated poison indicator. These errors are referred to as uncorrected
deferred errors (UDE), as they enable the detecting component to continue
functioning and postpone handling of the error until, if ever, the
poisoned data is consumed. If a component (such as a hart, an IOMMU, a
device, etc.) consumes the poisoned data, it triggers an uncorrected urgent
error (UUE), leading to the invocation of a recovery handler for immediate
remedial action, as further deferral of the error is not feasible. |
|
RAS_030 |
If poisoned data needs to be transmitted from a first component to a second component that lacks the ability to manage poison, the first component MUST trigger a critical uncorrected error report instead of silently transmitting the corrupted data. |
Some components serve as intermediaries through which data passes. For instance, a PCIe/CXL port acts as an intermediary that receives data from memory but doesn’t consume it; rather, it forwards the data to an endpoint. In such cases, the intermediary component might encounter poisoned data. While this component can propagate the poison indicator onward without logging an error, a different scenario arises when the destination component (such as a PCIe endpoint) cannot handle poison. In such situations, the originating component must trigger an urgent error signal instead of transmitting the poisoned data without the associated poison indicator. Failing to do so would breach the containment of the corrupted data during propagation. |
|
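The containment requirement in RAS_030 reduces to a simple decision: forward the data with its poison indicator intact when the destination can manage poison, otherwise raise the error rather than strip the indicator. A sketch of that decision; the function and parameter names are illustrative, not defined by this specification:

```python
# Illustrative sketch of the RAS_030 containment decision: an intermediary
# holding poisoned data either forwards it with the poison indicator
# intact, or, when the destination cannot handle poison, raises a critical
# uncorrected error instead of silently passing corrupt data.
def forward_data(data, poisoned, dest_handles_poison, signal_uec):
    if poisoned and not dest_handles_poison:
        signal_uec()          # containment: never strip the poison marker
        return None           # data is not delivered
    return (data, poisoned)   # poison indicator travels with the data
```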
RAS_040 |
The SoC SHOULD support the RISC-V RAS error record register interface (RERI) [11] for error logging and signaling. |
RAS_050 |
When RERI is supported, the RAS error records MUST include the capability to individually enable error signaling for each severity of error that could be logged in that specific error record: Uncorrected Error Critical (UEC), Uncorrected Error Deferred (UED), and Corrected Error (CE). |
Configurable enables provide software with the flexibility of using event-based or polling-based error logging for both corrected errors and deferred errors. Typically, software operates in an event-based mode for critical errors, as these errors necessitate immediate remedial action when they arise. |
|
RAS_060 |
If RERI is supported, RAS error records MUST preserve the
state of logged error information (including status, address,
information, supplemental information, and timestamp) across a
RAS-initiated reset. The state of RAS error records MAY persist
across other types of implementation-defined resets. After a reset,
including a RAS-initiated reset, the state of the control register
in the RAS error record is considered UNSPECIFIED. |
Some errors may lead a hardware component to enter a failure mode in which it becomes incapable of servicing additional requests, colloquially termed 'jammed' or 'wedged'. In these situations, the SoC may require a reset to restore it to an operational state (a RAS-initiated reset). Preserving the RAS error records through such resets enables the SoC firmware and system software to retrieve these error records during boot following such a reset, facilitating logging and analysis. |
|
RAS_070 |
If RERI is supported, the RAS error records MAY support error record injection, which is intended to facilitate RAS handler verification. |
Verifying the correct implementation of RAS handlers presents a formidable
challenge, given the impracticality of deterministically inducing all
potential errors within the SoC to validate the RAS handler’s adherence to
desired recovery protocols. An unverified RAS handler can lead to undesired
behavior during error occurrences, potentially reducing SoC availability or
affecting its serviceability. |
|
RAS_080 |
If RERI is supported, then the hardware components in the SoC that support error correction MUST incorporate a corrected error counter within their respective error records. Additionally, these components MUST support the signaling of counter overflows. |
Counting corrected errors offers a more precise assessment of system
reliability. Enabling signaling upon counter overflow empowers software to
define a suitable threshold for logging and analysis of these corrected
errors. |
|
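The corrected-error counting and overflow signaling required by RAS_080 can be modeled as follows; the 16-bit counter width and the class name are assumptions for illustration only, not requirements of this specification:

```python
# Illustrative model of RAS_080: a corrected-error (CE) counter that sets
# a sticky overflow flag when it wraps, letting software choose a polling
# threshold instead of taking an event per corrected error.
class CorrectedErrorCounter:
    WIDTH = 16  # assumed width, for illustration only

    def __init__(self):
        self.count = 0
        self.overflow = False   # sticky until software clears it

    def record_ce(self):
        self.count = (self.count + 1) & ((1 << self.WIDTH) - 1)
        if self.count == 0:     # counter wrapped: signal the overflow
            self.overflow = True
```

Software would initialize `count` so that the overflow fires after its chosen threshold of corrected errors, then log and rearm on each overflow signal.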
2.1.6. Quality of Service
Quality of Service (QoS) refers to the minimum end-to-end performance that a service level agreement (SLA) guarantees to an application in advance. QoS capabilities within the SoC offer mechanisms that system software can leverage to manage interference to an application, effectively diminishing performance variability caused by other applications' utilization of shared resources such as cache capacity, memory bandwidth, interconnect bandwidth, power consumption, and more.
| ID# | Rule |
|---|---|
QOS_010 |
The SoC SHOULD incorporate QoS mechanisms to mitigate unwarranted performance interference that arises when multiple workloads access shared resources like caches and system memory. |
QOS_020 |
The SoC SHOULD integrate support for the RISC-V capacity and bandwidth controller register interface (CBQRI) [12] in significant shared caches and the memory controllers. |
QOS_030 |
If CBQRI is supported, RISC-V harts within the application
processors of the SoC MUST include support for the Ssqosid extension. |
The Ssqosid extension provides the srmcfg CSR, which is used to associate an RCID and an MCID with requests originated by the hart. |
|
QOS_040 |
If CBQRI is supported, the IOMMUs in the SoC SHOULD incorporate support for the CBQRI-defined extension, enabling the association of RCID and MCID with requests initiated by devices and the IOMMU. |
QOS_050 |
If CBQRI is supported, significant caches such as the last-level cache in the SoC SHOULD support cache capacity allocation. |
QOS_060 |
If CBQRI is supported, significant caches such as the last-level cache in the SoC SHOULD incorporate support for monitoring cache capacity usage. |
QOS_070 |
If CBQRI is supported, the memory controllers within the SoC SHOULD include support for bandwidth allocation. |
QOS_080 |
If CBQRI is supported, the memory controllers in the SoC SHOULD include support for monitoring bandwidth usage. |
The method employed by the SoC for bandwidth throttling and control is specific to its implementation. It is advisable for the implementation to utilize a scheme that results in a deviation of no more than ±10% from the target set by system software through the CBQRI interface. |
|
QOS_090 |
If CBQRI is supported, the count of RCID and MCID supported by capacity controllers, bandwidth controllers, and all RISC-V application processor harts in the SoC MUST be consistent. |
Portable system software could opt to limit itself to accommodating the minimum count of RCID and MCID across the controllers. This approach avoids the complexity of dealing with unequal numbers of RCID and MCID across controllers, which would otherwise necessitate intricate allocations and constraints on workload placement. |
|
QOS_100 |
If CBQRI is supported, the monitoring counters in the capacity and bandwidth controllers MUST be sufficiently wide to not overflow when sampled at a rate of 1 Hz. |
As an illustration, consider an HBM3 memory interface that can facilitate data transfers at a rate of up to 1 TB/s. This scenario would necessitate a 34-bit counter to prevent overflow when sampled at a frequency of 1 Hz. |
|
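The counter width in the example above follows from the sampling rate and the counting granularity. A small sketch of that arithmetic; the 64-byte counting unit is an assumption chosen to be consistent with the quoted 34-bit figure, not something this specification mandates:

```python
import math

# Minimum monitoring-counter width so the counter cannot wrap between
# samples. 'unit_bytes' is the counting granularity; 64 bytes is an
# assumed cache-block unit consistent with the 34-bit figure quoted
# for a 1 TB/s HBM3 interface sampled at 1 Hz.
def counter_bits(bytes_per_second, sample_hz=1, unit_bytes=64):
    events_per_sample = bytes_per_second / (sample_hz * unit_bytes)
    return math.ceil(math.log2(events_per_sample))

# counter_bits(1e12) evaluates to 34 for a 1 TB/s interface; a counter
# tracking individual bytes would instead need 40 bits.
```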
2.1.7. Manageability
This section outlines the guidelines for RISC-V server SoCs to incorporate a standardized set of protocols and standards for server management. The SoC interfaces with a baseboard management controller (BMC) through in-band and out-of-band (OOB) management agents. The in-band management agents execute on the RISC-V application processor harts and the out-of-band management agents execute on a management controller in the SoC.
The out-of-band management interface facilitates the monitoring of sensors (e.g., temperature, power, etc.), parameter control (e.g., power limits, etc.), and logging (e.g., RAS error records, etc.) by the BMC without participation of software on the application processor harts. The in-band management interface facilitates system configuration (e.g., boot order, memory domains, secure boot, network, etc.), and event log collection through management agents in the OS and/or firmware that executes on the application processor harts.
This specification strongly recommends the use of the DMTF Redfish [14], DMTF Platform Level Data Model (PLDM) [15], and DMTF Management Component Transport Protocol (MCTP) [16] protocols for in-band and out-of-band server management.
This specification strongly recommends the use of DMTF specified Security Protocol and Data Model (SPDM) [17] for device attestation and using SPDM encrypted messages [18] for secure in-band and out-of-band communication with the BMC. SPDM authentication protocols support establishing a trust relationship between the manageability agents in the SoC and the BMC. Use of SPDM secured messages enables preserving the confidentiality and integrity of data exchanged between the BMC and the manageability agents in the SoC.
The specification recommends supporting Intelligent Platform Management Interface (IPMI) [19] due to the widespread use of this protocol for server management functions such as credentials provisioning and remote power control.
This specification recommends that the RISC-V server SoC support open standards for server management by integrating with technologies such as the datacenter-ready secure control module (DC-SCM) [20], specified by the Open Compute Project for server management, security, and control features.
Adhering to industry-standard management protocols such as those specified by DMTF and OCP allows server platforms built with RISC-V server SoCs to integrate seamlessly into the server management frameworks and tools employed by data centers and enterprises.
| ID# | Rule |
|---|---|
MNG_010 |
The SoC SHOULD incorporate support for an x1 PCIe lane, preferably Gen 5, but at least Gen 3, to establish a connection with the BMC. |
This interface is commonly linked to a BMC as a PCIe endpoint, serving
various purposes. These include facilitating host-to-BMC communication for
tasks like video output (e.g., remote KVM support), MCTP transport over
PCIe VDM, and hosting a USB controller. The BMC might also support remote
presence capabilities, like remote media redirection and support for
keyboard and mouse functions through virtual USB. |
|
MNG_020 |
The SoC SHOULD support the use of I2C-based IPMI SSIF for in-band management agents in the SoC to communicate with the BMC. |
MNG_030 |
The SoC SHOULD incorporate support for utilizing a UART connection to the BMC, enabling the provision of a host debug console. |
2.1.8. Performance Monitoring
| ID# | Rule |
|---|---|
SPM_010 |
Significant caches within the SoC SHOULD incorporate a hardware performance monitor (HPM) capable of counting:
|
It is recommended that a cache with a capacity that is approximately 16 KiB or larger be considered a significant cache. |
|
SPM_020 |
The memory controllers within the SoC SHOULD incorporate an HPM capable of counting:
|
SPM_030 |
The PCIe ports within the SoC SHOULD incorporate an HPM capable of counting:
|
SPM_040 |
The SoC SHOULD incorporate an HPM capable of counting the average latency of a read request from a memory requester (e.g., a hart, a PCIe host bridge, etc.) in the SoC. |
Bandwidth and latency are the most commonly used performance metrics to guide workload placement and tuning. |
|
SPM_050 |
If the SoC supports NUMA configurations, then the HPM for SPM_010, SPM_020, SPM_030, and SPM_040 SHOULD support filtering the counting based on whether the request is to local memory or to remote memory. |
SPM_060 |
All PCIe Gen6 ports within the SoC SHOULD incorporate support for the Flit performance measurement extended capability defined by PCIe specification 6.0. |
Please refer to Input-Output Memory Management Unit (IOMMU) for details on the IOMMU performance monitoring rules.
2.1.9. Security Requirements
| ID# | Rule |
|---|---|
SEC_010 |
The Server SoC MUST implement a hardware RoT as the primary root of trust. |
A root of trust (RoT) is the foundation on which all secure operations of a system depend. A hardware RoT is a dedicated and possibly isolated trusted subsystem that can provide stronger protections against physical and logical attacks. |
|
SEC_020 |
The PCIe root ports within the SoC SHOULD support PCIe Integrity and Data Encryption (IDE) capability. |
The IDE extension adds optional capabilities to perform hardware encryption and integrity checks on packets transferred across PCIe links. This addition provides confidentiality, integrity, and replay protection against hardware-level attacks. |
|
SEC_030 |
The SoC SHOULD support encryption of off-chip DRAM using a transient memory encryption key with a length of at least 256 bits. |
Off-chip memory encryption provides protection to critical assets in memory such as credentials, data encryption keys, and other secrets. |
|
SEC_040 |
The cryptographic modules used to implement PCIe and off-chip DRAM encryption SHOULD comply with security requirements specified by relevant security standards from national standards laboratories. |
FIPS 140-3 is an example of such a standard. |
|
SEC_050 |
The SoC SHOULD have the capability of interfacing with a Trusted Platform Module (TPM) that adheres to the TPM 2.0 Library specification [21]. |
A TPM enhances security by providing secure storage for sensitive information such as credentials and passwords, cryptographic operations, and protection against tampering or unauthorized access. |
|