BMWG Working Group F. Calabria Internet-Draft Cisco Intended status: Informational C. Pignataro Expires: 30 August 2026 Blue Fern Consulting Q. Wu G. Fioccola Huawei 26 February 2026 Benchmarking Terminology for AI Network Fabrics draft-calabria-bmwg-ai-fabric-terminology-00 Abstract This document defines benchmarking terminology for evaluating Ethernet-based network fabrics used in distributed Artificial Intelligence (AI) training and inference workloads. It provides a unified vocabulary consolidating and extending terms from RFC 1242, RFC 8238, and the companion AI fabric methodology documents, establishing precise, vendor-neutral definitions for collective communication primitives, RDMA transport mechanisms (RoCEv2 and Ultra Ethernet Transport), congestion control behaviors, AI-specific Key Performance Indicators (KPIs), and fabric topology concepts. This document is a companion to draft-bmwg-ai-fabric-training-bench-00 and draft-bmwg-ai-fabric-inference-bench-00. Those documents SHOULD NOT be applied without first consulting the terminology defined herein. Where definitions herein overlap with RFC 1242 or RFC 8238, the AI fabric context definition in this document takes precedence. About This Document This note is to be removed before publishing as an RFC. The latest revision of this draft can be found at https://fcalabri.github.io/bmwg-ai-fabric-terminology/draft-calabria-bmwg-ai-fabric-terminology-00.html. Status information for this document may be found at https://datatracker.ietf.org/doc/draft-calabria-bmwg-ai-fabric-terminology/. Source for this draft and an issue tracker can be found at https://github.com/fcalabri/bmwg-ai-fabric-terminology. Calabria, et al. Expires 30 August 2026 [Page 1] Internet-Draft AI Fabric Benchmarking Terminology February 2026 Status of This Memo This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF).
Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." This Internet-Draft will expire on 30 August 2026. Copyright Notice Copyright (c) 2026 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License. Table of Contents 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 1.1. Requirements Language . . . . . . . . . . . . . . . . . . 3 1.2. Scope and Purpose . . . . . . . . . . . . . . . . . . . . 3 1.3. Relationship to Existing BMWG Work . . . . . . . . . . . 3 1.4. Relationship to Companion Documents . . . . . . . . . . . 4 2. General Benchmarking Terms . . . . . . . . . . . . . . . . . 4 3. Collective Communication Terms . . . . . . . . . . . . . . . 6 4. Distributed Parallelism Strategy Terms . . . . . . . . . . . 8 5. Network Transport Terms . . . . . . . . . . . . . . . . . . . 9 5.1. RoCEv2 and RDMA Terms . . . . . . . . . . . . . . . . . . 9 5.2. Ultra Ethernet Transport (UET) Terms . . . . . . . . . . 11 5.2.1. UET Transport Services Comparison . . . . . . . . . . 12 6. Congestion Control and Fabric Behavior Terms . . . . . . . .
13 6.1. Load Balancing Strategy Comparison . . . . . . . . . . . 15 7. Fabric Topology and Infrastructure Terms . . . . . . . . . . 16 8. Training-Specific Terms . . . . . . . . . . . . . . . . . . . 18 9. Inference-Specific Terms . . . . . . . . . . . . . . . . . . 19 9.1. Inference Phase Characteristics . . . . . . . . . . . . . 22 10. KPI Classification Terms . . . . . . . . . . . . . . . . . . 22 10.1. KPI Tier Summary . . . . . . . . . . . . . . . . . . . . 24 11. Referenced Standards Abbreviations . . . . . . . . . . . . . 24 Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . 26 References . . . . . . . . . . . . . . . . . . . . . . . . . . . 26 Normative References . . . . . . . . . . . . . . . . . . . . . 26 Informative References . . . . . . . . . . . . . . . . . . . . 26 Appendix A: Term Cross-Reference to Companion Documents . . . . . 27 Appendix B: Term Taxonomy Summary . . . . . . . . . . . . . . . . 28 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 30 1. Introduction 1.1. Requirements Language The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here. 1.2. Scope and Purpose This document defines terminology specifically for benchmarking Ethernet-based AI network fabrics in controlled laboratory environments. The defined terms cover: distributed AI training collective communication patterns, LLM inference serving architectures, RDMA transport semantics (RoCEv2 and UET), congestion control mechanisms, fabric topology characteristics, and performance metric definitions. This document does not define acceptance criteria, performance requirements, or configuration recommendations.
It does not address benchmarking of live operational networks, intra-node (NVLink/PCIe) interconnects, or storage networking. 1.3. Relationship to Existing BMWG Work This document extends the foundational BMWG terminology established in [RFC1242] (network interconnect benchmarking terminology) and [RFC8238] (data center benchmarking terminology). Where terms are defined in those RFCs, this document provides AI fabric context extensions; the core definitions remain as established. This document also extends the test methodology framework of [RFC2544] and [RFC8239] as applied in the companion AI fabric methodology documents. 1.4. Relationship to Companion Documents This document is one of three companion Internet-Drafts addressing AI fabric benchmarking: * draft-calabria-bmwg-ai-fabric-terminology-00 (this document): Terminology definitions. * [TRAINING-BENCH]: Benchmarking methodology for AI training workloads. * [INFERENCE-BENCH]: Benchmarking methodology for AI inference serving workloads. Implementers and evaluators SHOULD read this terminology document before applying the companion methodology documents. Terms defined here are used normatively in those documents and are not redefined there unless the specific workload context introduces a substantive difference, which is noted explicitly. 2. General Benchmarking Terms The following terms establish the general measurement framework applicable to all AI fabric benchmarking activities. +=============+====================================================+ | Term | Definition | +=============+====================================================+ | *AI Fabric* | The dedicated Ethernet backend network | | | interconnecting accelerators (GPUs/XPUs) for | | | distributed AI training and inference workloads. | | | Typically implemented as a non-blocking Clos | | | (fat-tree) topology running RoCEv2 or UET transport.
| | | Distinct from the front-end (management/storage) | | | network. | +-------------+----------------------------------------------------+ | *DUT* | Device Under Test. The network element(s) whose | | | performance characteristics are being measured. | | | In AI fabric benchmarking the DUT is one or more | | | fabric elements: leaf switches, spine switches, | | | NICs, or the complete fabric assembly. | +-------------+----------------------------------------------------+ | *SUT* | System Under Test. The complete AI compute system | | | including accelerators, NICs, the fabric DUT, and | | | serving/training software, when end-to-end metrics | | | are the measurement objective. | +-------------+----------------------------------------------------+ | *RT* | Router Tester / Traffic Generator. Test equipment | | | capable of generating and receiving network | | | traffic at specified rates with | | | nanosecond-resolution timestamping sufficient for | | | the measurements defined in the companion | | | methodology documents. | +-------------+----------------------------------------------------+ | *JFI* | Jain's Fairness Index. A scalar measure of | | | flow-level throughput fairness across n flows: | | | JFI = (Σxᵢ)² / (n · Σxᵢ²) where xᵢ is the | | | throughput of flow i. A value of 1.0 indicates | | | perfect fairness; lower values indicate disparity. | | | *SHOULD* be computed per [RFC1242] reporting | | | conventions. | +-------------+----------------------------------------------------+ | *Offered | The total traffic rate presented to the DUT from | | Load* | test equipment, expressed as a fraction of line | | | rate (0–100%) or as absolute bit/s. Offered load | | | is controlled independently of DUT absorption, | | | enabling characterization of saturation behavior.
| +-------------+----------------------------------------------------+ | *Trial | The time interval over which a single measurement | | Duration* | is conducted. For AI fabric tests, the | | | *RECOMMENDED* minimum is 60 seconds for throughput | | | tests and 300 seconds for soak/stability tests, | | | per the methodology in [RFC2544] as extended | | | herein. | +-------------+----------------------------------------------------+ | *Warmup | A mandatory pre-measurement interval during which | | Period* | traffic is sent but results are not recorded. | | | Ensures adaptive routing tables, PFC watermarks, | | | and DCQCN/UET congestion controllers reach steady | | | state before measurement begins. *RECOMMENDED* | | | minimum: 10 seconds. | +-------------+----------------------------------------------------+ | *Binary | An iterative test procedure for determining the | | Search* | maximum offered load at which a DUT meets a | | | specified acceptance criterion (e.g., zero packet | | | loss). The search halves the candidate load range | | | at each iteration, converging to a resolution of | | | 0.1% offered load within 10 iterations. | +-------------+----------------------------------------------------+ | *Percentile | A latency statistic expressing that the specified | | Latency* | fraction of all measured latency samples fall at | | | or below the reported value. Denoted Pxx (e.g., | | | P50, P95, P99, P99.9). Tail latency (P99 and | | | above) is especially relevant for AI fabric | | | benchmarking because SLO violations are determined | | | by worst-case, not median, performance. | +-------------+----------------------------------------------------+ Table 1: General Benchmarking Terms 3. Collective Communication Terms The following terms define the collective communication operations that are the primary traffic sources in distributed AI workloads.
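Non-normative illustration: two of the quantitative terms defined in Table 1, JFI and Percentile Latency, can be computed as in the following sketch. The flow throughputs and latency samples are hypothetical, and the nearest-rank percentile method shown is one of several common conventions.

```python
# Non-normative sketch of two Table 1 metrics.
# All input values below are hypothetical.

def jfi(throughputs):
    """Jain's Fairness Index: JFI = (sum x_i)^2 / (n * sum x_i^2).
    1.0 indicates perfect fairness; lower values indicate disparity."""
    n = len(throughputs)
    return sum(throughputs) ** 2 / (n * sum(x * x for x in throughputs))

def percentile(samples, p):
    """Pxx latency by the nearest-rank method: the smallest sample such
    that at least p percent of samples fall at or below it."""
    ranked = sorted(samples)
    k = max(0, -(-len(ranked) * p // 100) - 1)  # ceil(n*p/100) - 1
    return ranked[k]

flows = [100.0, 100.0, 100.0, 100.0]   # Gbit/s per flow
print(jfi(flows))                      # equal flows -> 1.0
print(jfi([160.0, 80.0, 80.0, 80.0]))  # imbalance -> below 1.0

lat_us = [8.1, 8.3, 8.2, 9.0, 8.4, 8.2, 31.5, 8.3]  # microseconds
print(percentile(lat_us, 50))   # median-range sample
print(percentile(lat_us, 99))   # tail sample dominates the P99 report
```

The gap between P50 and P99 in the last two lines is the point of the Percentile Latency definition: a single tail outlier leaves the median untouched but sets the SLO-relevant value.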
+=================+================================================+ | Term | Definition | +=================+================================================+ | *Collective | A coordinated communication pattern executed | | Operation* | simultaneously across all accelerators in a | | | training or inference group. Core | | | collectives: AllReduce (gradient aggregation), | | | AllGather (parameter distribution), | | | ReduceScatter (partial reduction + scatter), | | | and AllToAll (expert dispatch in MoE models). | +-----------------+------------------------------------------------+ | *AllReduce* | A collective in which each participant | | | contributes a tensor and all participants | | | receive the element-wise sum (or other | | | reduction) of all contributions. The dominant | | | communication primitive in data-parallel and | | | tensor-parallel training. BusBW is the | | | primary KPI. | +-----------------+------------------------------------------------+ | *AllGather* | A collective in which each participant | | | contributes a shard of a tensor and all | | | participants receive the concatenation of all | | | shards. Used in tensor-parallel | | | (Megatron-style) layers to reconstruct | | | distributed activations or parameters. | +-----------------+------------------------------------------------+ | *ReduceScatter* | A collective combining an element-wise | | | reduction with a scatter, so each participant | | | receives a distinct slice of the reduced | | | result. Used in ZeRO-stage optimizer | | | strategies and as the first half of a | | | ring-AllReduce. | +-----------------+------------------------------------------------+ | *AllToAll* | A collective in which each participant sends a | | | distinct payload to every other participant | | | and receives a distinct payload from every | | | other participant. The critical collective | | | for Mixture-of-Experts token dispatch. | | | Generates N(N−1) independent point-to-point | | | flows for N participants. | +-----------------+------------------------------------------------+ | *Ring | An AllReduce (or AllGather/ReduceScatter) | | Algorithm* | algorithm structured as a logical ring of | | | participants. Each participant sends to its | | | right neighbor and receives from its left | | | neighbor in 2(N−1) steps. The BusBW algorithm | | | factor is 2(N−1)/N; the ring is | | | bandwidth-optimal, with link efficiency | | | approaching 100% for large N. Standard | | | baseline for BusBW calculation. | +-----------------+------------------------------------------------+ | *BusBW* | Bus Bandwidth. Effective per-accelerator | | | throughput during a collective, normalizing | | | for algorithm overhead: BusBW = (data_size × | | | algo_factor) / elapsed_time. For ring | | | AllReduce, algo_factor = 2(N−1)/N. Enables | | | comparison across cluster sizes and collective | | | algorithms. | +-----------------+------------------------------------------------+ | *CCL* | Collective Communication Library. A software | | | library providing optimized implementations of | | | collective operations (AllReduce, AllGather, | | | etc.) over a specific transport. The CCL | | | implementation *MUST* be documented in the | | | test report. | +-----------------+------------------------------------------------+ | *SPMD* | Single Program Multiple Data. The execution | | | model underlying bulk-synchronous distributed | | | training, in which all accelerators execute | | | identical computation on distinct data | | | partitions, synchronizing at collective | | | barriers between steps.
| +-----------------+------------------------------------------------+ | *Bulk | A distributed computation model structured as | | Synchronous | alternating compute and communicate phases | | Parallel (BSP)* | with a global synchronization barrier between | | | phases. Standard training workloads follow | | | BSP: forward pass → backward pass → AllReduce | | | gradient sync → optimizer step. | +-----------------+------------------------------------------------+ Table 2: Collective Communication Terms 4. Distributed Parallelism Strategy Terms The following terms define the parallelism strategies used in distributed AI model training and inference, which determine traffic patterns and fabric requirements. +=============+==================================================+ | Term | Definition | +=============+==================================================+ | *Data | A distributed training strategy replicating the | | Parallelism | full model on each accelerator, partitioning the | | (DP)* | training dataset across replicas. Gradient | | | synchronization after each backward pass | | | requires an AllReduce across all DP ranks. | | | Practical when the full model fits in a single | | | accelerator's memory; communication overhead | | | scales with parameter count. | +-------------+--------------------------------------------------+ | *Tensor | A distributed training and inference strategy | | Parallelism | partitioning individual weight matrices across | | (TP)* | multiple accelerators. Each rank computes a | | | partial result; AllGather or ReduceScatter | | | collectives are required within each layer to | | | aggregate results. Dominant parallelism within | | | a node (intra-node).
| +-------------+--------------------------------------------------+ | *Pipeline | A distributed strategy assigning contiguous | | Parallelism | groups of transformer layers to distinct stages | | (PP)* | (accelerators or nodes). Each stage processes | | | one microbatch and forwards activations to the | | | next stage. Generates point-to-point | | | inter-stage traffic across the fabric | | | (activations and gradients). | +-------------+--------------------------------------------------+ | *Expert | A parallelism strategy for Mixture-of-Experts | | Parallelism | models distributing expert sub-networks across | | (EP)* | accelerators. Each token is routed to its | | | designated experts (typically top-K of E total | | | experts), requiring AllToAll communication for | | | dispatch. Wide EP (e.g., 96-way) generates | | | dense inter-node AllToAll at every MoE layer. | +-------------+--------------------------------------------------+ | *MoE* | Mixture of Experts. A transformer architecture | | | replacing dense feed-forward layers with a set | | | of E expert sub-networks, of which only top-K | | | experts (typically K=2 or K=4) are activated per | | | token via a learned router. MoE enables large | | | model capacity with sub-linear compute, but | | | introduces AllToAll communication requirements | | | proportional to E and sequence length. | +-------------+--------------------------------------------------+ | *DP | Data Parallelism applied to the attention | | Attention* | computation, where the KV cache is partitioned | | | across data-parallel ranks. Each rank holds | | | 1/DP_SIZE of the KV cache; AllToAll communication | | | exchanges attention outputs. Used in inference | | | to reduce per-accelerator memory footprint for | | | long contexts.
| +-------------+--------------------------------------------------+ | *ZeRO* | Zero Redundancy Optimizer. A memory | | | optimization strategy for data-parallel training | | | that shards model states (parameters, gradients, | | | optimizer states) across DP ranks instead of | | | replicating them. Stage 1 shards optimizer | | | states; Stage 2 adds gradient sharding; Stage 3 | | | adds parameter sharding. Each stage increases | | | AllGather/ReduceScatter communication. | +-------------+--------------------------------------------------+ Table 3: Distributed Parallelism Strategy Terms 5. Network Transport Terms 5.1. RoCEv2 and RDMA Terms The following terms define RDMA and RoCEv2 transport semantics as used in AI fabric benchmarking. +===========+====================================================+ | Term | Definition | +===========+====================================================+ | *RDMA* | Remote Direct Memory Access. A transport | | | mechanism enabling direct memory-to-memory data | | | transfer between hosts without involving the | | | destination CPU, providing zero-copy semantics and | | | kernel bypass. Implementations include InfiniBand | | | Verbs (native IB), iWARP (RDMA over TCP), and | | | RoCEv2 (RDMA over Converged Ethernet v2). | +-----------+----------------------------------------------------+ | *RoCEv2* | RDMA over Converged Ethernet version 2. An RDMA | | | transport encapsulating InfiniBand transport layer | | | (BTH) over UDP/IP, enabling RDMA semantics on | | | standard Ethernet infrastructure. Requires | | | lossless fabric operation (PFC or equivalent) for | | | correctness. Standardized in IBTA Annex A17; | | | transported over UDP destination port 4791. | +-----------+----------------------------------------------------+ | *QP* | Queue Pair.
The fundamental RDMA communication | | | endpoint comprising a Send Queue (SQ) and Receive | | | Queue (RQ). QPs are connection-oriented in | | | Reliable Connected (RC) mode. Multiple QPs per | | | source-destination pair are used to increase ECMP | | | entropy in fabric load balancing. | +-----------+----------------------------------------------------+ | *Reliable | An RDMA QP transport service type providing | | Connected | reliable, in-order delivery between exactly two | | (RC)* | endpoints. The primary QP type for AI collective | | | operations via RoCEv2. Requires connection setup | | | before data transfer and maintains per-QP state | | | for retransmission. | +-----------+----------------------------------------------------+ | *RDMA | An operation primitive of the RDMA programming | | Verb* | model. Key verbs: SEND/RECV (two-sided, receiver | | | must post a buffer), WRITE (one-sided, target | | | memory written directly), READ (one-sided, remote | | | memory read), and Atomic (compare-and-swap, fetch- | | | and-add). AI collectives predominantly use WRITE | | | and SEND. | +-----------+----------------------------------------------------+ | *UET* | Ultra Ethernet Transport. A transport protocol | | | defined by the Ultra Ethernet Consortium (UEC) | | | Specification 1.0 as a next-generation AI/HPC | | | fabric transport. UET is connectionless, supports | | | native packet spraying (RUD), and integrates | | | multipath load balancing and congestion control. | | | Transported over UDP destination port 4793 | | | (pending IANA verification). | +-----------+----------------------------------------------------+ | *PDC* | Packet Delivery Context. The ephemeral, | | | lightweight transport endpoint in UET, analogous | | | to but distinct from an RDMA Queue Pair. PDCs are | | | connectionless (no setup handshake), enabling low- | | | latency initiation and reduced per-flow state in | | | the NIC and switch. 
| +-----------+----------------------------------------------------+ | *ROD* | Reliable Ordered Delivery. A UET transport | | | service providing reliable, in-order packet | | | delivery, semantically equivalent to RoCEv2 RC | | | mode. Suitable for legacy RDMA applications | | | requiring strict ordering guarantees. | +-----------+----------------------------------------------------+ Table 4: RoCEv2 and RDMA Terms 5.2. Ultra Ethernet Transport (UET) Terms The following terms define UET-specific concepts introduced by the Ultra Ethernet Consortium (UEC) Specification 1.0 [UEC-SPEC-1.0]. +===========+=======================================================+ | Term | Definition | +===========+=======================================================+ | *RUD* | Reliable Unordered Delivery. A UET transport | | | service providing reliable delivery without | | | maintaining packet order across paths. Enables | | | native packet spraying across ECMP paths without | | | reorder-buffer overhead at the receiver NIC. The | | | preferred UET service class for AI training | | | collectives. | +-----------+-------------------------------------------------------+ | *RUDI* | Reliable Unordered Delivery for Idempotent | | | operations. A UET transport service optimized for | | | operations safe to execute more than once (e.g., | | | RDMA Writes to non-accumulating targets), allowing | | | simplified retransmission logic with reduced state | | | overhead. | +-----------+-------------------------------------------------------+ | *UUD* | Unreliable Unordered Delivery. A UET transport | | | service providing best-effort, unordered packet | | | delivery with minimal overhead. Suitable for | | | telemetry, speculative operations, or workloads | | | with application-layer loss tolerance.
| +-----------+-------------------------------------------------------+ | *UEC | A defined subset of UET features targeting a | | Profile* | specific use case: AI Base (core AI training/ | | | inference, mandatory feature set), AI Full (AI Base | | | plus deferred send, exact-match tagging, extended | | | atomics), or HPC (latency-optimized for traditional | | | HPC workloads with fine-grained synchronization). | +-----------+-------------------------------------------------------+ | *LLR* | Link Layer Retry. An optional UEC link-layer | | | enhancement providing fast per-hop error recovery | | | at the Ethernet link layer. LLR detects frames | | | corrupted beyond FEC correction and retransmits | | | them at the link layer instead of allowing them to | | | be dropped, reducing the frequency of | | | transport-layer retransmission and improving tail | | | latency. | +-----------+-------------------------------------------------------+ | *Packet | An optional UEC link-layer behavior in which a | | Trimming* | congested switch, rather than dropping the full | | | packet, transmits only the packet header (trimmed | | | packet) to the receiver. Trimming enables the | | | receiver to detect loss and initiate selective | | | retransmission more rapidly, reducing bandwidth | | | waste versus silent drop. | +-----------+-------------------------------------------------------+ | *CBFC* | Credit-Based Flow Control. An optional UEC | | | link-layer buffer management mechanism using | | | explicit credit grants from downstream to upstream | | | devices. CBFC provides backpressure without | | | transmitting PFC PAUSE frames, eliminating the | | | head-of-line blocking and storm propagation risks | | | associated with PFC.
| +-----------+-------------------------------------------------------+ | *Entropy | A per-packet field in the UET header used to | | Value* | distribute packets of a single message across | | | available ECMP paths, providing explicit spray | | | entropy independent of the IP 5-tuple. Enables | | | hardware-assisted packet spraying without requiring | | | transport-layer state in the switch. | +-----------+-------------------------------------------------------+ | *GIN* | GPU-Initiated Networking. A communication paradigm | | | in which GPU threads directly initiate network RDMA | | | operations (sends, one-sided writes/reads) to the | | | NIC hardware without CPU involvement, eliminating | | | the CPU-GPU synchronization round-trip. Reduces | | | effective latency by several microseconds for | | | fine-grained operations. | +-----------+-------------------------------------------------------+ | *KVCXL* | KV Cache Transfer Library. A software library | | | providing standardized point-to-point data transfer | | | primitives (register, transfer, notify) for | | | inference engines, abstracting underlying transport | | | mechanisms (intra-node interconnect, RDMA, PCIe, | | | storage interfaces). Enables transport-agnostic KV | | | cache migration in disaggregated serving | | | architectures. | +-----------+-------------------------------------------------------+ Table 5: Ultra Ethernet Transport (UET) Terms 5.2.1. UET Transport Services Comparison +=========+=========+==========+================+================+ | Service | Ordered | Reliable | Retransmission | Primary Use | | | | | Complexity | Case | +=========+=========+==========+================+================+ | *ROD* | Yes | Yes | Full per-PDC | Legacy RDMA / | | | | | state | ordered AI ops | +---------+---------+----------+----------------+----------------+ | *RUD* | No | Yes | Reduced | AI training | | | | | (unordered) | collectives | | | | | | with spray | +---------+---------+----------+----------------+----------------+ | *RUDI* | No | Yes | Minimal | RDMA Writes; | | | | | (idempotent) | simple | | | | | | retransmit | +---------+---------+----------+----------------+----------------+ | *UUD* | No | No | None | Telemetry, | | | | | | speculative | | | | | | ops | +---------+---------+----------+----------------+----------------+ Table 6: UET Transport Services Comparison 6. Congestion Control and Fabric Behavior Terms The following terms define congestion management mechanisms and associated fabric behaviors critical to AI workload performance. +===========+=======================================================+ | Term | Definition | +===========+=======================================================+ | *PFC* | Priority Flow Control (IEEE 802.1Qbb). A | | | lossless Ethernet mechanism in which a receiver | | | transmits a PAUSE frame to its upstream | | | neighbor on a specific priority class when its | | | ingress buffer approaches a configured | | | threshold, temporarily halting transmission of | | | that priority. Required for lossless RoCEv2 | | | operation. PFC operates hop-by-hop and can | | | propagate congestion upstream (PFC storm risk). | +-----------+-------------------------------------------------------+ | *PFC | A pathological condition in which PFC PAUSE | | Storm* | frames propagate across multiple hops, causing | | | widespread throughput degradation or deadlock | | | unrelated to the original congestion source. | | | Detection and mitigation *SHOULD* be part of | | | soak test evaluation per the companion | | | methodology documents.
| +-----------+-------------------------------------------------------+ | *PFC | A circular PFC dependency in which sets of | | Deadlock* | flows mutually pause each other indefinitely, | | | resulting in zero progress for affected traffic | | | classes. Deadlock risk is elevated in non-tree | | | topologies and *MUST* be evaluated in | | | fabric-level soak tests. | +-----------+-------------------------------------------------------+ | *ECN* | Explicit Congestion Notification ([RFC3168]). | | | An IP-layer mechanism in which a congested | | | router marks packets with the Congestion | | | Experienced (CE) codepoint in the IP ECN field | | | instead of dropping them. The receiver echoes | | | congestion feedback to the sender via the | | | transport protocol, triggering rate reduction. | | | Used with RoCEv2 as part of DCQCN. | +-----------+-------------------------------------------------------+ | *DCQCN* | Data Center Quantized Congestion Notification. | | | An end-to-end congestion control algorithm for | | | RoCEv2 flows, combining ECN marking at | | | congested switches with rate-based sender | | | reduction using an AIMD scheme. Note: PFC | | | serves as a separate, orthogonal backstop to | | | prevent packet loss during DCQCN convergence; | | | PFC is *not* a component of the DCQCN algorithm | | | itself. | +-----------+-------------------------------------------------------+ | *ECN | The fraction of packets (expressed as a | | Marking | percentage) that are marked with the CE | | Ratio* | codepoint in the IP ECN field over a | | | measurement interval. A high ECN Marking Ratio | | | indicates persistent congestion and is a | | | primary Fabric Health Indicator.
| +-----------+-------------------------------------------------------+ | *Incast* | A traffic pattern in which multiple sources | | | simultaneously send to a single destination, | | | potentially overwhelming the destination's NIC | | | receive buffer and the switch's egress port | | | buffer. Incast is a dominant congestion | | | mechanism in AllReduce and collective | | | operations. | +-----------+-------------------------------------------------------+ | *Incast | The ratio of concurrent senders to receivers in | | Ratio* | an incast communication pattern (N:1). The | | | incast ratio determines the oversubscription | | | factor at the destination port and is a primary | | | test parameter for congestion characterization. | +-----------+-------------------------------------------------------+ | *Packet | A load balancing strategy distributing | | Spray* | individual packets of a single RDMA message | | | across all available ECMP paths, maximizing | | | link utilization at the cost of potential out- | | | of-order delivery at the receiver. Native in | | | UET (RUD mode); requires NIC reorder buffering | | | for RoCEv2 RC mode. | +-----------+-------------------------------------------------------+ | *DLB / | Dynamic Load Balancing using flowlet detection. | Calabria, et al. Expires 30 August 2026 [Page 14] Internet-Draft AI Fabric Benchmarking Terminology February 2026 | Flowlet* | A per-flow rerouting mechanism that reassigns a | | | flow to a new ECMP path when the flow has been | | | idle longer than the flowlet gap threshold | | | (typically 500 ns–2 µs), reducing out-of-order | | | packet risk compared to packet spray while | | | improving utilization over static per-flow | | | ECMP. | +-----------+-------------------------------------------------------+ | *ECMP* | Equal-Cost Multi-Path routing. A forwarding | | | mechanism distributing traffic across multiple | | | equal-cost paths, typically via hash of the IP | | | 5-tuple (or entropy field in UET). 
ECMP | | | imbalance (MMR > 1.0) is a primary fabric | | | efficiency metric for AI traffic. | +-----------+-------------------------------------------------------+ | *MMR* | Max-Mean Ratio. The ratio of the flow count | | | (or traffic load) on the most heavily utilized | | | link to the average flow count per link across | | | all fabric links. MMR = 1.0 indicates perfect | | | ECMP balance; MMR > 1.0 quantifies imbalance | | | that degrades effective fabric bandwidth. | +-----------+-------------------------------------------------------+ Table 7: Congestion Control and Fabric Behavior Terms 6.1. Load Balancing Strategy Comparison +============+=============+============+=============+============+ | Strategy | Granularity | Reorder | Utilization | Complexity | | | | Risk | | | +============+=============+============+=============+============+ | *ECMP | Per-flow | None | Low | Low | | (5-tuple | | | (elephant | | | hash)* | | | flow bias) | | +------------+-------------+------------+-------------+------------+ | *DLB / | Per-flowlet | Low | Medium | Medium | | Flowlet* | | | | | +------------+-------------+------------+-------------+------------+ | *Packet | Per-packet | High | High | High (NIC | | Spray | | | | reorder | | (RoCEv2)* | | | | buffer) | +------------+-------------+------------+-------------+------------+ | *Packet | Per-packet | None | High | Low | | Spray (UET | | (transport | | | | RUD)* | | tolerates | | | | | | OOO) | | | +------------+-------------+------------+-------------+------------+ Calabria, et al. Expires 30 August 2026 [Page 15] Internet-Draft AI Fabric Benchmarking Terminology February 2026 Table 8: Load Balancing Strategy Comparison 7. Fabric Topology and Infrastructure Terms The following terms define fabric topology architectures and infrastructure components referenced in the companion methodology documents. 
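   Before turning to the topology terms, the MMR definition from
   Table 7 can be sketched in Python.  This is an illustrative sketch
   only; the per-link flow counts below are hypothetical, not
   normative.

```python
# Illustrative sketch of the MMR (Max-Mean Ratio) defined in Section 6.
# Input is a list of per-link flow counts (or traffic loads); the
# values used below are hypothetical examples.
def max_mean_ratio(per_link_load):
    """MMR = max(load) / mean(load); 1.0 indicates perfect ECMP balance."""
    mean = sum(per_link_load) / len(per_link_load)
    return max(per_link_load) / mean

# Perfect balance: every link carries the same number of flows.
assert max_mean_ratio([4, 4, 4, 4]) == 1.0

# Hash polarization: one link carries twice its fair share, reducing
# the effective fabric bandwidth available to the hot link's flows.
print(max_mean_ratio([8, 4, 2, 2]))  # 2.0
```

   An MMR reported alongside throughput results lets a reader separate
   fabric capacity limits from load-balancing inefficiency.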
   +===================+=============================================+
   | Term              | Definition                                  |
   +===================+=============================================+
   | *Clos / Fat-Tree  | A multi-stage switch topology providing     |
   | Topology*         | non-blocking or oversubscribed              |
   |                   | connectivity between all leaf-to-leaf       |
   |                   | pairs.  In AI fabric deployments, a two-    |
   |                   | tier (leaf-spine) or three-tier (leaf-      |
   |                   | spine-superspine) Clos is standard.  Full   |
   |                   | bisection bandwidth (1:1) is the target     |
   |                   | for training fabrics; 2:1 or 4:1            |
   |                   | oversubscription may be acceptable for      |
   |                   | inference fabrics.                          |
   +-------------------+---------------------------------------------+
   | *Rail-Optimized   | A topology in which the NIC ports of each   |
   | Topology*         | server are distributed across multiple      |
   |                   | ToR switches (one NIC port per switch),     |
   |                   | such that collective traffic between        |
   |                   | adjacent servers traverses different        |
   |                   | physical paths.  Minimizes switch-to-       |
   |                   | switch traffic during ring AllReduce,       |
   |                   | maximizing effective BusBW.  Requires       |
   |                   | ECMP-aware collective placement.            |
   +-------------------+---------------------------------------------+
   | *Bisection        | The aggregate bandwidth across the          |
   | Bandwidth*        | minimum cut that divides the fabric into    |
   |                   | two equal halves.  Non-blocking fabrics     |
   |                   | provide bisection bandwidth equal to half   |
   |                   | the total edge (server-facing) bandwidth.   |
   |                   | Limits worst-case all-to-all                |
   |                   | communication throughput.                   |
   +-------------------+---------------------------------------------+
   | *Oversubscription | The ratio of total edge (server-facing)     |
   | Ratio*            | bandwidth to total fabric-facing (uplink)   |
   |                   | bandwidth at a given tier of a Clos         |
   |                   | fabric.  A 1:1 ratio is non-blocking;       |
   |                   | higher ratios (e.g., 2:1, 4:1) reduce       |
   |                   | fabric cost but may bottleneck all-to-all   |
   |                   | and AllReduce patterns when all server      |
   |                   | ports are active simultaneously.            |
   +-------------------+---------------------------------------------+
   | *ToR Switch*      | Top-of-Rack switch.  The first-hop          |
   |                   | aggregation switch connecting accelerator   |
   |                   | servers in a rack to the spine layer of     |
   |                   | the fabric.  In rail-optimized              |
   |                   | topologies, multiple ToR switches serve a   |
   |                   | single rack, with each server's NICs        |
   |                   | distributed across ToRs.                    |
   +-------------------+---------------------------------------------+
   | *Spine /          | Intermediate and top-layer switches in a    |
   | Superspine*       | multi-tier Clos fabric, providing inter-    |
   |                   | rack and inter-pod connectivity,            |
   |                   | respectively.  Spine switches aggregate     |
   |                   | multiple ToR switches; superspine           |
   |                   | switches aggregate multiple spine pods.     |
   +-------------------+---------------------------------------------+
   | *NIC*             | Network Interface Controller.  The          |
   |                   | hardware device providing network           |
   |                   | connectivity for an accelerator host.  AI   |
   |                   | fabric NICs support RDMA (RoCEv2 or UET),   |
   |                   | hardware offload for collective             |
   |                   | operations, and, optionally, GPU-           |
   |                   | Initiated Networking (GIN).  NIC model      |
   |                   | and firmware version *MUST* be documented   |
   |                   | in all benchmark reports.                   |
   +-------------------+---------------------------------------------+
   | *Buffer           | The instantaneous or time-averaged fill     |
   | Occupancy*        | level of a switch port's packet buffer,     |
   |                   | expressed in bytes or as a fraction of      |
   |                   | total buffer capacity.  Elevated            |
   |                   | sustained buffer occupancy indicates        |
   |                   | congestion.  P99 buffer occupancy is a      |
   |                   | Fabric Health Indicator in the companion    |
   |                   | methodology documents.                      |
   +-------------------+---------------------------------------------+
   | *Zero-Impact      | Sub-microsecond automatic path              |
   | Failover*         | convergence upon a link or switch           |
   |                   | failure, resulting in no measurable         |
   |                   | increase in JCT or TTFT.  Requires pre-     |
   |                   | programmed alternate paths and hardware-    |
   |                   | level fast reroute (FRR) with sub-          |
   |                   | microsecond detection, rather than          |
   |                   | relying on routing protocol convergence.    |
   +-------------------+---------------------------------------------+
   | *Link             | The fraction of the nominal link capacity   |
   | Utilization*      | actually used for data transmission over    |
   |                   | a measurement interval, expressed as a      |
   |                   | percentage.  Reported as mean, P95, and     |
   |                   | P99 per link.  A large peak-to-mean gap     |
   |                   | (low average but high peak utilization)     |
   |                   | is characteristic of bursty AI inference    |
   |                   | traffic.                                    |
   +-------------------+---------------------------------------------+

          Table 9: Fabric Topology and Infrastructure Terms

8.  Training-Specific Terms

   The following terms are specific to AI training workload
   benchmarking and are used normatively in [TRAINING-BENCH].

   +==================+==============================================+
   | Term             | Definition                                   |
   +==================+==============================================+
   | *JCT*            | Job Completion Time.  The wall-clock         |
   |                  | elapsed time from the start of a training    |
   |                  | job (or benchmark iteration) until all       |
   |                  | participating accelerators complete their    |
   |                  | work, inclusive of all forward pass,         |
   |                  | backward pass, and collective                |
   |                  | communication phases.  JCT is the primary    |
   |                  | end-to-end training efficiency KPI.          |
   +------------------+----------------------------------------------+
   | *Roofline JCT*   | The theoretical minimum JCT assuming         |
   |                  | perfect (zero-contention, zero-queuing)      |
   |                  | network behavior: Roofline JCT =             |
   |                  | computation_time + serialization_delay,      |
   |                  | where serialization_delay = message_size     |
   |                  | / link_rate.  Provides a reference for       |
   |                  | evaluating fabric overhead.                  |
   +------------------+----------------------------------------------+
   | *JCT Ratio*      | The ratio of measured JCT to Roofline        |
   |                  | JCT.  A value of 1.0 indicates no            |
   |                  | network-induced overhead.  Values > 1.0      |
   |                  | quantify fabric inefficiency: JCT Ratio =    |
   |                  | JCT_measured / JCT_roofline.  The JCT        |
   |                  | Ratio is the primary comparative metric      |
   |                  | for AI training fabric benchmarking.         |
   +------------------+----------------------------------------------+
   | *Gradient        | The AllReduce collective operation           |
   | Synchronization* | performed after the backward pass of each    |
   |                  | training step to sum the locally computed    |
   |                  | gradients across all data-parallel           |
   |                  | replicas.  The dominant communication        |
   |                  | event in data-parallel training,             |
   |                  | occurring once per training step per         |
   |                  | layer.                                       |
   +------------------+----------------------------------------------+
   | *Step Time*      | The wall-clock duration of a single          |
   |                  | training iteration (forward pass +           |
   |                  | backward pass + gradient synchronization     |
   |                  | + optimizer step).  Step time =              |
   |                  | computation time + communication time,       |
   |                  | where the communication time is dominated    |
   |                  | by the AllReduce collective.                 |
   +------------------+----------------------------------------------+
   | *Soak Test*      | A sustained-load test run for an extended    |
   |                  | period (minimum 24 hours for stability       |
   |                  | evaluation) at a defined offered load        |
   |                  | fraction (e.g., 70% or 90% of maximum        |
   |                  | throughput).  Soak tests detect buffer       |
   |                  | leaks, ECMP imbalance drift, PFC storm       |
   |                  | initiation, and long-tail error              |
   |                  | accumulation not visible in short-           |
   |                  | duration tests.                              |
   +------------------+----------------------------------------------+

                   Table 10: Training-Specific Terms

9.  Inference-Specific Terms

   The following terms are specific to AI inference serving workload
   benchmarking and are used normatively in [INFERENCE-BENCH].
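   Before the inference terms, the Roofline JCT and JCT Ratio
   relations defined in Table 10 above can be sketched in Python.
   All numeric inputs are hypothetical examples; this is an
   illustrative sketch, not a normative procedure.

```python
# Illustrative sketch of Roofline JCT and JCT Ratio (Section 8).
def roofline_jct(computation_time_s, message_size_bits, link_rate_bps):
    # Roofline JCT = computation_time + serialization_delay,
    # where serialization_delay = message_size / link_rate.
    return computation_time_s + message_size_bits / link_rate_bps

def jct_ratio(jct_measured_s, jct_roofline_s):
    # 1.0 means no network-induced overhead; values > 1.0
    # quantify fabric inefficiency.
    return jct_measured_s / jct_roofline_s

# Hypothetical example: 10 s of computation plus a 32 Gbit gradient
# exchange over a 400 Gb/s link gives a 10.08 s roofline.
ideal = roofline_jct(10.0, 32e9, 400e9)
measured = 10.58  # hypothetical measured JCT under congestion
print(round(jct_ratio(measured, ideal), 3))  # ~1.05
```

   Reporting the ratio rather than raw JCT makes results comparable
   across clusters with different computation times.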
   +==================+================================================+
   | Term             | Definition                                     |
   +==================+================================================+
   | *TTFT*           | Time to First Token.  The elapsed time from    |
   |                  | receipt of an inference request by the         |
   |                  | serving system to emission of the first        |
   |                  | output token.  Encompasses prompt              |
   |                  | processing (prefill), KV cache generation,     |
   |                  | optional KV cache transfer (in                 |
   |                  | disaggregated architectures), and the          |
   |                  | initial decode step.  Interactive serving      |
   |                  | target: TTFT < 500 ms at P99.                  |
   +------------------+------------------------------------------------+
   | *ITL*            | Inter-Token Latency.  The elapsed time         |
   |                  | between successive output tokens during the    |
   |                  | autoregressive decode phase.  Measured at      |
   |                  | P50, P95, P99, and P99.9 to characterize       |
   |                  | tail latency behavior.  Interactive serving    |
   |                  | target: ITL < 50 ms at P99.                    |
   +------------------+------------------------------------------------+
   | *TPS*            | Tokens Per Second.  Aggregate throughput of    |
   |                  | the inference serving system, measured as      |
   |                  | the total number of output tokens generated    |
   |                  | per second across all concurrent requests.     |
   |                  | Reported separately for input-side             |
   |                  | (prefill) TPS and output-side (decode) TPS.    |
   +------------------+------------------------------------------------+
   | *KV Cache*       | Key-Value Cache.  The intermediate             |
   |                  | attention state (key and value projection      |
   |                  | matrices from multi-head attention layers)     |
   |                  | computed during the prefill phase and          |
   |                  | reused during each decode step to avoid        |
   |                  | redundant recomputation.  KV cache size        |
   |                  | scales with: layers × attention_heads ×        |
   |                  | head_dim × sequence_length × precision.        |
   |                  | The attention head configuration *MUST* be     |
   |                  | reported in all benchmark results.             |
   +------------------+------------------------------------------------+
   | *Prefill Phase*  | The compute-bound phase of LLM inference in    |
   |                  | which the entire input prompt is processed     |
   |                  | in parallel to generate the KV cache and       |
   |                  | the first output token.  Characterized by      |
   |                  | high arithmetic intensity (200–400             |
   |                  | ops/byte), high accelerator utilization        |
   |                  | (90–95%), and large activation tensors.        |
   |                  | Prefill latency dominates TTFT for long        |
   |                  | prompts.                                       |
   +------------------+------------------------------------------------+
   | *Decode Phase*   | The memory-bandwidth-bound phase of LLM        |
   |                  | inference in which output tokens are           |
   |                  | generated autoregressively, one token per      |
   |                  | forward pass, by reading the KV cache.         |
   |                  | Characterized by low arithmetic intensity      |
   |                  | (60–80 ops/byte), lower accelerator            |
   |                  | utilization (20–40%), and memory-              |
   |                  | bandwidth-limited KV cache reads.  Decode      |
   |                  | throughput limits TPS.                         |
   +------------------+------------------------------------------------+
   | *Disaggregated   | An inference serving architecture in which     |
   | Serving*         | the prefill phase and decode phase are         |
   |                  | executed on physically separate groups of      |
   |                  | accelerators (workers), connected by a         |
   |                  | network fabric.  Allows independent scaling    |
   |                  | of prefill and decode resources (xPyD) but     |
   |                  | introduces KV cache transfer as a fabric-      |
   |                  | critical data movement.                        |
   +------------------+------------------------------------------------+
   | *xPyD Ratio*     | The allocation ratio of x prefill workers      |
   |                  | to y decode workers in a disaggregated         |
   |                  | serving cluster.  Example: 3P9D denotes 3      |
   |                  | prefill nodes and 9 decode nodes.  The         |
   |                  | optimal xPyD ratio depends on model size,      |
   |                  | prompt/output length distributions, and        |
   |                  | TTFT/ITL SLO targets.                          |
   +------------------+------------------------------------------------+
   | *Continuous      | A dynamic inference scheduling technique       |
   | Batching*        | that inserts new requests into an active       |
   |                  | decode batch as slots become available         |
   |                  | (without waiting for the current batch to      |
   |                  | complete), improving accelerator               |
   |                  | utilization compared to static batching.       |
   |                  | Generates variable batch sizes that affect     |
   |                  | fabric traffic burstiness.                     |
   +------------------+------------------------------------------------+
   | *PagedAttention* | A KV cache memory management technique         |
   |                  | storing attention keys and values in           |
   |                  | fixed-size, non-contiguous virtual pages       |
   |                  | (typically 16–64 KB), inspired by OS           |
   |                  | virtual memory management.  Reduces memory     |
   |                  | fragmentation and enables efficient KV         |
   |                  | cache sharing across requests with common      |
   |                  | prefixes.                                      |
   +------------------+------------------------------------------------+
   | *Prefix Caching* | Reuse of previously computed KV cache          |
   |                  | segments for inference requests sharing a      |
   |                  | common prompt prefix (e.g., a fixed system     |
   |                  | prompt), eliminating redundant prefill         |
   |                  | computation.  Prefix cache hit rate is a       |
   |                  | secondary KPI for inference serving            |
   |                  | efficiency.                                    |
   +------------------+------------------------------------------------+
   | *Normal          | An AllToAll MoE dispatch communication mode    |
   | Dispatch*        | optimized for the prefill phase.  Payload      |
   |                  | sizes are variable (depending on token-to-     |
   |                  | expert routing), generating dynamic tensor     |
   |                  | shapes incompatible with static graph          |
   |                  | capture.  Maximizes throughput for large       |
   |                  | batches at the cost of higher per-dispatch     |
   |                  | latency.                                       |
   +------------------+------------------------------------------------+
   | *Low-Latency     | An AllToAll MoE dispatch communication mode    |
   | Dispatch*        | optimized for the decode phase.  Payload       |
   |                  | sizes are padded to fixed maximum              |
   |                  | dimensions (compatible with static graph       |
   |                  | capture), enabling lower kernel-launch         |
   |                  | overhead at the cost of slight bandwidth       |
   |                  | inefficiency.  Target: < 200 µs per            |
   |                  | dispatch round trip.                           |
   +------------------+------------------------------------------------+
   | *SLO*            | Service Level Objective.  A quantitative       |
   |                  | target for an inference serving KPI.  AI       |
   |                  | inference SLOs typically specify maximum       |
   |                  | TTFT (e.g., < 500 ms P99) and maximum ITL      |
   |                  | (e.g., < 50 ms P99) under a specified          |
   |                  | request arrival rate.                          |
   +------------------+------------------------------------------------+
   | *Speculative     | An inference acceleration technique using a    |
   | Decoding*        | small draft model to generate candidate        |
   |                  | token sequences verified in parallel by the    |
   |                  | target model.  Reduces effective ITL but       |
   |                  | generates bursty, variable-length KV cache     |
   |                  | traffic; noted as a future benchmarking        |
   |                  | area not fully specified in the current        |
   |                  | companion documents.                           |
   +------------------+------------------------------------------------+

                  Table 11: Inference-Specific Terms

9.1.  Inference Phase Characteristics

   +===========+===============+============+=============+=========+
   | Phase     | Compute Bound | Arithmetic | Accelerator | Primary |
   |           |               | Intensity  | Util.       | KPI     |
   +===========+===============+============+=============+=========+
   | *Prefill* | Yes           | 200–400    | 90–95%      | TTFT    |
   |           |               | ops/byte   |             |         |
   +-----------+---------------+------------+-------------+---------+
   | *Decode*  | No (memory    | 60–80 ops/ | 20–40%      | ITL,    |
   |           | BW bound)     | byte       |             | TPS     |
   +-----------+---------------+------------+-------------+---------+

               Table 12: Inference Phase Characteristics

10.  KPI Classification Terms

   The following terms define the three-tier KPI taxonomy used across
   both companion methodology documents.
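   The KV Cache scaling relation from Table 11 above can be made
   concrete with a short Python sketch.  The model shape used here is
   a hypothetical example, and the leading factor of 2 (separate key
   and value tensors) is an assumption not spelled out in the table's
   proportionality.

```python
# Illustrative KV cache sizing per the Table 11 scaling relation:
#   layers x attention_heads x head_dim x sequence_length x precision.
# The factor of 2 accounts for the separate key and value tensors
# (an assumption); the model shape below is hypothetical.
def kv_cache_bytes(layers, heads, head_dim, seq_len, bytes_per_elem):
    return 2 * layers * heads * head_dim * seq_len * bytes_per_elem

# 32 layers, 32 heads, head_dim 128, 4096-token context, FP16 (2 B):
size = kv_cache_bytes(32, 32, 128, 4096, 2)
print(size / 2**30, "GiB per request")  # 2.0 GiB per request
```

   Sizes of this magnitude are why KV cache transfer in disaggregated
   serving is treated as a fabric-critical data movement.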
   +============+======================================================+
   | Term       | Definition                                           |
   +============+======================================================+
   | *Primary   | A top-level performance indicator directly           |
   | KPI*       | representing end-user experience or training         |
   |            | efficiency.  In training: JCT Ratio and BusBW.       |
   |            | In inference: TTFT and ITL.  Primary KPIs are        |
   |            | the principal reporting metrics and the basis        |
   |            | for comparative benchmarking across DUT              |
   |            | implementations.                                     |
   +------------+------------------------------------------------------+
   | *Secondary | A fabric-level performance indicator providing a     |
   | KPI*       | mechanistic explanation for Primary KPI values.      |
   |            | Examples: collective operation throughput            |
   |            | (BusBW), KV cache transfer goodput, AllToAll         |
   |            | dispatch latency, ECMP imbalance (MMR), and link     |
   |            | utilization.  Secondary KPIs enable root-cause       |
   |            | analysis of Primary KPI deviations.                  |
   +------------+------------------------------------------------------+
   | *Fabric    | An operational metric characterizing fabric          |
   | Health     | stability and anomaly conditions rather than         |
   | Indicator  | peak performance.  FHIs include: PFC event rate,     |
   | (FHI)*     | PFC storm occurrence, ECN marking ratio, packet      |
   |            | loss rate, buffer occupancy (P99), and               |
   |            | retransmission rate.  FHIs *SHOULD* be               |
   |            | continuously monitored and reported throughout       |
   |            | all test categories.                                 |
   +------------+------------------------------------------------------+
   | *Goodput*  | The application-useful data delivered per unit       |
   |            | time, excluding retransmitted packets, protocol      |
   |            | overhead, and padding.  Goodput may differ           |
   |            | significantly from raw throughput during             |
   |            | congestion events; both *SHOULD* be reported in      |
   |            | benchmarking results.                                |
   +------------+------------------------------------------------------+
   | *Zero      | A test acceptance criterion requiring that no        |
   | Packet     | packets be dropped by the DUT during the             |
   | Loss*      | measurement interval.  For RoCEv2 and UET            |
   |            | transports, zero packet loss is the target           |
   |            | operating condition.  The binary search              |
   |            | procedure in the companion methodology documents     |
   |            | determines the maximum offered load satisfying       |
   |            | this criterion.                                      |
   +------------+------------------------------------------------------+

                  Table 13: KPI Classification Terms

10.1.  KPI Tier Summary

   +============+=================+==================+=================+
   | Tier       | Training        | Inference        | Purpose         |
   |            | Examples        | Examples         |                 |
   +============+=================+==================+=================+
   | *Primary   | JCT Ratio,      | TTFT, ITL, TPS   | Direct end-user |
   | KPI*       | BusBW           |                  | experience /    |
   |            |                 |                  | business impact |
   +------------+-----------------+------------------+-----------------+
   | *Secondary | AllReduce       | AllToAll         | Root-cause      |
   | KPI*       | BusBW, MMR,     | dispatch         | analysis of     |
   |            | Link            | latency, KV      | Primary KPI     |
   |            | Utilization     | transfer goodput | deviations      |
   +------------+-----------------+------------------+-----------------+
   | *Fabric    | PFC events,     | PFC events, ECN  | Ongoing fabric  |
   | Health     | ECN ratio,      | ratio, packet    | stability and   |
   | Indicator  | packet loss,    | loss, buffer P99 | anomaly         |
   | (FHI)*     | buffer P99,     |                  | detection       |
   |            | retx rate       |                  |                 |
   +------------+-----------------+------------------+-----------------+

                      Table 14: KPI Tier Summary

11.  Referenced Standards Abbreviations

   The following abbreviations refer to normative and informative IETF
   documents referenced throughout this document and the companion
   methodology documents.
   +===========+=======================================================+
   | Reference | Definition                                            |
   +===========+=======================================================+
   | *RFC      | "Benchmarking Terminology for Network                 |
   | 1242*     | Interconnection Devices" (Bradner, 1991).  Defines    |
   |           | foundational benchmarking terms (throughput,          |
   |           | latency, frame loss rate, back-to-back frames).       |
   |           | The baseline terminology reference for BMWG work.     |
   |           | Where terms in this document overlap with RFC 1242    |
   |           | definitions, the AI fabric context definitions        |
   |           | herein take precedence.                               |
   +-----------+-------------------------------------------------------+
   | *RFC      | "Benchmarking Methodology for Network Interconnect    |
   | 2544*     | Devices" (Bradner & McQuaid, 1999).  Defines test     |
   |           | methodologies for throughput, latency, frame loss     |
   |           | rate, and back-to-back measurements.  The AI          |
   |           | fabric methodology documents extend RFC 2544          |
   |           | procedures for AI-specific traffic patterns and       |
   |           | test durations.                                       |
   +-----------+-------------------------------------------------------+
   | *RFC      | "Data Center Benchmarking Terminology" (Avramov &     |
   | 8238*     | Rapp, 2017).  Extends RFC 1242 with data center-      |
   |           | relevant terms including forwarding table scaling,    |
   |           | congestion, and VM/SDN.  Incast, ECN, and buffer      |
   |           | occupancy concepts in this document align with        |
   |           | RFC 8238 definitions.                                 |
   +-----------+-------------------------------------------------------+
   | *RFC      | "Data Center Benchmarking Methodology" (Avramov &     |
   | 8239*     | Rapp, 2017).  Defines test methodologies for data     |
   |           | center network functions including incast, ECN        |
   |           | marking, and lossless behavior.  The AI fabric        |
   |           | companion methodology documents extend RFC 8239       |
   |           | for distributed AI collective traffic patterns.       |
   +-----------+-------------------------------------------------------+
   | *RFC 2119 | "Key words for use in RFCs to Indicate Requirement    |
   | / RFC     | Levels" (Bradner, 1997; Leiba, 2017).  Define the     |
   | 8174*     | normative requirement language: MUST, MUST NOT,       |
   |           | REQUIRED, SHALL, SHALL NOT, SHOULD, SHOULD NOT,       |
   |           | RECOMMENDED, MAY, and OPTIONAL.  RFC 8174             |
   |           | clarifies that these terms are normative only when    |
   |           | in uppercase; lowercase uses are not normative.       |
   +-----------+-------------------------------------------------------+

             Table 15: Referenced Standards Abbreviations

Acknowledgments

   TODO acknowledge.

References

   Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <https://www.rfc-editor.org/info/rfc2119>.

   [RFC2544]  Bradner, S. and J. McQuaid, "Benchmarking Methodology for
              Network Interconnect Devices", RFC 2544,
              DOI 10.17487/RFC2544, March 1999,
              <https://www.rfc-editor.org/info/rfc2544>.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017, <https://www.rfc-editor.org/info/rfc8174>.

   Informative References

   [IBTA-ROCE]
              InfiniBand Trade Association, "InfiniBand Architecture
              Specification, Annex 16: RoCE", 2010.

   [INFERENCE-BENCH]
              Calabria, F. and C. Pignataro, "Benchmarking Methodology
              for AI Inference Serving Network Fabrics", Work in
              Progress, Internet-Draft, draft-bmwg-ai-fabric-inference-
              bench-00, February 2026.

   [RFC1242]  Bradner, S., "Benchmarking Terminology for Network
              Interconnection Devices", RFC 1242, DOI 10.17487/RFC1242,
              July 1991, <https://www.rfc-editor.org/info/rfc1242>.

   [RFC3168]  Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
              of Explicit Congestion Notification (ECN) to IP",
              RFC 3168, DOI 10.17487/RFC3168, September 2001,
              <https://www.rfc-editor.org/info/rfc3168>.

   [RFC8238]  Avramov, L. and J. Rapp, "Data Center Benchmarking
              Terminology", RFC 8238, DOI 10.17487/RFC8238, August
              2017, <https://www.rfc-editor.org/info/rfc8238>.

   [RFC8239]  Avramov, L. and J. Rapp, "Data Center Benchmarking
              Methodology", RFC 8239, DOI 10.17487/RFC8239, August
              2017, <https://www.rfc-editor.org/info/rfc8239>.
   [TRAINING-BENCH]
              Calabria, F. and C. Pignataro, "Benchmarking Methodology
              for AI Training Network Fabrics", Work in Progress,
              Internet-Draft, draft-bmwg-ai-fabric-training-bench-00,
              February 2026.

   [UEC-SPEC-1.0]
              Ultra Ethernet Consortium, "Ultra Ethernet Specification
              1.0", 2024.

Appendix A: Term Cross-Reference to Companion Documents

   The following table identifies which terms from this document are
   used in each companion methodology document.

   +====================+====================+=======================+
   | Term Category      | Used in Training   | Used in Inference     |
   |                    | Bench              | Bench                 |
   +====================+====================+=======================+
   | General            | All terms          | All terms             |
   | Benchmarking Terms |                    |                       |
   | (§2)               |                    |                       |
   +--------------------+--------------------+-----------------------+
   | Collective         | AllReduce,         | AllToAll, BusBW       |
   | Communication (§3) | AllGather,         |                       |
   |                    | ReduceScatter,     |                       |
   |                    | AllToAll, BusBW,   |                       |
   |                    | CCL, Ring          |                       |
   |                    | Algorithm, BSP,    |                       |
   |                    | SPMD               |                       |
   +--------------------+--------------------+-----------------------+
   | Parallelism        | DP, TP, PP, EP,    | EP, MoE, DP Attention |
   | Strategies (§4)    | MoE, ZeRO          |                       |
   +--------------------+--------------------+-----------------------+
   | RDMA / RoCEv2      | RDMA, RoCEv2, QP,  | RDMA, RoCEv2, QP, RC  |
   | (§5.1)             | RC mode, RDMA Verb | mode, GIN, KVCXL      |
   +--------------------+--------------------+-----------------------+
   | UET Terms (§5.2)   | UET, PDC, ROD,     | UET, RUD, GIN         |
   |                    | RUD, RUDI, UUD,    |                       |
   |                    | LLR, Packet        |                       |
   |                    | Trimming, CBFC,    |                       |
   |                    | UEC Profile,       |                       |
   |                    | Entropy Value      |                       |
   +--------------------+--------------------+-----------------------+
   | Congestion Control | PFC, PFC Storm,    | PFC, ECN, DCQCN,      |
   | (§6)               | PFC Deadlock, ECN, | Incast, Packet Spray, |
   |                    | DCQCN, ECN Marking | ECMP                  |
   |                    | Ratio, Incast,     |                       |
   |                    | Incast Ratio,      |                       |
   |                    | Packet Spray, DLB/ |                       |
   |                    | Flowlet, ECMP, MMR |                       |
   +--------------------+--------------------+-----------------------+
   | Fabric Topology    | Clos, Rail-        | Clos, Bisection BW,   |
   | (§7)               | Optimized,         | ToR, NIC, Buffer      |
   |                    | Bisection BW,      | Occupancy, Link       |
   |                    | Oversubscription,  | Utilization           |
   |                    | ToR, Spine, NIC,   |                       |
   |                    | Buffer Occupancy,  |                       |
   |                    | Zero-Impact        |                       |
   |                    | Failover, Link     |                       |
   |                    | Utilization        |                       |
   +--------------------+--------------------+-----------------------+
   | Training-Specific  | JCT, Roofline JCT, | Soak Test             |
   | (§8)               | JCT Ratio,         |                       |
   |                    | Gradient Sync,     |                       |
   |                    | Step Time, Soak    |                       |
   |                    | Test               |                       |
   +--------------------+--------------------+-----------------------+
   | Inference-Specific | —                  | TTFT, ITL, TPS, KV    |
   | (§9)               |                    | Cache, Prefill,       |
   |                    |                    | Decode, Disaggregated |
   |                    |                    | Serving, xPyD,        |
   |                    |                    | Continuous Batching,  |
   |                    |                    | PagedAttention,       |
   |                    |                    | Prefix Caching,       |
   |                    |                    | Normal/Low-Latency    |
   |                    |                    | Dispatch, SLO         |
   +--------------------+--------------------+-----------------------+
   | KPI Classification | Primary KPI (JCT   | Primary KPI (TTFT,    |
   | (§10)              | Ratio, BusBW),     | ITL), Secondary KPI,  |
   |                    | Secondary KPI,     | FHI, Goodput, Zero    |
   |                    | FHI, Goodput, Zero | Packet Loss           |
   |                    | Packet Loss        |                       |
   +--------------------+--------------------+-----------------------+

        Table 16: Term Cross-Reference to Companion Documents

Appendix B: Term Taxonomy Summary

   The following table provides a concise summary of all defined terms
   organized by category, with the section reference for the full
   definition.

   +=========+====================================+====================+
   | Section | Term(s)                            | Category           |
   +=========+====================================+====================+
   | 2       | DUT, SUT, RT, JFI, Offered Load,   | General            |
   |         | Trial Duration, Warmup Period,     | Benchmarking       |
   |         | Binary Search, Percentile          |                    |
   |         | Latency, AI Fabric                 |                    |
   +---------+------------------------------------+--------------------+
   | 3       | Collective Operation, AllReduce,   | Collective         |
   |         | AllGather, ReduceScatter,          | Communication      |
   |         | AllToAll, Ring Algorithm, BusBW,   |                    |
   |         | CCL, SPMD, BSP                     |                    |
   +---------+------------------------------------+--------------------+
   | 4       | Data Parallelism, Tensor           | Parallelism        |
   |         | Parallelism, Pipeline              | Strategies         |
   |         | Parallelism, Expert Parallelism,   |                    |
   |         | MoE, DP Attention, ZeRO            |                    |
   +---------+------------------------------------+--------------------+
   | 5.1     | RDMA, RoCEv2, QP, Reliable         | Transport — RDMA / |
   |         | Connected (RC), RDMA Verb, UET,    | RoCEv2             |
   |         | PDC, ROD                           |                    |
   +---------+------------------------------------+--------------------+
   | 5.2     | RUD, RUDI, UUD, UEC Profile,       | Transport — UET    |
   |         | LLR, Packet Trimming, CBFC,        |                    |
   |         | Entropy Value, GIN, KVCXL          |                    |
   +---------+------------------------------------+--------------------+
   | 6       | PFC, PFC Storm, PFC Deadlock,      | Congestion Control |
   |         | ECN, DCQCN, ECN Marking Ratio,     |                    |
   |         | Incast, Incast Ratio, Packet       |                    |
   |         | Spray, DLB/Flowlet, ECMP, MMR      |                    |
   +---------+------------------------------------+--------------------+
   | 7       | Clos/Fat-Tree, Rail-Optimized,     | Fabric Topology    |
   |         | Bisection Bandwidth,               |                    |
   |         | Oversubscription Ratio, ToR        |                    |
   |         | Switch, Spine/Superspine, NIC,     |                    |
   |         | Buffer Occupancy, Zero-Impact      |                    |
   |         | Failover, Link Utilization         |                    |
   +---------+------------------------------------+--------------------+
   | 8       | JCT, Roofline JCT, JCT Ratio,      | Training-Specific  |
   |         | Gradient Synchronization, Step     |                    |
   |         | Time, Soak Test                    |                    |
   +---------+------------------------------------+--------------------+
   | 9       | TTFT, ITL, TPS, KV Cache,          | Inference-Specific |
   |         | Prefill Phase, Decode Phase,       |                    |
   |         | Disaggregated Serving, xPyD        |                    |
   |         | Ratio, Continuous Batching,        |                    |
   |         | PagedAttention, Prefix Caching,    |                    |
   |         | Normal Dispatch, Low-Latency       |                    |
   |         | Dispatch, SLO, Speculative         |                    |
   |         | Decoding                           |                    |
   +---------+------------------------------------+--------------------+
   | 10      | Primary KPI, Secondary KPI,        | KPI Classification |
   |         | Fabric Health Indicator, Goodput,  |                    |
   |         | Zero Packet Loss                   |                    |
   +---------+------------------------------------+--------------------+
   | 11      | RFC 1242, RFC 2544, RFC 8238,      | Referenced         |
   |         | RFC 8239, RFC 2119/8174            | Standards          |
   +---------+------------------------------------+--------------------+

                   Table 17: Complete Term Taxonomy

Authors' Addresses

   Fernando Calabria
   Cisco
   Email: fcalabri@cisco.com

   Carlos Pignataro
   Blue Fern Consulting
   Email: carlos@bluefern.consulting

   Qin Wu
   Huawei
   Email: bill.wu@huawei.com

   Giuseppe Fioccola
   Huawei
   Email: giuseppe.fioccola@huawei.com