Ultra Ethernet and Its Architectural Parallels with PCIe
Ultra Ethernet represents an effort to evolve Ethernet into a fabric capable of supporting the communication patterns of large AI and high-performance computing clusters. While it remains fundamentally an Ethernet-based networking technology, several of its architectural choices resemble mechanisms long used in device interconnects such as PCI Express. The resemblance is not accidental. Modern AI clusters increasingly behave less like conventional distributed systems and more like large, tightly coupled compute machines in which accelerators exchange memory buffers continuously. In this context, the network begins to resemble a distributed extension of the intra-server interconnect fabric, and many of the mechanisms emerging in Ultra Ethernet reflect ideas already proven within PCIe systems.
Memory-oriented communication

Ultra Ethernet's transport is built around remote direct memory access semantics: data moves directly between application buffers rather than through intermediate copies, much as a PCIe device reads and writes host memory through DMA. For AI workloads this is critical. GPU clusters exchange tensors, gradients, and model parameters as large memory buffers rather than conventional application messages. Direct data placement allows the network interface to write received data straight into GPU memory, closely mirroring the DMA behavior of PCIe devices. In practice, a GPU typically sends data across PCIe to the NIC, which transmits it across the Ultra Ethernet fabric to a remote NIC; that NIC then places the data into the destination GPU's memory, again through PCIe. Architecturally, the network begins to look like a remote extension of the same memory-centric communication model that exists within a server.
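As a toy illustration of direct data placement, the sketch below models a NIC writing an incoming payload straight into a registered region of device memory, DMA-style, with no staging copy for the CPU to move later. All class and function names here are hypothetical, not part of any Ultra Ethernet API.

```python
# Toy model of direct data placement (illustrative only, not a real UET API).
# The "NIC" writes a received payload directly into registered device memory,
# the way a PCIe DMA engine writes into a host buffer.

class DeviceMemory:
    """Stand-in for GPU memory: a flat byte array."""
    def __init__(self, size: int):
        self.data = bytearray(size)

class RegisteredBuffer:
    """A region of device memory pinned and registered with the NIC."""
    def __init__(self, mem: DeviceMemory, offset: int, length: int):
        self.mem, self.offset, self.length = mem, offset, length

def nic_direct_place(buf: RegisteredBuffer, payload: bytes, at: int = 0) -> None:
    """Place a received payload directly into the registered region."""
    if at + len(payload) > buf.length:
        raise ValueError("payload overruns registered buffer")
    start = buf.offset + at
    buf.mem.data[start:start + len(payload)] = payload

gpu = DeviceMemory(1024)
tensor_buf = RegisteredBuffer(gpu, offset=256, length=128)
nic_direct_place(tensor_buf, b"gradient-shard-0")
```

The point of the model is what is absent: no intermediate socket buffer and no CPU-side copy loop sit between the wire and the destination memory.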
Transaction-style communication

Ultra Ethernet's transport favors short-lived, per-operation state over persistent connections: endpoints can begin exchanging data without first negotiating and maintaining a long-lived session. This design becomes especially important at the scale of modern AI clusters, where tens of thousands of accelerators may communicate simultaneously. Maintaining traditional connection-oriented state for every peer would impose large memory and processing overheads. By moving toward a transaction-style model, Ultra Ethernet allows communication to begin quickly, in a manner reminiscent of PCIe request–completion exchanges.
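A minimal sketch of the transaction-style idea, with hypothetical names rather than the actual wire format: state exists only per outstanding request, keyed by a transaction ID, and is freed on completion, echoing PCIe's request/completion exchanges rather than long-lived per-peer connections.

```python
# Transaction-style communication sketch (hypothetical, not the UET format).
# No per-peer connection objects: only outstanding requests hold state.
import itertools

class Initiator:
    def __init__(self):
        self._ids = itertools.count()
        self.outstanding = {}          # txn_id -> description of the request

    def issue(self, target: str, op: str) -> int:
        """Start an operation immediately; no connection setup handshake."""
        txn = next(self._ids)
        self.outstanding[txn] = (target, op)
        return txn

    def complete(self, txn: int):
        """Completion retires the transaction; no peer state remains."""
        return self.outstanding.pop(txn)

init = Initiator()
t = init.issue("gpu-17", "write 1 MiB")
init.complete(t)
```

With this shape, memory cost scales with requests in flight rather than with the number of peers, which is the property that matters at tens of thousands of endpoints.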
Flow control and link reliability

PCIe governs each link with credit-based flow control: a transmitter may send a packet only when the receiver has advertised enough buffer credits to hold it. Ultra Ethernet introduces similar credit-based flow control concepts intended to provide stronger determinism than conventional Ethernet congestion mechanisms. Traditional Ethernet networks rely on reactive approaches such as pause frames or priority flow control, which act only after buffers begin to fill. Ultra Ethernet's credit-based approach manages transmission proactively, closer in spirit to PCIe's buffer-aware design.
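The credit mechanism can be sketched in a few lines (an illustrative model, not the actual Ultra Ethernet scheme): the sender consumes one credit per frame and stalls at zero, and the receiver returns a credit as it drains its buffer, so the sender can never overrun it.

```python
# Minimal credit-based flow control sketch (illustrative only).
class CreditedLink:
    def __init__(self, receiver_slots: int):
        self.credits = receiver_slots   # advertised receiver buffer space
        self.rx_queue = []

    def try_send(self, frame) -> bool:
        if self.credits == 0:
            return False                # proactive stall: no buffer overrun
        self.credits -= 1
        self.rx_queue.append(frame)
        return True

    def receiver_drain(self) -> None:
        self.rx_queue.pop(0)
        self.credits += 1               # credit returned to the sender

link = CreditedLink(receiver_slots=2)
assert link.try_send("f0") and link.try_send("f1")
assert not link.try_send("f2")          # out of credits: sender must wait
link.receiver_drain()
assert link.try_send("f2")              # returned credit unblocks the sender
```

Contrast this with pause frames, which only react once the receive buffer is already under pressure; here the sender cannot transmit into space that was never advertised.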
Reliability mechanisms also follow a comparable pattern. PCIe implements an acknowledgment and negative-acknowledgment (ACK/NAK) protocol at the data link layer so that only corrupted or lost packets need to be retransmitted. Ultra Ethernet incorporates link-level retry capabilities with a similar goal: lost or corrupted frames can be retransmitted locally between neighboring devices rather than requiring end-to-end recovery higher in the protocol stack. This reduces latency variability, which is particularly important for distributed AI workloads where thousands of GPUs must exchange data in synchronized phases.
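The replay-buffer pattern behind link-level retry can be modeled as follows. This is a loose sketch with hypothetical names: the transmitter keeps a copy of each sent frame until it is acknowledged, and a NAK triggers a local retransmission from that buffer (assumed here to arrive intact) instead of an end-to-end resend.

```python
# Toy link-level retry (illustrative; real ACK/NAK protocols batch and
# sequence their acknowledgments rather than responding per frame).
from collections import OrderedDict

class RetryLink:
    def __init__(self):
        self.replay = OrderedDict()   # seq -> frame copy held until ACKed
        self.seq = 0
        self.retries = 0
        self.delivered = []

    def send(self, frame, corrupt: bool = False) -> None:
        seq, self.seq = self.seq, self.seq + 1
        self.replay[seq] = frame      # keep a copy for possible replay
        if corrupt:
            self._on_nak(seq)         # receiver saw a bad CRC on this frame
        else:
            self._on_ack(seq)

    def _on_ack(self, seq: int) -> None:
        self.delivered.append(self.replay.pop(seq))

    def _on_nak(self, seq: int) -> None:
        self.retries += 1             # replay from the local buffer; assume
        self._on_ack(seq)             # the retransmission arrives intact

link = RetryLink()
link.send("frame-a")
link.send("frame-b", corrupt=True)    # recovered hop-locally, not end to end
```

The recovery happens between two neighboring devices; nothing above the link layer observes the loss, which is what keeps tail latency bounded.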
Message matching and hardware acceleration

Ultra Ethernet brings message tag matching, long familiar from MPI-style HPC communication, into the network interface itself: each message carries a tag identifying the buffer it is destined for. The network interface hardware can match incoming messages to the correct destination buffers using these tags, sometimes even supporting wildcard tag matching used in certain collective communication patterns. Performing this work in hardware rather than software reduces CPU overhead and shortens the latency of message handling.
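A software sketch of what the matching hardware does, with hypothetical names and tag values: posted receive buffers each carry an expected tag, an incoming message lands in the first buffer whose tag matches, and a wildcard entry accepts any tag.

```python
# Tag-matching sketch (illustrative; real NICs do this in dedicated hardware).
WILDCARD = -1

class TagMatcher:
    def __init__(self):
        self.posted = []                # (tag, buffer) entries, in post order

    def post_recv(self, tag: int, buffer: list) -> None:
        self.posted.append((tag, buffer))

    def deliver(self, tag: int, payload) -> bool:
        for i, (want, buffer) in enumerate(self.posted):
            if want == tag or want == WILDCARD:
                buffer.append(payload)
                del self.posted[i]      # each posted buffer matches once
                return True
        return False                    # unexpected message: nothing posted

nic = TagMatcher()
grad_buf, any_buf = [], []
nic.post_recv(tag=7, buffer=grad_buf)
nic.post_recv(tag=WILDCARD, buffer=any_buf)
nic.deliver(7, "gradients")             # exact tag match
nic.deliver(42, "control-msg")          # caught by the wildcard entry
```

Done in hardware, this walk over the posted list happens without waking the CPU, which is exactly the overhead the section describes removing.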
Multipath fabrics and traffic distribution

PCIe scales bandwidth by striping traffic across the multiple lanes of a single link. Ultra Ethernet pursues a similar goal at fabric scale: packets belonging to a single message can be sprayed across many equal-cost paths through the network, with the receiver tolerating out-of-order arrival and reassembling messages by sequence number. This keeps all available paths busy and avoids the persistent hotspots that conventional per-flow ECMP hashing can create.
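Multipath distribution of a single message can be sketched as follows (an illustrative model, not the Ultra Ethernet wire protocol): packets are sprayed round-robin across several fabric paths, may arrive in arbitrary order, and are reassembled by sequence number at the receiver.

```python
# Packet-spraying sketch (illustrative only).
def spray(message: bytes, chunk: int, n_paths: int):
    """Split a message into sequenced packets and spray them across paths."""
    packets = [(i, message[i * chunk:(i + 1) * chunk])
               for i in range((len(message) + chunk - 1) // chunk)]
    paths = [[] for _ in range(n_paths)]
    for seq, data in packets:
        paths[seq % n_paths].append((seq, data))   # round-robin spraying
    return paths

def reassemble(paths) -> bytes:
    """Collect packets in whatever order paths deliver them, then reorder."""
    arrived = [pkt for path in paths for pkt in path]
    arrived.sort(key=lambda p: p[0])               # order by sequence number
    return b"".join(data for _, data in arrived)

paths = spray(b"all-reduce-shard", chunk=4, n_paths=3)
message = reassemble(paths)
```

Because ordering is restored per message at the endpoint, no single path needs to carry a whole flow, which is what defeats the hot-spotting of per-flow hashing.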
Process and job context identifiers

Within a PCIe hierarchy, every transaction carries a requester identifier that ties it to a specific bus, device, and function. Ultra Ethernet adopts comparable contextual identifiers that allow packets to be associated with particular jobs or processes within a shared AI cluster. In large multi-tenant environments where several workloads may run simultaneously, these identifiers allow network hardware to manage communication flows more efficiently and isolate traffic belonging to different applications.
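The isolation property can be sketched in a few lines (hypothetical field names, not the actual header layout): each packet carries a job identifier, loosely analogous to a PCIe requester ID, and an endpoint accepts only traffic belonging to its own job.

```python
# Job-scoped traffic isolation sketch (illustrative only).
class Endpoint:
    def __init__(self, job_id: int):
        self.job_id = job_id
        self.inbox = []

    def accept(self, packet: dict) -> bool:
        """Admit a packet only if its job identifier matches this endpoint."""
        if packet["job_id"] != self.job_id:
            return False                # cross-tenant traffic is rejected
        self.inbox.append(packet["payload"])
        return True

ep = Endpoint(job_id=1001)
ep.accept({"job_id": 1001, "payload": "weights"})       # same job: admitted
ep.accept({"job_id": 2002, "payload": "other-tenant"})  # rejected
```

Carrying the identifier in every packet lets switches and NICs enforce this boundary in the data path, without consulting host software per packet.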
Toward a distributed PCIe-like fabric

Rather than treating networking purely as packet exchange between independent machines, Ultra Ethernet moves toward a model in which accelerators communicate through memory-like operations across a datacenter fabric. In this sense, it can be viewed as an effort to make Ethernet behave more like PCIe, but at the scale of racks and datacenters rather than individual systems. The network becomes less a traditional packet network and more an extension of the accelerator interconnect fabric, enabling GPUs and other devices to exchange memory buffers efficiently across large AI clusters.

