Convergence of Ultra Ethernet, CXL, and PCIe Toward a Rack-Scale Computer Fabric

Over the next several years, the boundaries between device interconnects and datacenter networks are likely to blur. Technologies such as PCI Express, Compute Express Link, and the emerging Ultra Ethernet architecture are evolving toward similar goals: moving large amounts of data between compute elements with minimal software overhead and with latency characteristics closer to memory access than to traditional networking. As AI infrastructure grows in scale, these technologies are increasingly being positioned not as separate layers but as components of a unified communication fabric that spans an entire rack.

The shift from servers to rack-scale systems

Traditional datacenter architecture treats each server as a self-contained system connected to others through a network. Inside the server, PCIe connects CPUs, GPUs, storage, and accelerators. Communication between servers occurs through Ethernet or other networking fabrics. This separation worked well when most applications were loosely coupled and communicated across servers only occasionally.

Large AI workloads, however, behave differently. Training large models involves thousands of GPUs exchanging gradients and tensors continuously. The boundaries between servers begin to matter less than the total pool of compute, memory, and accelerator resources available across the cluster. As a result, system designers increasingly treat a rack—or even a row of racks—as a single logical compute unit. The interconnect technologies that link components inside a server therefore begin to resemble those used between servers.

Complementary roles of the three fabrics

Although PCI Express, Compute Express Link, and Ultra Ethernet originate in different domains, their capabilities are increasingly complementary. PCIe provides the fundamental physical and protocol layer that connects processors to devices with very high bandwidth and low latency. CXL extends PCIe by introducing memory coherency and shared memory semantics between CPUs and accelerators, allowing devices to access memory regions more flexibly and efficiently. Ultra Ethernet aims to bring similar low-latency, accelerator-optimized communication patterns across an entire datacenter network.

In a future rack-scale architecture, these technologies may form a layered continuum rather than distinct systems. PCIe would continue to serve as the immediate device interconnect within a node. CXL would enable memory sharing and pooling across multiple processors and accelerators within the rack. Ultra Ethernet would extend this communication across nodes while preserving many of the same semantics—direct memory placement, low latency messaging, and hardware-managed communication.

Convergence around memory semantics

One of the strongest forces driving this convergence is the shift from message-oriented communication toward memory-oriented communication. Traditional networking protocols exchange packets containing application messages. Modern accelerator workloads instead exchange large memory buffers. Technologies such as RDMA already allow networks to write directly into remote memory, while CXL introduces mechanisms for devices to access shared memory pools.

Ultra Ethernet builds on this trend by optimizing Ethernet for large-scale accelerator communication. Instead of treating the network purely as a packet transport, it begins to resemble a distributed memory interconnect in which devices communicate through memory-like operations. This makes the conceptual boundary between CXL memory transactions and network transfers increasingly narrow.
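The difference between message-oriented and memory-oriented communication can be sketched in a few lines. The sketch below is purely illustrative: the names `RemoteMemoryWindow` and `rdma_write` are hypothetical stand-ins for what real RDMA verbs or CXL memory transactions do in hardware, not an actual API.

```python
# Illustrative contrast between two-sided messaging and one-sided,
# memory-semantic transfer. All names here are hypothetical.

class RemoteMemoryWindow:
    """Models a registered memory region a peer has exposed for
    direct remote access (as RDMA or CXL-attached memory would)."""
    def __init__(self, size):
        self.buf = bytearray(size)

def rdma_write(window, offset, payload):
    """One-sided put: the sender places bytes directly at a known
    remote offset; no receive call or parsing runs on the far side."""
    window.buf[offset:offset + len(payload)] = payload

def message_send(queue, payload):
    """Two-sided message: the receiver must later dequeue, parse,
    and copy the payload into its destination buffer itself."""
    queue.append(bytes(payload))

# A gradient shard lands at a pre-agreed offset in the peer's buffer,
# with no software on the receiving side in the data path.
peer = RemoteMemoryWindow(1024)
rdma_write(peer, 128, b"\x01\x02\x03\x04")
assert peer.buf[128:132] == b"\x01\x02\x03\x04"
```

The key design point is that the one-sided path removes the receiver's CPU from the transfer entirely, which is what makes the network start to resemble a memory interconnect.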

Latency and determinism requirements

Another factor driving convergence is the demand for extremely low and predictable latency in AI training systems. Collective operations such as all-reduce require thousands of GPUs to exchange data simultaneously and synchronize computation phases. Even small variations in latency can reduce overall training efficiency.
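To make the latency sensitivity of collectives concrete, here is a minimal simulation of a ring all-reduce, the pattern most large training systems use. This is a serial toy model (real implementations such as NCCL run the exchanges concurrently in hardware); each of the n simulated ranks holds a vector of n chunks, and after 2*(n-1) neighbor exchanges every rank holds the elementwise sum. Because every rank must complete every step, the slowest link in the ring gates the whole operation, which is why latency variation is so costly.

```python
# Toy serial simulation of ring all-reduce across n ranks.
def ring_allreduce(vectors):
    """Each rank owns a vector of n chunks; return the state after
    reduce-scatter plus all-gather: every rank holds the full sum."""
    n = len(vectors)
    data = [list(v) for v in vectors]
    # Reduce-scatter: after n-1 steps, rank r holds the complete
    # sum of chunk (r + 1) % n.
    for step in range(n - 1):
        # Snapshot all sends first to model simultaneous exchange.
        sends = [(r, (r - step) % n, data[r][(r - step) % n])
                 for r in range(n)]
        for r, idx, val in sends:
            data[(r + 1) % n][idx] += val
    # All-gather: circulate the completed chunks around the ring.
    for step in range(n - 1):
        sends = [(r, (r + 1 - step) % n, data[r][(r + 1 - step) % n])
                 for r in range(n)]
        for r, idx, val in sends:
            data[(r + 1) % n][idx] = val
    return data

ranks = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
result = ring_allreduce(ranks)
assert all(row == [6, 6, 6] for row in result)  # sum on every rank
```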

PCIe already provides deterministic communication within a server. CXL aims to extend deterministic access to shared memory pools. Ultra Ethernet introduces mechanisms such as credit-based flow control, link-level retries, and hardware message matching to achieve more predictable behavior in large clusters. As these features evolve, the communication characteristics of datacenter networks begin to resemble those of device interconnect fabrics.
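Credit-based flow control is one of the simplest of these mechanisms to illustrate. In the hypothetical sketch below, a sender may transmit only while it holds credits, and the receiver returns one credit each time it drains a buffer slot; the link therefore can never overrun the receiver's buffer, which is what makes its behavior lossless and predictable. The class and method names are illustrative, not taken from any specification.

```python
# Toy credit-based flow control: credits track free receiver buffer
# slots, so the sender can never overrun the receiver.
class CreditedLink:
    def __init__(self, receiver_slots):
        self.credits = receiver_slots   # one credit per free slot
        self.in_flight = []

    def send(self, pkt):
        """Transmit only if a credit (i.e., a buffer slot) is free."""
        if self.credits == 0:
            return False                # back-pressure, not a drop
        self.credits -= 1
        self.in_flight.append(pkt)
        return True

    def drain(self):
        """Receiver consumes a buffered packet and returns a credit."""
        pkt = self.in_flight.pop(0)
        self.credits += 1
        return pkt

link = CreditedLink(receiver_slots=2)
assert link.send("a") and link.send("b")
assert not link.send("c")   # blocked: receiver buffer is full
link.drain()                # one credit comes back
assert link.send("c")
```

Note that back-pressure replaces packet loss: the sender stalls rather than dropping, which trades throughput jitter for determinism.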

Hardware offload and accelerator-centric networking

The role of network interfaces is also changing. In traditional networking, the NIC primarily moves packets between the network and the host stack. In AI systems, NICs increasingly perform sophisticated functions such as RDMA operations, collective communication acceleration, congestion management, and memory placement. These capabilities bring NIC architectures closer to those of accelerator interconnect controllers.

As a result, the boundaries between PCIe controllers, CXL interfaces, and high-performance NICs are beginning to blur. Future silicon designs may integrate features traditionally associated with all three technologies into unified I/O subsystems capable of handling device communication, memory sharing, and cluster networking through a common architecture.

Resource disaggregation and pooling

Datacenter operators are also pushing toward resource disaggregation, where compute, memory, storage, and accelerator resources can be dynamically composed into logical systems. CXL is expected to enable large memory pools that multiple processors can access. Ultra Ethernet provides the bandwidth and scalability required to connect large numbers of accelerators across nodes. PCIe remains the fundamental attachment point for these devices within individual systems.

Together, these technologies enable a model in which resources within a rack are no longer fixed to a single server but can be allocated dynamically across workloads. A rack may contain pools of CPUs, GPUs, and memory modules that behave collectively as a single composable computer.
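The composability model described above can be sketched as a simple rack-wide allocator. This is a hypothetical illustration, not a real orchestration API: resources sit in shared pools, and a "logical node" is just a slice carved out of those pools for the lifetime of a workload.

```python
# Hypothetical composable-rack allocator; all names are illustrative.
class RackPool:
    """Rack-wide pools of GPUs and memory, allocated per workload
    rather than fixed to individual servers."""
    def __init__(self, gpus, memory_gb):
        self.free_gpus = gpus
        self.free_mem = memory_gb

    def compose(self, gpus, memory_gb):
        """Carve a logical node out of the shared pools, or refuse
        if the rack cannot satisfy the request."""
        if gpus > self.free_gpus or memory_gb > self.free_mem:
            return None
        self.free_gpus -= gpus
        self.free_mem -= memory_gb
        return {"gpus": gpus, "memory_gb": memory_gb}

    def release(self, node):
        """Return a logical node's resources to the shared pools."""
        self.free_gpus += node["gpus"]
        self.free_mem += node["memory_gb"]

rack = RackPool(gpus=16, memory_gb=2048)
job = rack.compose(gpus=8, memory_gb=512)
assert job is not None and rack.free_gpus == 8
rack.release(job)
assert rack.free_gpus == 16
```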

Toward the rack as the system

In this emerging architecture, the rack effectively becomes the new unit of computing. Individual servers function more like modular building blocks than fully independent machines. Communication between GPUs in different servers begins to resemble communication between devices on the same motherboard, only across a larger physical distance.

The convergence of PCIe, CXL, and Ultra Ethernet reflects this architectural shift. Each technology contributes elements required to make a rack-scale computer possible: high-speed device connectivity, shared memory semantics, and scalable inter-node communication. As these capabilities continue to align, the distinction between internal system interconnects and datacenter networks may gradually disappear, replaced by a unified fabric spanning the entire rack.

 
