Nvidia Cranks Its BlueField-4 DPU To 800 Gbps

October 29, 2025, by Alex Woodie

At its GTC Washington event Tuesday, Nvidia took the wraps off BlueField-4, the newest generation of its data processing unit (DPU). The new generation moves data from storage to compute at rates of 800 Gbps, double the previous generation, with a 6x boost in computational power. Nvidia says customers need the extra juice to support the new giga-scale AI factories that are coming online.

DPUs are an increasingly important component of the emerging AI infrastructure stack. Instead of asking CPUs to handle the relatively hum-drum task of moving data over the network, a DPU paired with a network interface card (NIC) serves as a dedicated data mover, thereby unburdening the CPU from data movement overhead and allowing the cluster to get more useful work done.
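To make the offload argument concrete, here is a back-of-the-envelope sketch of how many host CPU cores a DPU can free up at line rate. The cycles-per-byte and clock-speed figures are illustrative assumptions for a software network stack, not Nvidia-published numbers:

```python
# Back-of-the-envelope: host CPU cores freed by offloading network data
# movement to a DPU. The per-byte cost and clock speed are assumptions
# chosen for illustration, not measured or vendor-published values.
line_rate_gbps = 800     # BlueField-4 line rate cited in the article
cycles_per_byte = 2.0    # assumed cost of a software TCP/IP stack
cpu_clock_ghz = 3.0      # assumed host CPU clock

bytes_per_sec = line_rate_gbps * 1e9 / 8
cycles_needed = bytes_per_sec * cycles_per_byte
cores_freed = cycles_needed / (cpu_clock_ghz * 1e9)
print(f"~{cores_freed:.0f} host cores' worth of work offloaded "
      f"at {line_rate_gbps} Gbps")
```

Even with conservative assumptions, saturating an 800 Gbps link in software would consume dozens of general-purpose cores, which is exactly the overhead the DPU is meant to absorb.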

The BlueField-4 DPU (Image courtesy Nvidia)

The previous-generation BlueField-3 DPU, which Nvidia launched back in 2021, supported 400 Gbps Ethernet and InfiniBand networks. It offered twice the network bandwidth, 4x the compute power, and almost 5x the memory bandwidth of the BlueField-2, which launched just a year earlier.

With BlueField-4, Nvidia is upping the ante once again. The new DPU, which pairs an Nvidia Grace CPU and its ConnectX-9 NIC, delivers twice the network bandwidth and 6x the compute power of the BlueField-3, Nvidia says.

Nvidia has packed more oomph into the ConnectX-9 NIC, which is designed to handle the fastest 800 Gbps RDMA-capable networks being deployed in AI gigafactories. BlueField-4 sports a 16-lane PCIe Gen 6 host interface, compared to the PCIe Gen 5 interface on the BlueField-3's NIC.

The DPU's core processor has also seen an upgrade with the addition of a 64-core Arm Neoverse V2 processor, which sports 64 billion transistors, compared to the 22 billion transistors of the 16 Arm Cortex-A78 cores used in the BlueField-3. Nvidia has also bolstered the Arm Neoverse V2 processor with 128GB of LPDDR5 memory (compared to 32GB of DDR5 memory in BlueField-3) and 114MB of shared L3 cache (compared to 8MB of L2 cache previously), and paired it with a 512GB on-board SSD (compared to a 128GB SSD in BlueField-3). All that adds up to the capability to move more data more quickly than before.
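Pulling the figures cited above into one place, a quick comparison shows the generational ratios directly (these are the article's numbers; the ratios are simple arithmetic on them):

```python
# Generational spec comparison using the figures cited in the article.
bf3 = {"bandwidth_gbps": 400, "transistors_bn": 22, "memory_gb": 32,
       "cache_mb": 8, "ssd_gb": 128}
bf4 = {"bandwidth_gbps": 800, "transistors_bn": 64, "memory_gb": 128,
       "cache_mb": 114, "ssd_gb": 512}

for spec in bf3:
    ratio = bf4[spec] / bf3[spec]
    print(f"{spec}: {bf3[spec]} -> {bf4[spec]} ({ratio:.1f}x)")
```

The cache jump stands out: on top of the 2x bandwidth and 4x memory, the move from 8MB of L2 to 114MB of shared L3 is better than a 14x increase.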

Like its predecessor, BlueField-4 supports DOCA, Nvidia's Data Center Infrastructure-on-a-Chip Architecture framework, which allows developers to build secure, scalable containerized services around the DPU. DOCA is the vehicle through which Nvidia aims to provide a "multiservice architecture," complete with "native service function chaining" around storage, security, and networking activities running on an array of Nvidia's products, including RTX PRO Servers, HGX, DGX, GB200, and GB300.

A DGX SuperPOD (Source: Nvidia)

When it ships in 2026, BlueField-4 will be available either as a standalone PCIe card or integrated into the Vera Rubin NVL144 rack-scale system. The Vera Rubin NVL144 will deliver 8 exaflops of performance in a single liquid-cooled rack when it ships next year.

Nvidia launched BlueField-4 with support from its top HPC server and storage vendors, including Cisco, DDN, Dell Technologies, HPE, IBM, Lenovo, Supermicro, VAST Data, and WEKA.

Here are some of the statements made by the storage vendors in support of BlueField-4:

“Nvidia BlueField-4 reimagines what’s possible in the AI factory–where data, compute, and intelligence merge into a single, self-optimizing system,” stated Alex Bouzari, CEO and co-founder of DDN. “At DDN, we see this as the next leap forward in data evolution. By combining Nvidia acceleration fabric with our data intelligence platform, we’re helping customers build the foundation for truly autonomous, high-performance AI at global scale.”

“Storage is now a subcomponent of a larger intelligent system that operates more efficiently and securely when powered by Nvidia BlueField technology,” wrote Howard Marks, VAST Data’s Technologist Extraordinary and Plenipotentiary. “This advancement opens new possibilities for data-plane optimization, richer in-band analytics, and real-time telemetry inside the DBox. The additional compute headroom enables smarter enclosure management, predictive maintenance, and dynamic workload placement based on live telemetry.”

“At WEKA, we view BlueField-4 as a pivotal step toward the future of software-defined, service-oriented AI storage,” writes Jim Sherhart, WEKA’s VP of product marketing. “We are collaborating with Nvidia and designing the next-generation of NeuralMesh to take advantage of BlueField-4’s architecture — bringing data, compute, and networking together in a simpler, more efficient way than traditional server-plus-storage designs allow.”

“BlueField-4 offloads critical network and storage operations from the host CPU, delivering line-rate performance for tasks like data encryption, traffic management, and network segmentation,” writes Robert McNeal, senior consultant in product marketing for Dell. “By handling these functions independently, BlueField-4 enables Dell systems to maximize computational throughput for AI applications—ensuring your GPUs and CPUs stay focused on what matters most: accelerating insight and innovation.”

This article first appeared on HPCwire.

About the author: Alex Woodie

Alex Woodie has written about IT as a technology journalist for more than a decade. He brings extensive experience from the IBM midrange marketplace, including topics such as servers, ERP applications, programming, databases, security, high availability, storage, business intelligence, cloud, and mobile enablement. He resides in the San Diego area.
