NVIDIA has introduced Rubin CPX, a new class of GPU designed for massive-context AI workloads such as million-token coding and long-form video applications. The launch took place at the AI Infra Summit in Santa Clara, where the company also shared new benchmark results from its Blackwell Ultra architecture.
The system is scheduled for availability at the end of 2026. According to NVIDIA, every $100 million invested in Rubin CPX infrastructure could generate up to $5 billion in token revenue.
“Just as RTX revolutionised graphics and physical AI, Rubin CPX is the first CUDA GPU purpose-built for massive-context AI, where models reason across millions of tokens of knowledge at once,” NVIDIA chief Jensen Huang said.
What Rubin CPX Offers
Rubin CPX processes attention mechanisms three times faster than NVIDIA's earlier GB300 NVL72 systems, enabling longer context sequences without slowing output. The processor integrates into the Vera Rubin NVL144 CPX system, which delivers eight exaflops of AI performance, 100 terabytes of memory and 1.7 petabytes per second of memory bandwidth.

AI firms Cursor, Magic and Runway are already working with the platform to expand advanced coding and generative video tools. Customers can also integrate Rubin CPX into existing NVIDIA data centre infrastructure.
Cursor plans to use it to improve developer productivity, while Runway aims to support advanced generative video workflows. “This means creators, from independent artists to major studios, can gain unprecedented speed, realism and control in their work,” Cristóbal Valenzuela, CEO of Runway, said.
Magic is applying the GPU to software agents capable of reasoning across 100-million-token contexts without additional fine-tuning.
Expanding AI with New Blueprint
The GPU addresses use cases such as analysing codebases with over 100,000 lines or processing more than an hour of high-definition video. At the same event, NVIDIA introduced its AI Factory reference designs, a framework for building giga-scale AI data centres.

The initiative integrates compute, cooling, power and simulation into a unified system, moving beyond traditional data centre design. Partners such as Siemens Energy, Schneider Electric, GE Vernova and Jacobs are collaborating on the project. Ian Buck, NVIDIA’s vice president of the data centre business unit, said the goal was to “optimise every watt of energy so that it contributes directly to intelligence generation”.
What Do the Benchmarks Show?
Alongside the new hardware, NVIDIA published MLPerf Inference v5.1 results showing record performance for its Blackwell Ultra GPUs. The platform set new per-GPU performance records on models such as DeepSeek-R1, Llama 3.1 405B and Whisper.

In particular, the DeepSeek-R1 model achieved more than 5,800 tokens per second per GPU in offline testing, a 4.7x gain over prior Hopper-based systems.
According to NVIDIA, the gains stem from techniques such as NVFP4 quantisation, FP8 key-value caching and disaggregated serving. The company said these improvements translate into higher throughput for AI factories and a lower cost per token processed.
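For readers unfamiliar with key-value cache quantisation, the sketch below illustrates the general idea in plain Python/NumPy: storing the attention cache at lower precision shrinks memory traffic per cached token, which is where much of the throughput and cost-per-token benefit comes from. This is an illustrative analogue only; it uses a simple symmetric 8-bit integer scheme with per-head scales as a stand-in for the hardware FP8 formats NVIDIA refers to, and none of the names or numbers reflect NVIDIA's actual implementation.

```python
# Illustrative sketch of key-value cache quantisation (not NVIDIA's code).
# Production FP8 KV caching uses hardware FP8 formats (e.g. E4M3); here an
# int8 analogue shows the core idea: store the cache at 1 byte per value
# instead of 4, and dequantise on read for use in attention.
import numpy as np

def quantize_kv(cache_fp32: np.ndarray):
    """Quantise a [heads, seq_len, head_dim] cache to int8 with per-head scales."""
    # Per-head max absolute value sets the scale so values map onto [-127, 127].
    scales = np.abs(cache_fp32).max(axis=(1, 2), keepdims=True) / 127.0
    scales = np.maximum(scales, 1e-8)  # guard against division by zero
    q = np.clip(np.round(cache_fp32 / scales), -127, 127).astype(np.int8)
    return q, scales

def dequantize_kv(q: np.ndarray, scales: np.ndarray) -> np.ndarray:
    """Recover an approximate FP32 cache from the quantised values."""
    return q.astype(np.float32) * scales

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    kv = rng.normal(size=(8, 1024, 128)).astype(np.float32)  # toy KV cache
    q, s = quantize_kv(kv)
    recon = dequantize_kv(q, s)
    print("bytes fp32:", kv.nbytes, "bytes int8:", q.nbytes)  # roughly 4x smaller
    print("max abs error:", float(np.abs(kv - recon).max()))
```

The point of the sketch is the trade-off: a quarter of the bytes moved per cached token at the price of a small reconstruction error, which is the same trade-off FP8 key-value caching makes on supported GPUs.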