Meet 2025 AIwire Person to Watch Ian Buck

August 29, 2025 by Jaime Hampton

GPUs began as graphics accelerators, built to rasterize triangles and shade pixels. Then they trained deep nets. Now they run the large models behind climate research, protein design, materials discovery, and automated lab work. What happens when these engines start to plan, reason, and act in the real world? Few people have had a closer view of that arc than Ian Buck, Nvidia’s vice president of hyperscale and HPC and the architect of CUDA.

With CUDA, Nvidia turned the GPU into a software platform, enabling parallel programming that supports most deep learning and HPC workloads. Under Buck’s leadership, Nvidia’s focus has widened from chips to full systems that treat the datacenter as an AI factory. The Blackwell platform, rack-scale NVLink designs, liquid cooling, and accelerated networking are all parts of the same push: make AI training and inference faster, cheaper, and easier to operate at scale.

The impact reaches into science, healthcare, and industry. Buck points to agentic and physical AI for healthcare, logistics, and industrial workflows, and to science domains where speed and fidelity matter: digital biology pipelines, Earth-2 climate simulation, and robotics tools for autonomous experimentation. We spoke with Buck about the milestones that changed his view of what GPUs can do, the shift to system-level design, and the hard problems that come next. Here is what he had to say:

First, congratulations on your selection as a 2025 AIwire Person to Watch. As the architect of CUDA and now head of NVIDIA’s accelerated computing business unit, you have seen GPUs move from graphics to deep learning to generative AI. Which milestones most reshaped your own view of what GPUs can do for AI, and where do you expect the next big shift to come from?

NVIDIA anticipates the next major shift to come from physical and agentic AI—moving beyond generative AI to a world where AI not only understands and creates but also reasons, plans, and acts in the physical world.

We’re developing agentic AI systems that can operate in real time by integrating data, tools, and feedback to iteratively refine their actions. These systems will soon power autonomous workflows in healthcare, logistics, and industrial automation, with use cases like clinical decision support and dynamic fleet routing.

Physical AI embeds this intelligence into robots, autonomous vehicles, and smart machines that interact with real-world physics through three computers: trained on NVIDIA’s full stack, simulated in NVIDIA Omniverse with digital twins, and deployed on NVIDIA edge computing.

NVIDIA's strategy has shifted from focusing solely on GPUs to delivering fully integrated systems like the Blackwell platform. How do you see this evolution impacting the future of AI infrastructure?

The industry now realizes that the data center is the new unit of computing.

Modern AI has scaled beyond a single GPU or single server. It takes tens of thousands of GPUs to process trillions of tokens to pre-train a major foundation model. To make that many GPUs work in concert, every part of the data center needs to be optimized for AI.

Today we treat the entire data center as a unified computing platform—optimizing across the CPU, GPU, DPU, interconnects, application software, system software, and networking.

What was bespoke to the supercomputing market is becoming mainstream with liquid cooling, rack-scale NVLink designs and accelerated intelligent networking. Every part of the data center is the new canvas of innovation to move the ball forward, and it’s getting faster and faster. We are now executing to an annual silicon rhythm with new chips, rack architectures, and standards to foster the ecosystem on our platforms.

With data centers transforming into AI factories, what are the key challenges and opportunities in scaling AI workloads efficiently and sustainably?

There is an end-to-end opportunity that includes everything from model architectures, training and RL algorithms, numerical methods and precisions, kernel libraries, and software-hardware codesign to node, rack, row, and data center design. Every part of the AI factory is on the table for reinvention, which is why AI is outpacing all the legacy silicon scaling rules of the past.

The best way to improve efficiency and sustainability in our AI factories is through performance gains. If we can double or triple a gigawatt data center’s compute capability through new technology innovations, workloads become cheaper and more efficient.

Improving the efficiency and performance of existing AI workloads is a goal with every platform generation; it also makes possible the next wave of AI innovations. With 10x more compute efficiency, we make today’s workloads 10x cheaper and more efficient while at the same time providing 10x the compute capability to the innovators who are building the next generation of AI.

What are some of the hardest technical or infrastructure problems you’re focused on solving today as AI continues to scale?

NVIDIA is addressing a wide range of challenges, including:

Inference is Wickedly Hard

Every modern AI model is a reasoning engine—and reasoning is fundamentally an inference problem. The more we can scale GPU performance across racks and datacenters to sustain that reasoning in real time, the more valuable and capable these models become. But combining GPUs at scale isn’t trivial—it requires high-bandwidth interconnects, low-latency synchronization, and system-level software that treats the entire cluster like a single coherent accelerator.

Data Centers Need Rack-scale Systems

With Blackwell’s NVL72, we’ve engineered a rack-scale GPU system that delivers unprecedented compute density and efficiency, effectively making the rack the new unit of performance. Looking ahead, technologies like Kyber and next-generation NVLink will extend this paradigm even further—scaling from racks to entire data centers as a single, unified accelerator for AI.

Moving Data As Fast As AI Learns

Training trillion-parameter models across hundreds of thousands of GPUs pushes the limits of system architecture—compute alone isn’t enough; networking becomes the bottleneck. That’s why we engineered Spectrum-X, Quantum, and NVLink to deliver ultra-low latency, high-bandwidth interconnects purpose-built for multi-node, multi-rack AI systems. And with co-packaged silicon photonics—like Quantum-X Photonics and Spectrum-X Photonics—we’re taking the next step: closing the distance between compute and fabric to move data at the speed AI demands.

What do you see as the top three ways AI will shape the future of scientific research?

Accelerating discovery in digital biology and healthcare:

We recognized the potential of GPUs for digital biology and healthcare two decades ago. In the early 2000s, I worked with Pat Hanrahan and others at Stanford on one of the first attempts to accelerate molecular dynamics simulations in GROMACS using GPUs. Today, NVIDIA’s full-stack AI platform accelerates every stage of drug discovery, from target identification to molecule generation and optimization.

Transforming climate science:

NVIDIA’s heritage is in developing real-time computer graphics to simulate virtual environments. We invented many physics and lighting techniques to make the experience even more immersive. That decades-long body of work in real-time graphics helped prepare us for the ultimate simulation: Earth-2.

Earth-2 transforms climate science by enabling ultra-high-resolution, AI-powered digital twins of the planet that simulate and visualize weather and climate phenomena at a global scale with unprecedented speed, accuracy, and efficiency. It’s capable of delivering forecasts and actionable insights thousands of times faster and more energy-efficiently than traditional methods.

Enabling “application science” through robotics:

NVIDIA innovations are set to accelerate and transform scientific research by providing advanced simulation, generative AI, and robot learning frameworks that enable autonomous experimentation, rapid hypothesis testing, and large-scale data generation. Tools like NVIDIA Cosmos, Isaac Lab, Isaac Sim and the Isaac GR00T N1 foundation model allow researchers to train robots in high-fidelity simulated environments, generate massive synthetic datasets and bridge the gap between simulation and real-world deployment. This makes it possible to automate complex scientific workflows, improve reproducibility, and unlock discoveries in fields ranging from materials science to medicine.

You can read the rest of our 2025 AIwire People to Watch interviews here.
