What Keeps NVIDIA Blackwell Cool?

As NVIDIA gears up to ship its high-end Blackwell GPUs, major cloud service providers (CSPs) are expected to adopt liquid cooling to meet the demands of these powerful AI servers.

At Tech World 2024, Lenovo launched the ThinkSystem N1380 Neptune, a water-cooling system built to support NVIDIA’s Blackwell platform and AI applications.

NVIDIA chief Jensen Huang said that Lenovo’s years of hard work on building supercomputers had finally paid off. Marvelling at the ThinkSystem N1380 Neptune, Huang said, “Tell me this isn’t beautiful. To an engineer, this is sexy!”

The new Neptune chassis enables organisations to operate 100 kW+ server racks without the need for specialised air conditioning, achieving 100% heat removal.

“We are reshaping water cooling and pushing the industry boundaries further, and we’re doing it with innovation and engineering, with the highest accelerated computing capacity available in a small form factor chassis that organisations of any size can use,” said Vlad Rozanovich, senior vice president, Infrastructure Solutions Group, Lenovo.

He explained that Lenovo’s water cooling system is different from those currently available in the market. “For instance, instead of chilling the water before it goes into one of these racks, you can actually use standard ambient water temperatures,” he said.

“We’ve seen people use the warm water coming out of the server to create heated environments for cold-temperature office buildings, schools, or university sites. So you’re actually seeing the reconciliation or reuse of that water,” he added.

Regarding the N1380, he said the infrastructure fits within a space smaller than one square metre, which aligns with standard 19-inch rack sizes commonly used in data centres. “We’ve reimagined the engineering of what rack integration looks like for vertical computing nodes optimised for spaces where total heat removal is most critical to these deployments,” Rozanovich added.

“We’re pushing the boundaries of compact general-purpose computing. By using Neptune, you can direct 30% more of your cooling energy to compute. So, now you have a situation where electricity is being used on what you need it most for, which is compute,” he added.

“The way Lenovo designed these systems is very interesting. They feature a custom and efficient liquid cooling system for the CPU and memory, as newer High Bandwidth Memory (HBM) runs hotter. Liquid cooling is the only way forward for AI servers with power ratings over 1000 watts,” said a user on X.

In India, Tata Communications is deploying NVIDIA Hopper GPUs extensively to strengthen its public cloud infrastructure and support various AI applications. Next year, it plans to add NVIDIA Blackwell GPUs to its offerings.

Lenovo is Not Alone

NVIDIA is also working with other providers to procure liquid cooling solutions. CoolIT Systems, an advanced liquid-cooling technology company for AI and high-performance computing, recently announced its support for the NVIDIA Blackwell AI platform rollout.

The company recently introduced a new line of liquid-cooling products and significantly expanded its manufacturing capacity to meet the growing demand.

“Working together with NVIDIA, hyperscalers and leading server manufacturers, CoolIT introduced several new liquid-cooling products engineered to effectively and efficiently manage heat in NVIDIA Blackwell systems,” said Patrick McGinn, CoolIT’s COO.

Super Micro Computer (Supermicro), too, is speeding up the transition to liquid-cooled data centres. The company is prepared for production with its newly launched X14 and H14 liquid-cooled systems, along with 10U air-cooled systems that support the NVIDIA HGX B200 8-GPU platform.

Supermicro’s liquid-cooling solution includes cooling distribution units (CDUs), cold plates, cooling distribution manifolds (CDMs), cooling towers, and management software. It can support up to 96 B200 GPUs per rack and has recently been deployed for over 100,000 GPUs.

Much like Lenovo, Supermicro’s technology enables warm water cooling at temperatures up to 113°F (45°C), improving cooling efficiency and allowing the heat to be repurposed for district heating or greenhouse warming.

Earlier this year, NVIDIA partnered with Schneider Electric to optimise data centre infrastructure for AI applications, prioritising liquid-cooling solutions for its high-performance AI clusters.

In August, NVIDIA joined hands with Singapore-based Sustainable Metal Cloud, which runs ‘sustainable AI factories’ through its HyperCubes in Singapore and Australia. These HyperCubes are equipped with NVIDIA processors submerged in polyalphaolefin, a synthetic oil that offers superior heat dissipation compared to air.

The platform reportedly reduces energy use by as much as 50% compared to conventional air-cooling methods in data centres.

Several other companies, including Amphenol, Asia Vital Components, Cooler Master, Colder Products Company, Danfoss, Delta Electronics and LITEON, also provide cooling solutions to NVIDIA.

Meanwhile, Elon Musk is using fans to cool data centres. The Tesla chief recently shared a picture of a giant GPU cooler in Texas with the caption, “We are nothing without our fans.” He revealed that the facility will use around 130 MW of power and cooling this year, with plans to increase that to over 500 MW in the next 18 months.

The post What Keeps NVIDIA Blackwell Cool? appeared first on AIM.
