GPUs Have an Energy Problem

While NVIDIA's massive growth is fueled by burgeoning demand for GPUs from AI companies, the costs incurred by the purchasing companies do not end with the GPU itself.

The amount of money spent on energy to run these GPUs in data centres is enormous. A recent study showed that data centres consume about 1,000 kWh per square metre, roughly 10x the power consumption of a typical American home. BLOOM, an LLM, used 914 kWh over an 18-day period while running on 16 NVIDIA A100 40GB GPUs, serving an average of 558 requests per hour.
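As a back-of-envelope check, the figures quoted above can be turned into an energy-per-request estimate. The sketch below is a minimal Python calculation using only the BLOOM numbers cited here; the electricity price is an assumed illustrative value, not from the study.

```python
# Rough energy-per-request estimate from the BLOOM figures quoted above:
# 914 kWh over 18 days at an average of 558 requests/hour on 16 A100 GPUs.

total_energy_kwh = 914            # energy measured over the 18-day deployment
days = 18
requests_per_hour = 558

total_requests = requests_per_hour * 24 * days
energy_per_request_wh = total_energy_kwh * 1000 / total_requests

price_per_kwh_usd = 0.12          # assumed illustrative electricity price
cost_per_request_usd = energy_per_request_wh / 1000 * price_per_kwh_usd

print(f"Total requests served: {total_requests:,}")
print(f"Energy per request:    {energy_per_request_wh:.2f} Wh")
print(f"Electricity cost/req:  ${cost_per_request_usd:.6f}")
```

Run as-is, this works out to roughly 4 Wh of electricity per request, a small per-query number that adds up quickly at scale.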

Climbing Costs

As per an article by Sequoia, for every $1 spent on a GPU, approximately another dollar is spent on energy to run it in a data centre. Factoring in the margin the companies buying them will need to make, the costs incurred are almost two-fold.

Training AI models within data centres can require up to three times the energy of typical cloud workloads, straining existing infrastructure. For instance, AI servers with GPUs may draw up to 2 kW of power, whereas a standard cloud server requires only 300-500 W.
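To put that power gap in perspective, here is a short sketch of what it implies over a year of operation. The 2 kW and 300-500 W figures come from the paragraph above; the utilisation and electricity price are illustrative assumptions.

```python
# Annual electricity cost comparison: AI GPU server (~2 kW) vs standard
# cloud server (300-500 W), per the figures cited above.

HOURS_PER_YEAR = 24 * 365
PRICE_PER_KWH_USD = 0.12          # assumed illustrative commercial rate

def annual_cost(power_watts: float, utilisation: float = 0.8) -> float:
    """Yearly electricity cost for a server drawing `power_watts` at the given utilisation."""
    kwh = power_watts / 1000 * HOURS_PER_YEAR * utilisation
    return kwh * PRICE_PER_KWH_USD

print(f"AI GPU server (2 kW):     ${annual_cost(2000):,.0f}/year")
print(f"Cloud server (300-500 W): ${annual_cost(300):,.0f}-{annual_cost(500):,.0f}/year")
```

Under these assumptions, a single AI server costs several times more to power each year than a standard cloud server, before any cooling overhead is counted.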

Last year, Northern Virginia's Data Center Alley nearly suffered a power outage owing to high consumption. The current generation of data centres is also believed to be ill-equipped to handle surge demand from AI-related activities. Furthermore, power demand from data centres is expected to surpass 35 GW by 2030, according to McKinsey.


As per research, spending on AI data centre server infrastructure, along with operating costs, is projected to cross $76 billion by 2028. That is more than twice the estimated annual operating cost of AWS, which holds about one-third of the cloud infrastructure services market.

Big tech companies are also shelling out heavily to run these models. Earlier this year, The Information estimated that OpenAI spends close to $700,000 daily to run its models. Given the massive amount of computing power required, the infrastructure cost of running AI models is far from trivial.
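Annualising that daily estimate gives a sense of the scale; the snippet below uses only the $700,000-per-day figure quoted above.

```python
# Annualising The Information's estimate of ~$700,000/day to run OpenAI's models.
daily_cost_usd = 700_000
annual_cost_usd = daily_cost_usd * 365
print(f"Estimated annual running cost: ${annual_cost_usd / 1e6:.0f} million")
```

That works out to roughly a quarter of a billion dollars a year on compute alone.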

Considering the speed at which companies are racing through the generative AI race, a Gartner study projects that the exorbitant costs will exceed the value generated within the next two years, leading about 50% of large enterprises to pull the plug on their large-scale AI model developments by 2028.

A user on X who reviews CPU coolers spoke about how he would choose an energy-efficient GPU not to avoid high electricity bills, but because of the heat it generates.

I'm probably going to end up getting the RTX 4070. I would consider a 6950XT or a used RTX 3090 but I really need an *energy efficient* GPU.
Not because I care about the electricity cost, it's cheap where I live, but because my AC is already struggling to keep it 23C inside. pic.twitter.com/ZTOeVOIFYV

— Albert Thomas – Cooling Reviewer (@ultrawide219) July 17, 2023

Workarounds For Energy Efficiency

Specialised data centres aimed at running generative AI workloads are springing up. Companies are looking at facilities in suburban locations away from big markets, running on existing electrical networks without overloading them. With quicker connectivity and reduced expenses, these are emerging as viable alternatives.

Innovative technology to build cooler data centres is also being pursued. As part of its COOLERCHIPS program, the US Department of Energy recently awarded $40 million to fund 15 projects. NVIDIA has been granted $5 million to build a data centre with revolutionary cooling systems to boost energy efficiency. The team is building an innovative liquid-cooling system that can efficiently cool a data centre housed in a mobile container, even when it operates in ambient temperatures as high as 40 degrees Celsius while drawing 200 kW of power. The new system is said to run 20% more efficiently than current air-cooled approaches and to cost at least 5% less.

In the near future, renewable energy sources powering data centres are also a real possibility. Going by how big tech leaders are increasingly investing in nuclear energy companies, and with Microsoft posting a job opening for a ‘principal program manager for nuclear technology,’ nothing can be ruled out. This could pave the way for energy- and cost-effective alternatives that address the current problem.

With the current pattern of energy consumption slated only to go up, the rising demand for GPUs could also create another scenario: with increased adoption, the cost of GPUs may eventually come down, partly balancing out the rising energy costs.
