Stability AI Claims Intel Gaudi 2 is Faster than NVIDIA H100

Intel Soon to be on Par with NVIDIA

In a recent blog post titled “Behind the Compute,” Stability AI unveiled surprising findings on the performance of Intel Gaudi 2 accelerators compared with NVIDIA’s H100 in training and inference of its upcoming image generation model, Stable Diffusion 3.

Stability AI’s text-to-image model demonstrated promising results in the performance analysis. Utilising the 2B parameter multimodal diffusion transformer (MMDiT) version of the model, Stability AI compared the training speed of Intel Gaudi 2 accelerators with NVIDIA’s A100 and H100.

In a 2-node configuration, the Intel Gaudi 2 system processed 927 training images per second, 1.5 times faster than NVIDIA H100-80GB. Increasing the batch size to 32 per accelerator on Gaudi 2 raised the training rate to 1,254 images/sec.
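To put the batch-size change in perspective, the two Gaudi 2 figures above can be compared directly. The following is a minimal sketch that uses only the numbers quoted in this article; the variable names are illustrative, and the resulting ratio is derived here rather than reported by Stability AI.

```python
# Gaudi 2 training throughput on 2 nodes, as quoted in the article.
baseline_images_per_sec = 927   # original batch size (not stated in the article)
tuned_images_per_sec = 1254     # batch size of 32 per accelerator

# Relative gain from the larger batch size (a derived figure, not a reported one).
batch_gain = tuned_images_per_sec / baseline_images_per_sec
print(f"Larger batch size: {batch_gain:.2f}x, about {100 * (batch_gain - 1):.0f}% more images/sec")
```

On these figures, the larger batch size accounts for roughly a one-third increase in training throughput on the same hardware.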

In a 32-node configuration, the Gaudi 2 cluster processed over 3x more images per second than NVIDIA A100-80GB GPUs, despite the A100s having a highly optimised software stack.

In inference tests with the Stable Diffusion 3 8B parameter model, Gaudi 2 chips offered inference speed similar to that of NVIDIA A100 chips using base PyTorch.

However, Stability AI admitted that with TensorRT optimisation, A100 chips produced images 40% faster than Gaudi 2, though it anticipates that Gaudi 2 will outperform A100s with further optimisation. These results could be further contested by the upcoming GH200 processors, which might be announced at GTC 2024 this month.

Source: Stability AI Blog

A few months back, AMD also claimed that it had surpassed the NVIDIA H100 on various performance metrics, but the claim was later debunked by NVIDIA, which said that AMD had likewise not included TensorRT optimisation in its test.

Intel has also launched its Gaudi 3 AI accelerator, which should make this competition even more interesting in the future.

Moreover, Stable Beluga 2.5 70B, Stability AI’s fine-tuned version of LLaMA 2 70B, showcased impressive performance on Intel Gaudi 2 accelerators. Running the PyTorch code out of the box on 256 Gaudi 2 accelerators, Stability AI measured an average throughput of 116,777 tokens/second.
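For a rough per-chip view of that figure, the aggregate throughput can be divided across the cluster. This is a minimal sketch that assumes the 116,777 tokens/second is the combined rate of all 256 accelerators; the per-accelerator value is derived here and is not a number reported in the source.

```python
# Aggregate throughput for Stable Beluga 2.5 70B, as quoted in the article.
total_tokens_per_sec = 116_777
num_accelerators = 256

# Assumption: the quoted throughput is the sum over all accelerators.
per_accelerator = total_tokens_per_sec / num_accelerators
print(f"About {per_accelerator:.0f} tokens/sec per accelerator")
```

This derived value comes from the 256-accelerator run and should not be compared directly with per-accelerator inference figures, which measure a different workload.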

Gaudi 2 was 28% faster than the NVIDIA A100 in inference tests with the 70B language model, generating 673 tokens/second per accelerator.
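The A100 rate implied by that comparison can be estimated from the two quoted numbers. This is a back-of-the-envelope sketch that assumes "28% faster" means a throughput ratio of 1.28; the resulting A100 figure is an estimate derived here, not a measurement quoted in the article.

```python
# Gaudi 2 inference throughput on the 70B language model, as quoted in the article.
gaudi2_tokens_per_sec = 673

# Assumption: "28% faster" is read as Gaudi 2 throughput = 1.28 x A100 throughput.
speedup_over_a100 = 1.28

# Implied A100 rate (derived estimate only).
implied_a100_tokens_per_sec = gaudi2_tokens_per_sec / speedup_over_a100
print(f"Implied A100 rate: about {implied_a100_tokens_per_sec:.0f} tokens/sec per accelerator")
```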

The post Stability AI Claims Intel Gaudi 2 is Faster than NVIDIA H100 appeared first on Analytics India Magazine.
