AWS Launches Trainium3 UltraServers, Gives a Peek Into Trainium4

At re:Invent 2025, AWS announced the general availability of its new Amazon EC2 Trn3 UltraServers, powered by Trainium3 chips built on a 3nm process, to help customers train and deploy AI models faster and at lower cost.

The company said the new servers deliver up to 4.4x more compute performance, 4x greater energy efficiency, and almost 4x more memory bandwidth compared to the previous Trainium2 generation. Each UltraServer can scale up to 144 Trainium3 chips, offering as much as 362 FP8 petaflops of compute.
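As a back-of-the-envelope check, a minimal sketch in Python of the per-chip FP8 throughput implied by AWS's quoted figures (the two constants come from the announcement above; the per-chip number is simply derived from them):

```python
# Figures quoted in the AWS announcement.
CHIPS_PER_ULTRASERVER = 144      # max Trainium3 chips per Trn3 UltraServer
ULTRASERVER_FP8_PFLOPS = 362     # peak FP8 petaflops per UltraServer

# Derived: implied peak FP8 compute per Trainium3 chip.
per_chip_pflops = ULTRASERVER_FP8_PFLOPS / CHIPS_PER_ULTRASERVER
print(f"~{per_chip_pflops:.2f} FP8 petaflops per Trainium3 chip")
# -> ~2.51 FP8 petaflops per chip
```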

Trainium3 follows AWS’s earlier deployment of 500,000 Trainium2 chips in Project Rainier, created with Anthropic and described as the world’s largest AI compute cluster.

AWS also revealed early details of Trainium4, which is expected to deliver at least 6x Trainium3's FP4 processing performance, along with higher FP8 performance and memory bandwidth. The next-generation chip will support NVIDIA NVLink Fusion interconnects, allowing it to operate alongside NVIDIA GPUs and AWS Graviton processors in MGX racks.

AWS has already deployed more than 1 million Trainium chips to date. The company says the latest performance improvements translate to faster training and lower inference latency. In internal tests using OpenAI’s GPT-OSS open-weight model, Trn3 UltraServers delivered three times higher throughput per chip and four times faster response times compared to Trn2 UltraServers.

Companies including Anthropic, Karakuri, Metagenomi, NetoAI, Ricoh and Splash Music are already reporting lower training and inference costs, with reductions of up to 50% in some cases. AWS said its Bedrock service is already running production workloads on Trainium3.

Decart, which focuses on real-time generative video, said it has achieved 4x faster frame generation on Trainium3 at half the cost of GPUs. AWS noted that such capabilities could support large-scale interactive applications.

The UltraServers are supported by an upgraded networking stack, including the new NeuronSwitch-v1, which provides twice the internal bandwidth, and a revised Neuron Fabric that brings inter-chip latency below 10 microseconds. The company said this reduces bottlenecks in distributed training and inference, especially for workloads such as agentic systems, mixture-of-experts architectures and reinforcement learning.

UltraClusters 3.0 can connect thousands of the new servers, scaling to as many as one million Trainium chips, 10 times the scale of the previous generation. AWS said this level of scale enables training multimodal models on trillion-token datasets and serving millions of concurrent users.
