NVIDIA’s New Model ChatQA-2 Rivals GPT-4 in Long Context and RAG Tasks

Researchers at NVIDIA have developed Llama3-ChatQA-2-70B, a new large language model that rivals GPT-4-Turbo in handling contexts of up to 128,000 tokens and excels at retrieval-augmented generation (RAG) tasks.

The model, based on Meta’s Llama3, demonstrates competitive performance across various benchmarks, including long-context understanding, medium-length tasks, and short-context evaluations.

Read the full paper here

The Llama3-ChatQA-2-70B model can process contexts of up to 128,000 tokens, matching the capacity of GPT-4-Turbo. It outperforms GPT-4-Turbo on RAG tasks and delivers competitive results on long-context benchmarks extending beyond 100,000 tokens.

Additionally, the model performs strongly on medium-length tasks within 32,000 tokens and maintains effectiveness on short-context tasks within 4,000 tokens.

The researchers employed a two-step approach to extend Llama3-70B's context window from 8,000 to 128,000 tokens: continued pre-training on a mix of SlimPajama data with upsampled long sequences, followed by a three-stage instruction-tuning process.
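The upsampling step can be pictured as reweighting the pre-training mix so that documents above a length threshold appear more often than they would in the raw corpus. The sketch below is illustrative only; the threshold, factor, and function names are assumptions, not the paper's exact recipe.

```python
import random

def upsample_long_sequences(docs, length_threshold=8000, upsample_factor=4, seed=0):
    """Build a pre-training mix in which documents longer than
    `length_threshold` tokens appear `upsample_factor` times as often.

    Illustrative sketch: the real recipe's threshold and sampling
    ratios are not specified here. Each doc is a dict with a
    'num_tokens' field.
    """
    rng = random.Random(seed)
    mix = []
    for doc in docs:
        copies = upsample_factor if doc["num_tokens"] > length_threshold else 1
        mix.extend([doc] * copies)
    rng.shuffle(mix)  # avoid long runs of duplicated documents
    return mix
```

The idea is simply that, without upsampling, sequences beyond 8,000 tokens are rare in web-scale corpora, so the extended context window would see too little long-range training signal.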

Evaluation results show that Llama3-ChatQA-2-70B outperforms many existing state-of-the-art models, including GPT-4-Turbo-2024-04-09, on the InfiniteBench long-context tasks. The model achieved an average score of 34.11, compared to GPT-4-Turbo’s 33.16.

For medium-length tasks within 32,000 tokens, Llama3-ChatQA-2-70B scored 47.37, surpassing some competitors but falling short of GPT-4-Turbo’s 51.93. On short-context tasks, the model achieved an average score of 54.81, outperforming GPT-4-Turbo and Qwen2-72B-Instruct.

The study also compared RAG and long-context solutions, finding that RAG outperforms full long-context solutions for tasks beyond 100,000 tokens. This suggests that even state-of-the-art long-context models may struggle to effectively understand and reason over such extensive inputs.
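The RAG alternative compared here follows the standard pattern: split documents into chunks, retrieve the top-k chunks most relevant to the query, and pass only those to the model instead of the full 100,000-plus-token input. A minimal sketch using simple lexical-overlap scoring; the paper's actual pipeline uses learned retrievers, and all names below are hypothetical.

```python
import math
from collections import Counter

def chunk_text(text, chunk_size=100):
    """Split text into chunks of at most `chunk_size` words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]

def score(query, chunk):
    """Cosine-style similarity over word counts (stand-in for a learned retriever)."""
    q, c = Counter(query.lower().split()), Counter(chunk.lower().split())
    overlap = sum((q & c).values())
    denom = math.sqrt(sum(q.values()) * sum(c.values())) or 1.0
    return overlap / denom

def retrieve(query, chunks, k=3):
    """Return the top-k chunks by similarity to the query."""
    return sorted(chunks, key=lambda ch: score(query, ch), reverse=True)[:k]
```

For example, `retrieve("retrieval augmented generation", chunks, k=3)` would select only the handful of chunks that mention the query terms, so the generator reads a few thousand tokens rather than the entire input.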

This development represents a significant step forward in open-source language models, bringing them closer to the capabilities of proprietary models like GPT-4. The researchers have provided detailed technical recipes and evaluation benchmarks, contributing to the reproducibility and advancement of long-context language models in the open-source community.
