NVIDIA Launches Hymba, Its New Hybrid Architecture for Small LLMs

NVIDIA has unveiled Hymba-1.5B-Base, a small-scale language model that integrates transformer attention mechanisms with state space models (SSMs). The hybrid architecture aims to combine the recall precision of attention with the computational efficiency of SSMs.

Pavlo Molchanov, scientist and research manager at NVIDIA, took to X to announce the development.

Sharing our team’s latest work on Hymba – an efficient small language model with hybrid architecture.
Tech report: https://t.co/koThe3a6Ja
Discover the tradeoff between Mamba and Attention, how they can be combined, how attention sink and forced-to-attend phenomena can be…

— Pavlo Molchanov (@PavloMolchanov) November 22, 2024

The model uses a hybrid-head structure in which attention heads, responsible for precise recall, run in parallel with SSM heads, responsible for efficient context summarisation.

It adds learnable meta-tokens at the start of input sequences to store key information and reduce unnecessary attention demands. To improve memory and computation efficiency, it includes cross-layer key-value sharing and partial sliding window attention.
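To make the idea concrete, here is a minimal, illustrative PyTorch sketch of a hybrid-head block: an attention branch and a toy SSM branch process the same sequence in parallel, with learnable meta-tokens prepended to the input. All names, dimensions, and the simplified linear recurrence below are assumptions for illustration, not NVIDIA's implementation.

```python
# Illustrative sketch only -- NOT NVIDIA's Hymba code. A "hybrid-head" block:
# attention and SSM branches run in parallel on the same tokens and are fused.
import torch
import torch.nn as nn

class ToySSMHead(nn.Module):
    """A toy linear recurrence standing in for a Mamba-style SSM head."""
    def __init__(self, dim):
        super().__init__()
        self.in_proj = nn.Linear(dim, dim)
        self.decay = nn.Parameter(torch.full((dim,), 0.9))  # learnable per-channel decay

    def forward(self, x):                        # x: (batch, seq, dim)
        u = self.in_proj(x)
        h = torch.zeros_like(u[:, 0])
        outs = []
        for t in range(u.size(1)):               # sequential scan, O(1) state per step
            h = self.decay * h + u[:, t]         # running summary of the context
            outs.append(h)
        return torch.stack(outs, dim=1)

class HybridHeadBlock(nn.Module):
    def __init__(self, dim, n_heads, n_meta=4):
        super().__init__()
        self.meta = nn.Parameter(torch.randn(1, n_meta, dim) * 0.02)  # learnable meta-tokens
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.ssm = ToySSMHead(dim)
        self.norm_a = nn.LayerNorm(dim)
        self.norm_s = nn.LayerNorm(dim)

    def forward(self, x):                        # x: (batch, seq, dim)
        b = x.size(0)
        x = torch.cat([self.meta.expand(b, -1, -1), x], dim=1)  # prepend meta-tokens
        a, _ = self.attn(x, x, x)                # precise-recall branch
        s = self.ssm(x)                          # context-summarisation branch
        return self.norm_a(a) + self.norm_s(s)   # fuse the two branches

block = HybridHeadBlock(dim=64, n_heads=4)
y = block(torch.randn(2, 16, 64))
print(y.shape)  # torch.Size([2, 20, 64]) -- 4 meta-tokens + 16 input tokens
```

The fusion here is a plain normalised sum, chosen for simplicity; how the actual model mixes attention and SSM head outputs is detailed in the tech report.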

The research paper “Hymba: A Hybrid-head Architecture for Small Language Models” explores the model’s design, performance, and applications in detail.

Hymba Outperforms Llama-3.2

In a controlled study comparing various architectures under identical settings, Hymba-1.5B-Base demonstrated significant advantages.

It outperformed all publicly available models under 2 billion parameters and surpassed Llama-3.2-3B with a 1.32% higher average accuracy, an 11.67-fold reduction in cache size, and a 3.49-fold increase in throughput.

Philipp Schmid, technical lead at Hugging Face, commented on the development: “Hymba outperforms other small LLMs like Meta 3.2 or SmolLM v2 being trained on only 1.5T Tokens.”

Molchanov commented, “I’m not sure if we should be proud of 1.5T training. The reason is we want to move quickly; in the next two weeks, somebody will be even better.”

NVIDIA has also provided a setup script to facilitate environment configuration, supporting CUDA versions 12.1 and 12.4.
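For readers who want to try the model, a hedged loading sketch follows. The Hugging Face repo id nvidia/Hymba-1.5B-Base and the trust_remote_code requirement reflect the public model card at the time of writing; the dtype and device choices are assumptions, not requirements.

```python
# Hedged sketch of loading the model via Hugging Face transformers; verify the
# repo id and requirements against the current model card before use.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "nvidia/Hymba-1.5B-Base"
tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo,
    torch_dtype=torch.bfloat16,   # assumption: bf16 on a CUDA 12.x GPU
    trust_remote_code=True,       # the custom hybrid architecture ships its own code
).cuda()
```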

But, Caution!

NVIDIA acknowledges that the model was trained on internet data, which includes toxic language, unsafe content, and societal biases. As a result, the model may reflect these biases, generate toxic responses to toxic prompts, or produce inaccurate or irrelevant text even with neutral prompts.

Users should set the batch size to one during generation, as the current setup doesn’t fully support padding meta tokens with sliding window attention. However, any batch size works for training and pre-filling.
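An illustrative generation call honouring that constraint, continuing from the loading sketch above, passes exactly one prompt per forward pass; the prompt and decoding parameters are placeholder assumptions.

```python
# One prompt at a time: padded meta-tokens are not yet supported with
# sliding window attention, so generation uses a batch of exactly 1.
prompt = "The hybrid-head architecture combines"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```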

The company highlights the importance of shared responsibility in creating trustworthy AI and has set ethical guidelines for its development. Users are advised to use the model responsibly while remaining mindful of its limitations.
