Generative AI is evolving beyond the race for larger models, with growing attention to sovereignty, data ownership, and cultural alignment. For India, where multilingual diversity defines daily life, the challenge lies in building AI that reflects these realities while remaining scalable and cost-efficient.
The answer may lie in BharatGen, a consortium-led effort to create multilingual and multimodal AI that is sovereign, frugal, and rooted in India’s priorities.
At Cypher 2025, Ganesh Ramakrishnan, professor at the department of computer science and engineering, IIT Bombay, said, “India’s AI opportunity, converting the diversity into a strength by leveraging the similarity across languages, getting back our skilled engineers and researchers to work together.”
The project brings together IITs and other institutions under a not-for-profit structure, combining academic research with practical applications. Initially supported by the Department of Science and Technology, BharatGen recently received a significant boost in the form of a ₹900 crore grant under the IndiaAI Mission.
This whole-of-government approach, with the Ministry of Electronics and IT stepping in alongside earlier support, aims to scale the models towards the trillion-parameter range and enable the creation of agentic systems for Bharat.
As Ramakrishnan explained, this is a leapfrogging opportunity to shift India from being a “use case capital” to an IP producer, while reinforcing privacy and cultural preservation.
Models Born from India’s Context
BharatGen has already released models ranging from 500 million to 7 billion parameters. Among them is Param-1, a 2.9 billion-parameter language model pre-trained from scratch with 33% Indian data, including 25% Hindi.
“We also released several domain-specific models in agriculture, legal, finance, and Ayurveda,” Ramakrishnan said, emphasising the localisation strategy.
The consortium has also launched multimodal systems. The Sooktam family powers text-to-speech, Shrutam focuses on automatic speech recognition, and Patram stands as India’s first 7 billion-parameter document vision-language model.
These systems are intended to serve Indian needs rather than mimic global templates. “This is actually the seat of India’s AI ecosystem, having our feet on the ground through applications, while also ensuring that we are building models which are not just aping the Western models,” Ramakrishnan emphasised.
Applications such as Krishisathi, accessible via WhatsApp, demonstrate how these models can reach ordinary users. From speech-to-speech systems capable of conveying emotion to compact diffusion-based voice models that work with minimal data, BharatGen’s experiments point towards a personalised, inclusive future for Indian AI.
Research, Sovereignty, and Scaling Ahead
Research is central to BharatGen’s approach, with over 15 papers published in top-tier venues within a year. The consortium has collected more than 13,000 hours of speech data across Indian regions, embedding fidelity and provenance checks into its data pipelines.
Ramakrishnan described this as a “virtuous cycle” of recipes and indigenous benchmarks, ensuring models evolve from robust foundations.
Training challenges remain formidable, with even mid-sized models requiring hundreds of GPUs over weeks. Yet BharatGen’s frugal philosophy has produced compact multilingual architectures that perform competitively on benchmarks.
The recent government funding promises to accelerate this trajectory. With resources to train much larger models, the project can now aim for trillion-parameter systems, speech agents capable of handling multilingual tasks, and multimodal document models for domains such as governance, healthcare, and finance.
At its core, BharatGen is a strategic exercise in sovereignty. By embedding knowledge-driven components, focusing on explainability, and leveraging linguistic similarities across Indian languages, the initiative seeks to create AI that is not only technically strong but also aligned with India’s cultural and national priorities.
As Ramakrishnan concluded, it is about turning diversity into strength and laying the foundation for India to lead, not follow, in the age of generative AI.
The post BharatGen and the Pursuit of Sovereign, Scalable AI for India appeared first on Analytics India Magazine.