Ever since OpenAI introduced ChatGPT two years ago, the race to build an alternative has been relentless. Companies worldwide are building generative AI models and LLMs to outdo each other. In India, startups and companies have prioritised building AI in Indic languages (India has 22 official languages), including Bengali, Assamese, Tamil, Telugu, Sanskrit and Hindi, to serve the needs of the whole population.
This includes startups and companies like Sarvam AI, Ola’s Krutrim, Wadhwani AI, and Tech Mahindra, as well as initiatives like Bhashini and AI4Bharat. Everyone is looking to solve the Indic data and AI problem through their research and initiatives.
While speaking with AIM, Tanuj Bhojwani, the head of People+ai, explained the importance of building Indic language models. “If I put a gun to your head and tell you that you cannot use Google from next month, you will maybe push back for one or two months but then start coughing up,” Bhojwani said. He explained that everyone is accustomed to the idea of going to this magic box, typing what they think, and getting the results.
Most of the Indian population is yet to experience that type of internet because of the country’s low literacy rate. Bhojwani said that the same people are going to access the new internet multimodally, through voice and pointing the camera at things.
Building AI in India is a very different game from the West. “If you look at how much an AI solution could mean to a user, it’s much higher in India with a much larger volume,” he added. In the West, it’s about acquiring enterprise customers who are willing to spend millions of dollars.
But in India, it’s a high-volume, low-value game, where AI users would not be paying as much. These people would be more comfortable using AI in their own native languages, so any solution has to work at population scale. For India to flourish in AI, it needs to create models that understand India’s linguistic nuances and cultural complexities.
What’s the Moat for Indic AI Companies?
The Indic data problem has been on every Indian AI researcher’s mind. One of the primary challenges in developing AI for Indian languages is the scarcity of high-quality data. Unlike English, which has a vast amount of digital content available, Indian languages lack sufficient natural data to train AI models.
That is what everyone, including the western AI companies, is working on. Even OpenAI is slightly interested in working with Indic data. “Just in terms of model, the moat is generally less,” Vishnu Subramanian, the founder of Jarvis AI Labs, told AIM. He added that the moat usually lies in building AI use cases, as it is easier to go up that supply chain than down.
This is where the idea of ‘Adbhut India’, or the ‘AI use case capital of the world’, comes in. “If I had to choose one of them first, I would choose the use cases,” said Bhojwani.
“OpenAI may not be interested in solving some of the problems [in India] because of their lack of capability, empathy, and understanding of the country,” Subramanian said, adding that the market is also very small for OpenAI or Anthropic to venture into.
Subramanian explained that OpenAI did not build GPT-4 on day one; it took them years and several iterations to reach that level. The same would go for Indic language models built by Indian AI companies.
“We should definitely be building the models, and it should be a long-term goal, and for now use case driven is readily doable,” added Subramanian, to which Bhojwani said, “Being one or two generations behind the SOTA models is still good enough.”
The long-term vision of all AI companies is the same: to have indigenously developed AI models. Given the network and access to resources that Indian AI companies have right now, the next best move would be to build a GPT-2 level model instead of competing with the West.
This is similar to what Naveen Rao, the VP of generative AI at Databricks, told AIM: “You’ve got to do something better than they [OpenAI] do. And, if you don’t, and it’s cheap enough to move, then why would you use somebody else’s model? So it doesn’t make sense to me to just try to be ahead unless you can beat them.”
Building models is getting cheaper, and catching up with SOTA a few years later would be astronomically cheaper. “What is the hurry?” asked Bhojwani, explaining that it is better to first solidify the market that could sustain the models. “For a constrained set of resources, where would you rather apply them?” he asked, adding that it is good to build models, but if you had to pick what to do first, defining the use cases is more important.
AI for Bharat
Earlier, speaking with AIM, Vivekananda Pani, the CTO and co-founder of Reverie Language Technologies, said that building AI in native languages was essential. “We started in an era when there was absolutely zero Indian language data in the digital media,” he recalled, highlighting the progress made from their first speech model using only 100 hours of data to more recent models utilising at least 10,000 hours.
“In India, we still have less than 7% of people who are fluent in English,” Pani explained. He said that there is definitely a need to build an AI model in Indian languages and not just rely on the models of the West.
This is what Wadhwani AI CEO Shekar Sivasubramanian meant when he spoke to AIM about building AI for India and AI for Bharat. He said that when rolling out software for farmers or daily wage earners, developers must consider that these users may be holding the most expensive piece of equipment they have ever owned, so the software must be intuitive and simple to use.
Similarly, Mohandas Pai, the founder of Aarin Capital and former CFO of Infosys, told AIM that it is important for companies to build vertical applications of generative AI models instead of focusing on building a horizontal ChatGPT-like solution. “They have got their models, they are implementing them and they would be the major players in this market,” said Pai.
This approach has been working well for Indian AI companies, inspired by Nandan Nilekani’s idea of building ‘Adbhut India’, or, in other words, “the AI use case capital of the world.” When it comes to building AI for Bharat, Indic AI is the way forward. That is what most startups are striving to do in India.
The post Why are Indian AI Companies Obsessed with Indic Language Models? appeared first on AIM.