Of late, Prime Minister Narendra Modi has been using Bhashini to address gatherings in several Indian languages with elan. During his speech at the Kashi Tamil Sangamam event in Varanasi on December 18, 2023, Modi used Bhashini to translate his speech from Hindi to Tamil. Most recently, the PM addressed students during ‘Pariksha Pe Charcha’ using Bhashini. The list goes on.
“We are a government service being built under government funding,” said Bhashini chief Amitabh Nag, in an exclusive interview with AIM, steering clear of politicising the conversation.
He believes that his work belongs to everyone, and is not restricted to any political party. “Throughout one’s life, regardless of daily endeavours, individuals should not feel pressured to learn another language,” said Nag, expressing a strong desire to eliminate the language barrier in this country.
Enter Bhashini
Bhashini has drawn a lot of attention over the past few months and is used extensively by a range of users, from software developers and educational institutions to government agencies.
Bhashini offers a range of services, with APIs available on the National Platform for Language Technology. These APIs provide services such as machine translation, speech-to-text, and text-to-speech. “This language technology hub provides modern service APIs to various enterprises. It receives about 100,000 hits per day, and that is where customers might have their own applications,” said Nag.
“The second [aspect] is the Bhashini app, which is targeted towards end-users looking to translate either text or have voice-to-voice discussions,” added Nag.
Bhashini has been steadily adding new features to the app. Recently, it introduced a beta version of an OCR feature called SCENE, which lets users extract text and improves accessibility, along with a Browse feature for translating websites.
“Additionally, we have another web service called Anuvaad, which handles translation,” said Nag. Anuvaad, much like Google Translate, is a web and mobile application that enables users to translate text between 22 Indian languages and English.
Fuels LLMs & GenAI Jobs
“Currently, a majority of the foundational models are monolingual. We are contributing to the LLM ecosystem by introducing voice modality and multilingual features,” said Nag. Notably, the recently released Indic LLMs, including Sarvam AI’s OpenHathi, Tech Mahindra’s Project Indus, and BharatGPT, are all leveraging Bhashini’s assistance for training datasets.
Last year, Bhashini released IndicTrans2, an open-source, transformer-based multilingual neural machine translation (NMT) model that facilitates high-quality translation across all 22 scheduled Indian languages. “Anyone and everyone building LLMs to serve the Indian market can get in touch with us, and we are open to partnering with them,” said Nag.
Furthermore, Nag noted that generative AI has led to a rise in data labeling and annotation jobs by creating demand for high-quality labeled data to train and improve machine learning models.
“We have approximately 200 translators working in the field and collecting digital data. We have established a straight integrated pipeline in one of our research institutes at IIT Madras, in collaboration with AI4Bharat. Here, data is collected, curated, annotated, and labeled to train AI models. We also have a specially funded mechanism to create the digital data,” said Nag.
Moreover, Nag told AIM that Bhashini is collaborating with Karya, one of the world’s first data cooperatives that offers labelling and annotation services. Karya is known for constructing datasets for firms like Microsoft and Google, which are used in AI models for education, healthcare, and other services.
“We lack digital data on low-resource languages, such as Bodo or Sindhi. For these languages, we approach individuals proficient in both Bodo and English,” he said, explaining that they help create a training dataset by providing parallel text in English corresponding to the content written in Bodo.
Bhashini also launched a crowdsourcing initiative called Bhasha Daan to collect voice and text data in multiple Indian languages. “It’s performing well, but not meeting our initial expectations. We plan to run a campaign to build this up further,” said Nag, when asked about its status.
“Bhashini will definitely change lives because people will be more collaborative, cooperative, and innovative, without the burden of trying to learn more languages,” concluded Nag.
The post [Exclusive] Bhashini’s Amitabh Nag on Breaking Language Barriers in India appeared first on Analytics India Magazine.