AI4Bharat, an AI research lab incubated at IIT Madras, has released Airavata, an instruction-tuned model for Hindi. The model was built by fine-tuning Sarvam AI’s OpenHathi on diverse Hindi instruction-tuning datasets to make it better suited for assistive tasks, the lab said in a blog post.
Along with Airavata, AI4Bharat has released the instruction-tuning datasets used for the model to enable more innovation in the IndicLLM space.
“We rely on human-curated, license-friendly instruction-tuned datasets to build ‘Airavata’. We do not use data generated from proprietary models like GPT-4 etc. We think this is a more sustainable way of building instruction-tuned models at scale for most Indic languages, where relying on distilled data from commercial models would increase costs and restrict their free usage in downstream applications due to licensing restrictions,” it said.
The performance of LLMs depends heavily on high-quality instruction-tuning datasets, and diverse datasets of this kind are scarce for Hindi.
AI4Bharat’s approach to developing Airavata involves translating well-constructed English supervised instruction-tuning datasets into Hindi. For the translation, the lab said it uses IndicTrans2, a state-of-the-art open-source machine translation model designed specifically for Indian languages.
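As a rough illustration of that translation step, the Python sketch below passes the text fields of each English instruction record through a translation function and leaves the other fields untouched. It is not AI4Bharat's pipeline: the field names (`instruction`, `input`, `output`) and the `placeholder_translate` function are assumptions made for the example, and a real run would replace the placeholder with calls to a machine translation model such as IndicTrans2.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical field names; real instruction datasets may use different keys.
TEXT_FIELDS = ("instruction", "input", "output")


def translate_records(
    records: Iterable[Dict],
    translate: Callable[[str], str],
    fields: Iterable[str] = TEXT_FIELDS,
) -> List[Dict]:
    """Return copies of the records with the chosen text fields translated."""
    fields = tuple(fields)
    translated = []
    for record in records:
        new_record = dict(record)  # copy so the source record is not modified
        for field in fields:
            text = record.get(field)
            # Skip missing, non-string, and empty fields.
            if isinstance(text, str) and text.strip():
                new_record[field] = translate(text)
        translated.append(new_record)
    return translated


def placeholder_translate(text: str) -> str:
    """Stand-in for a real English-to-Hindi model; it only tags the text."""
    return f"[hi] {text}"


if __name__ == "__main__":
    sample = [
        {
            "instruction": "Summarize the text.",
            "input": "",
            "output": "A short summary.",
            "source": "demo",
        }
    ]
    for row in translate_records(sample, placeholder_translate):
        print(row)
```

Passing the translator in as a function keeps the record handling separate from the choice of translation model, so the same loop works with any model that maps one string to another.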
Previously, AI4Bharat introduced Chitralekha, an open-source AI-powered video transcreation platform developed in partnership with EkStep.
Chitralekha has an integrated workforce management system and enables end-to-end transcreation of a video from one language to another through the stages of transcription, translation and voice-over in the target language.
Earlier this month, AI4Bharat announced the hiring process for its AI resident (and associates) programme for 2024-25. This year-long, pre-doctoral programme focuses on intensive work in NLP, speech, and vision projects.
The post AI4Bharat Releases Airavata: An Instruction-tuned Hindi LLM appeared first on Analytics India Magazine.