When AI models are used across different cultures and geographies, they need to be accurate and aware of each region’s context, history, and relevance to serve its communities better. In the past, AI models have hallucinated badly while describing several key aspects of Indian culture.
To examine how culturally aware AI models are, Vinija Jain, ML leader at Amazon and research fellow at IIT Patna, recently published a paper with Aman Chadha, Shashank Goswami, and Olena Burda-Lassen, titled ‘How Culturally Aware are Vision-Language Models?’ The paper evaluated the cultural sensitivity of AI in image captioning.
“In terms of the Indian context, we wanted to understand how global models like Gemini and GPT recognise our cultural symbols,” said Jain in an interview with AIM. She collected 1,500 images of different Indian dance forms and foods, and manually captioned them to create the MOSAIC-1.5k dataset, representing India’s rich culture in detail.
While most of this work is currently done manually, Jain said she would later expand the dataset with synthetic data if needed.
The idea of this project is deeply rooted in Jain’s Indian origin. “What happened is that I really craved that culture,” Jain, who started living in the USA at a young age, said. “I feel like I’ve missed out on a big part of that. And because of that, I’ve been trying to find that community here.”
Another key introduction from the research was the Cultural Awareness Score (CAS), which measures AI models on how well they capture the cultural context in image captions. Even though the current evaluation is in English, Jain emphasised the importance of assessing model performance across various linguistic and cultural contexts, including Indic languages.
Culturally Aware AI Research
Recently, Guneet Singh Kohli, an AI research scientist at GreyOrange, created the Sanskriti Bench. It aims to develop an Indian cultural benchmark to test the growing number of Indic AI models. By crafting the benchmark with the help of native speakers from different regions across India, the initiative aims to take into account the country’s cultural diversity.
Jain has now also started working with Kohli on this initiative. “Sanskriti Bench is actually a phenomenal idea and the way Guneet is leading the project is unbelievable,” she said, when asked about the most interesting project she’s come across in recent times.
Similarly, Jain is building Indic-MMLU, which is focused on understanding Indic languages. “Every major LLM is evaluated on MMLU; therefore we wanted to create one for Indic languages as well,” said Jain, highlighting that it is necessary to evaluate all the newly released Indic LLMs on their generalisation capabilities across various domains such as science, literature, and social sciences.
Jain, who hopes to release the benchmark by the end of next month, said that her motivation to work in the Indic language space was her roots in India. “My journey in AI research is deeply rooted in my desire to connect with and contribute to my cultural heritage,” she said.
AI Research is an Inspiration
Jain enrolled at Stanford while continuing to work at her job, as her passion for NLP, multimodal, and AI research grew. She also won the Outstanding Paper Award at EMNLP 2023 for ‘Counter Turing Test (CT2): AI-Generated Text Detection is Not as Easy as You May Think – Introducing AI Detectability Index (ADI)’.
Jain is also currently co-advising Sriparna Saha at IIT Patna’s AI lab for Indic medical research. The project, described in the paper titled ‘M3: Multimodal, Multilingual, Medical Help Assistant’, will be India’s first multilingual medical VLM. The aim of the research is to eventually assist doctors in patient-doctor communication, along with translation and visual assistance during diagnosis.
“They’re doing a lot of tremendous work in Indic medical research and are actually collaborating with doctors to help validate the data to avoid hallucination,” said Jain. IIT Patna has been focusing on AI research in the medical field very intensively. Recently, a team led by Aman Chadha released the MedSumm dataset for LLMs and VLMs for medical research.
Apart from this, Jain is also working on creating an inventory of all the impactful Indic AI research, which would include LLMs, datasets, benchmarks, frameworks, and even tokenisers.
“The research from India is not only serving as great research in itself, but also as an inspiration,” said Jain.
“When you see someone else building something for the community, it motivates you to help and contributes as a building block for further developments,” she added, talking about the growing push for AI research in India, while companies such as OpenAI and Google expand their base into the Indic AI space.
The post Meet the AI Researcher Building Culturally Aware Vision Language Models appeared first on AIM.