Cohere’s Research Lab Introduces Maya to Bridge Language Gaps with Multilingual AI

Cohere for AI, a research initiative of Cohere, recently introduced Maya, an open-source multilingual multimodal model built to address gaps in vision-language models’ (VLMs) capabilities, particularly in low-resource languages.

Maya aims to improve accessibility and cultural comprehension through better data quality and toxicity filtering. Both the model and its datasets are available on GitHub for further development.

“Current datasets often contain toxic and culturally insensitive content, perpetuating biases and stereotypes. To our knowledge, no peer-reviewed research has systematically addressed this,” the researchers stated in the paper on building a multilingual and culturally aware dataset.

In the context of the Maya model, “toxicity-free” means removing harmful or offensive content from the training data.

The team created a pretraining dataset of 558,000 image-text pairs, expanded to eight languages, including Arabic, Hindi, and Spanish. The dataset emphasises cultural diversity while mitigating toxicity using tools like Toxic-BERT and LLaVAGuard.
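To illustrate the kind of caption-level filtering the article describes, here is a minimal sketch using the publicly available unitary/toxic-bert checkpoint on Hugging Face. The example captions and the 0.5 threshold are illustrative assumptions, not details from the Maya paper, and this is not the team’s actual pipeline.

```python
# Illustrative sketch: drop image-caption pairs whose captions Toxic-BERT flags as toxic.
# Assumes the public Hugging Face checkpoint "unitary/toxic-bert"; the threshold and
# sample data are placeholders, not values used by the Maya authors.
from transformers import pipeline

toxicity_classifier = pipeline(
    "text-classification",
    model="unitary/toxic-bert",
    top_k=None,        # return scores for every toxicity label, not just the top one
    truncation=True,
)

def is_toxic(caption: str, threshold: float = 0.5) -> bool:
    """Return True if any toxicity label exceeds the threshold."""
    scores = toxicity_classifier(caption)[0]
    return any(item["score"] >= threshold for item in scores)

pairs = [
    {"image": "cat.jpg", "caption": "A cat sleeping on a windowsill."},
    {"image": "meme.jpg", "caption": "You are an idiot if you like this."},
]

# Keep only the pairs whose captions pass the toxicity check.
clean_pairs = [p for p in pairs if not is_toxic(p["caption"])]
print(clean_pairs)
```

In practice, a pipeline like Maya’s would also apply image-side safety filtering (the article mentions LLaVAGuard) before the surviving pairs are used for pretraining.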

Maya performs notably well on multilingual benchmarks. It outperforms existing models on certain tasks and languages, such as Arabic, while offering performance comparable to larger models like PALO-13B. The study also highlights Maya’s effectiveness in tasks like image captioning and visual question answering.

Future plans for Maya include expanding its dataset to include more languages like Bengali and Urdu and improving its instruction-tuning capabilities. Researchers also aim to refine the model’s adaptability for complex reasoning tasks.

Maya’s open-source approach and focus on inclusivity mark a step forward in AI, addressing a critical need for models that understand diverse languages and cultural contexts.

In August, Cohere launched Aya, a multilingual generative model supporting 101 languages, including Indian languages like Hindi and Marathi, more than half of which fall into lower-resourced categories. Aya outperformed mT0 and BLOOMZ across benchmarks while doubling language coverage. Developed collaboratively by 3,000 researchers across 119 countries, it was open-sourced to address AI dataset scarcity in vernacular languages.
