At Google I/O, the tech giant today introduced PaliGemma, a powerful open vision-language model (VLM), and provided a sneak peek into the upcoming Gemma 2, the next generation of their Gemma family of models.
PaliGemma, inspired by PaLI-3 and built on open components, including the SigLIP vision model and the Gemma language model, is designed for class-leading fine-tune performance on a wide range of vision-language tasks.
These tasks include image and short video captioning, visual question answering, understanding text in images, object detection, and object segmentation.
Google is providing both pre-trained and fine-tuned checkpoints at multiple resolutions, as well as checkpoints specifically tuned to a mixture of tasks for immediate exploration.
PaliGemma is available through various platforms and resources, including free options like Kaggle and Colab notebooks, and academic researchers can apply for Google Cloud credits to support their work.
The release of PaliGemma brings several key benefits: multimodal comprehension, a versatile base model for fine-tuning across a wide range of vision-language tasks, and off-the-shelf exploration via a checkpoint fine-tuned on a mixture of tasks for immediate research use.
Several have started experimenting with it already.
I tried it with some plant disease images. It could identify the crop, but it would refuse to detect plant diseases. Found this example quite funny: pic.twitter.com/bASP74bgMn
— Thomas Friedel (@thomascygn) May 14, 2024
Project Navarasa Takes Center Stage at Google I/O
Just a few days ago, we wrote about how Gemma outperformed Meta’s Llama 3 for Indic languages. Today, at Google I/O, India’s Project Navarasa took centre stage. Google highlighted the success of ‘Project Navarasa,’ a multilingual variant of Gemma developed by Telugu LLM Labs that makes the model accessible in 15 Indic languages.
Harsh Dhand, head of APAC research partnerships at Google, said, “When technology is developed for a particular culture, it won’t be able to solve and understand the nuances of a country like India.”
Project Navarasa leverages Gemma’s powerful tokenizer to enable AI-driven language generation for 15 Indic languages.
“One of Gemma’s features is an incredibly powerful tokenizer which enables the model to use hundreds of thousands of words, symbols and characters across so many alphabets and language systems. This large vocabulary is critical to adapting Gemma to power projects like Navarasa,” said Ramsri Goutham Golla, the co-creator of Navarasa.
“Our biggest dream is to build a model to include everyone from all corners of India,” said Golla, describing Navarasa as a fine-tuned model based on Google’s Gemma and trained for Indic languages.
He said they built Navarasa to create culturally rooted large language models where people can talk in their native language and receive responses in their native language.
Many developers that AIM spoke to said that Gemma is better than Llama for Indic languages. “Gemma shines compared to the Llama 2 and 3 models,” said Adithya S Kolavi, founder of Cognitive Lab, who built a leaderboard for Indic LLMs.
“Models using Llama 2 extended its tokenizer by 20 to 30k tokens, reaching a vocabulary size of 50-60k. Continuous pre-training is crucial for understanding these new tokens. In contrast, Gemma’s tokenizer initially handles Indic languages well, requiring minimal fine-tuning for specific tasks,” explained Kolavi.
According to Vivek Raghavan, the co-founder of Sarvam AI, Gemma’s powerful tokenizer gives it an advantage over Llama when it comes to Indic languages. He explained, “The tokenization tax for Indic languages means asking the same question in Hindi costs three times more tokens than in English, and even more for languages like Odia due to their underrepresentation in these models.”
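The “tokenization tax” Raghavan describes can be illustrated with a rough back-of-the-envelope sketch. When a tokenizer has few learned merges for a script, it falls back toward raw UTF-8 bytes, and every Devanagari codepoint takes three bytes, so the same greeting costs several times more fallback units in Hindi than in English. The snippet below is an illustration of this encoding arithmetic only, not the behaviour of any specific model’s tokenizer:

```python
# Rough illustration of the "tokenization tax": a byte-level BPE
# tokenizer with no learned merges for a script degrades to one unit
# per UTF-8 byte, and Devanagari codepoints are 3 bytes each.

def byte_fallback_units(text: str) -> int:
    """Worst-case count for a byte-level BPE with no merges:
    one unit per UTF-8 byte."""
    return len(text.encode("utf-8"))

english = "namaste"   # Latin script: 1 byte per character
hindi = "नमस्ते"          # Devanagari: 3 bytes per codepoint

print(byte_fallback_units(english))  # 7
print(byte_fallback_units(hindi))    # 18

# A vocabulary that covers English well might encode "namaste" in one
# or two tokens, while an uncovered script trends toward the byte
# counts above -- hence the multiple-times-higher cost per question.
```

In practice, tokenizers learn merges from their training data, so the gap is smaller than this worst case, but the direction of the effect is the same: underrepresented scripts pay more tokens for the same content.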
Meanwhile, OpenAI recently released GPT-4o, an update to their language model that includes a new tokenizer and an extended vocabulary size of 200k tokens, compared to 100k tokens in GPT-4.
This update significantly improved the support for several Indian languages, including Hindi, Gujarati, Marathi, Telugu, Tamil, and Urdu.
Although Gemma 2’s tokenizer vocabulary size wasn’t specified in the demo, the model is said to handle ‘hundreds of thousands of words, symbols and characters’. By comparison, GPT-4o’s 200k-token base vocabulary has so far outperformed Gemma’s for Indic and other non-English languages in terms of token reduction.
More Power to Gemma
Looking ahead, Google announced the upcoming arrival of Gemma 2, the next generation of Gemma models. Gemma 2 will be available in new sizes for a broad range of AI developer use cases and features a brand-new architecture designed for breakthrough performance and efficiency. Key benefits include class-leading performance, reduced deployment costs, and versatile tuning toolchains.
Now you can try out our Indic Gemma Model Navarasa 2.0 (supports language generation in 15 languages) easily as a chat interface at https://t.co/KFQ6qfWBf0
Ask a question in English and ask it to respond in Hindi, Telugu etc or ask directly in the native language.
Kudos to… pic.twitter.com/YuHniHo5s4
— Ramsri Goutham Golla (@ramsri_goutham) April 2, 2024
At this year’s developer conference, Google poked fun at OpenAI, making it clear that it is building AI to be helpful for everyone, not just a select few.
The post Google Unveils Open Vision Language Model, PaliGemma appeared first on Analytics India Magazine.