Meet Shivaay, the Indian AI Model Built on Yann LeCun’s Vision of AI


They say creating a foundational model in India is incredibly challenging and blame it on the constraints on computational resources and the unavailability of high-quality data. While we do have plenty of open-source data, the computationally expensive part is pre-training—essentially training the model to predict the next token.
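To make "predicting the next token" concrete, here is a minimal sketch of the quantity a pre-training run tries to minimise: the average negative log-likelihood of each token given the tokens before it. This is not FuturixAI's training code. The names `next_token_loss` and `uniform_predictor`, and the four-word vocabulary, are assumptions made purely for illustration.

```python
import math

def next_token_loss(tokens, predict):
    """Average negative log-likelihood of each token given its prefix."""
    total = 0.0
    for i in range(1, len(tokens)):
        # predict() returns a dict mapping candidate tokens to probabilities
        probs = predict(tokens[:i])
        # tiny floor avoids log(0) when the model assigns no probability
        total += -math.log(probs.get(tokens[i], 1e-12))
    return total / (len(tokens) - 1)

def uniform_predictor(vocab):
    """Hypothetical baseline: ignores context, spreads probability evenly."""
    p = 1.0 / len(vocab)
    return lambda prefix: {tok: p for tok in vocab}

# Toy vocabulary and sentence, assumed for illustration only
vocab = ["the", "model", "predicts", "tokens"]
sentence = ["the", "model", "predicts", "tokens"]

loss = next_token_loss(sentence, uniform_predictor(vocab))
print(f"baseline loss: {loss:.3f}")
```

A model that has learned nothing scores the same as this uniform baseline, the natural log of the vocabulary size. Pre-training pushes the number down, and doing that across billions of tokens is where the compute cost comes from.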

Two Indian engineering students, Rudransh Agnihotri and Manasvi Kapoor, recently launched an AI startup called FuturixAI. At first, the team released Mayakriti, an image-generation platform that created lifelike images. Later, the duo decided to build an AI model that competes with OpenAI’s GPT—and built it from scratch.

“Joint embedding and parameter sharing is not a widely discussed architecture,” Agnihotri told AIM. “We explored various approaches and identified models like Llama 2, Qwen, and Gemma. These models form part of a joint embedding architecture,” Agnihotri said, adding that they drew inspiration from Meta AI chief Yann LeCun’s vision of autonomous machine intelligence.

This is how the team developed Shivaay, a 4-billion-parameter AI model built on this joint embedding architecture, which leverages the three models for data. According to the team, this approach gives Shivaay the knowledge base of all three models.

For inference, the team leverages NVIDIA A100 80GB GPUs via Google Cloud, which explains the fast response time on the server when AIM tried it. The startup is part of the NVIDIA Inception program, so it is currently offered free of cost.

The model’s API is also available on Futurix’s website.

Agnihotri shared the model details with AIM. According to these, Shivaay already outperforms larger state-of-the-art models on benchmarks like MMLU and MMLU-Pro by a margin of 10-15 points.

The benchmark results also suggest that Shivaay is strong at reasoning and mathematical calculations.

Agnihotri claimed that when it comes to Indic use cases, the model was, in fact, better than Krutrim, Sarvam, and others coming up in the market. “The goal is to empower Indian developers and businesses to build their own AI agents and applications without relying on foreign models like GPT,” Agnihotri said.

Scaling the Right Way

The team’s approach addresses the widespread notion in India that GPT models and similar architectures aren’t accessible or effective for local needs. “We aim to change that by offering our service for free, much like how OpenAI initially provided GPT-3.5 at no cost. This allows users to explore and trust the model’s capabilities,” Agnihotri added.

Regarding scaling, the current priority is user acquisition. The team recently improved the chatbot’s user interface and, within 15 days, saw a spike in signups, crossing 1,500 users, most of whom came via Reddit. “Our goal is to convince users that models like Llama 2, Qwen, and Gemma can be just as good, if not better, than existing solutions [for Indian use cases],” Agnihotri said.

For training beyond these models, Futurix utilised datasets such as GATE and IIT exam questions and answers. Agnihotri claimed that this aligns with chain-of-thought reasoning, which, in turn, aligns with the current paradigm of OpenAI’s o1 and GPT-4o. It lets users test step-by-step responses and validate the model’s logical reasoning.

Agnihotri also said the company aims to allow developers to build AI agents for different vertical tasks.

The company plans to raise more funds soon and release the technical paper for the Indian developer ecosystem. Agnihotri also highlighted how LeCun’s philosophy and open-source approach inspired him.

Agnihotri is a third-year mechatronics engineering student at Delhi Skill and Entrepreneurship University. Kapoor is in his second year of electronics and communication engineering, specialising in AI and ML, at Netaji Subhas University of Technology.

Agnihotri was once a JEE aspirant who couldn’t make it to an IIT, but that setback didn’t lessen his love for math. Now, with FuturixAI and Quantum Works, the young founder aims to push forward research in the AI field in India, using research in math and physics.

“Google has the capability, dataset and compute, but at the same time, we have our own methods that are evolving with time,” he said.

The post Meet Shivaay, the Indian AI Model Built on Yann LeCun’s Vision of AI appeared first on Analytics India Magazine.
