Open-source download counts are one measure of how rapidly developers are jumping on the large language model bandwagon. According to the Hugging Face leaderboard, different versions of models such as Vicuna and Meta’s Llama-2 have been downloaded several million times in the past month.
Meta’s Llama-2, a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters, has been downloaded over 3 million times. The Large Model Systems Organization (LMSYS) has released 17 different Vicuna models (7B and 13B), which together have a little over a million downloads.
This goes to show that the biggest threat to major providers like OpenAI is not a big tech incumbent but the open-source community. Surprisingly, Meta has been leading the wave, while Google and OpenAI have kept most of their technology behind closed doors or made it available only to specific companies through collaborations.
The point has been raised several times in the recent past, as Meta’s reputation has changed dramatically for the better. One reason Google and OpenAI prefer not to let the public get its hands on their technology is that, for them, it is a product and not just internally developed infrastructure. They are ultimately building this technology with shareholders in mind, not developers.
Meta’s Simon Says
When the initial version of Llama was leaked, Meta sent takedown requests to GitHub and Hugging Face to contain it. However, as the code became widely accessible on the internet, Meta abandoned its attempts. Instead, the company embraced the situation and chose openness as the path forward, releasing its later models too.
The force behind the company’s open-source awakening might be its AI chief, Yann LeCun. Although the foundation of AI is firmly rooted in open-source principles, Llama represents a milestone as the first major open-source LLM. LeCun, Meta’s Simon, has been on a spree advocating for open source on the internet and in communities, bringing the subject to his followers’ attention on a daily basis.
From retweeting ‘Keep AI open’ posts to replying to other AI researchers on the subject, LeCun does not hesitate. A few hours ago, replying to MIT professor Max Tegmark, LeCun stated, ‘Like many, I very much support open AI platforms because I believe in a combination of forces: people’s creativity, democracy, market forces, and product regulations.’
Altman, Hassabis, and Amodei are the ones doing massive corporate lobbying at the moment.
They are the ones who are attempting to perform a regulatory capture of the AI industry.
You, Geoff, and Yoshua are giving ammunition to those who are lobbying for a ban on open AI R&D.
If…— Yann LeCun (@ylecun) October 29, 2023
Importantly, besides providing access to the Llama models, Meta has also shared their weights, which the makers of other major language models have not done. Weights are the parameters a model acquires during training, and having them simplifies the development and execution of AI applications, since developers can run and adapt the model themselves. In contrast, OpenAI’s GPT models remain accessible solely through application programming interfaces (APIs).
In this respect, Meta may have surpassed OpenAI and Google. An internal Google memo recently surfaced on the web in which a Google AI engineer referred to the open-source community as “a third faction [that] has been quietly eating our lunch.”
Dollar Signs
Even investors are talking about open source today. “If you went back nine months, you did not see strong open alternatives to OpenAI and some of the leading proprietary solutions,” said Unusual Ventures general partner Wei Lien Dang. “There’s been this significant proliferation.”
Clearly, investors have chosen open source as the contender to bet on. James Currier, a general partner at NFX, has also taken note of the cost savings that come with moving from closed to open-source models. One company in his portfolio previously spent $150,000 a month to access a particular model. After adopting an open-source alternative, the startup cut its operational costs, bringing that monthly expenditure down to $4,000.
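To put the two figures Currier cites side by side, here is a minimal sketch of the arithmetic. The function name `cost_savings` is hypothetical, and the annual figure assumes both monthly costs stay constant for twelve months, which the source does not state.

```python
def cost_savings(before: float, after: float) -> tuple[float, float]:
    """Return (percent reduction, fold reduction) between two monthly costs."""
    if before <= 0 or after <= 0:
        raise ValueError("costs must be positive")
    percent = (before - after) / before * 100
    fold = before / after
    return percent, fold


# Monthly figures quoted in the article (USD).
percent, fold = cost_savings(150_000, 4_000)

# Assumption: both costs hold steady for a full year.
annual = (150_000 - 4_000) * 12

print(f"{percent:.1f}% lower, {fold:.1f}x cheaper, ${annual:,} saved per year")
# prints: 97.3% lower, 37.5x cheaper, $1,752,000 saved per year
```

On those numbers, the switch cut the startup's model bill by roughly 97 percent.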
To an extent, Meta and LeCun should be credited for steering language models towards open source. A month ago, LeCun tweeted that AI systems are fast becoming basic infrastructure. He also noted that, historically, basic infrastructure always ends up being open source, citing the software infrastructure of the internet, Linux, Apache, and JavaScript browser engines.
Even though companies continue to pour money into technology kept behind closed doors, the open-source alternative has a much better chance of taking the trophy home.
The post And the AI Winner is Open-Source appeared first on Analytics India Magazine.