‘AI Platforms will Control What Everybody Sees,’ Says Meta’s AI Chief Yann LeCun

Open Source AI Platforms

“Eventually all our interactions with the digital world will be mediated by AI assistants. This means that AI assistants will constitute a repository of all human knowledge and culture; they will constitute a shared infrastructure like the internet is today,” said Yann LeCun, one of the three godfathers of AI, in his talk at GenAI Winter School recently.

He urged that these platforms be open source, saying we cannot have a small number of AI assistants controlling the entire digital diet of every citizen across the world, a dig at OpenAI and a few other companies he did not name.

“This will be extremely dangerous for diversity of thought, for democracy, for just about everything,” he added.

There have been examples galore of things going wrong and biases taking centre stage when only a few companies have the power to manufacture the ‘cultural understanding’ of the entire world. They either tend to ignore different cultures or end up overcompensating by ticking the ‘diversity’ checkbox.

Case in point: Google’s extra-‘woke’ chatbot Gemini, which forcefully injected diversity into generated images with little regard for historical context. “It’s DEI gone mad,” exclaimed agitated users.


We Need Open-Source Base Models

“So what we need is not one AI assistant, we need base models like Llama 2, Mistral, and Gemma that can be fine-tuned by anybody so that, for example, it speaks Arabic and understands the culture of Morocco and knows everything about Marrakech,” said LeCun.

He emphasised that those platforms must be open because we need a high diversity of AI assistants, the same way we need a diverse press, so that there are no echo chambers and people have multiple sources of information.

Currently, we are seeing a multitude of AI models flourish. From farming and healthcare to education and entertainment, AI is conquering every field. And it doesn’t stop at chat-based solutions: with advancements like voice-first, empathetic voice interfaces such as Hume AI, our interactions with these assistants are only getting better.

Soon, as LeCun said, this will usher in a time when “we’re not going to be using search engines. Instead, when it comes to interacting with digital content, we’re basically going to be using our AI assistants. We’ll ask them questions, and they’ll provide the answers. They’ll assist us in our everyday lives”.

This further highlights the need to prevent monopoly in the production of these assistants. If it is through them that we are going to see and interact with the world, then there should be models as diverse as the world we live in. And, thanks to open-source base models, we are already seeing that happen.

Democratising AI Wholeheartedly

India is emerging as an open-source AI champion. From developing Devika, the open source alternative to Devin, and creating Ambari, a bilingual Kannada model built on top of Llama 2, to Telugu LLM Labs and Odia Llama, AI models in Indic languages are the biggest focus of the open source AI developers in India.

India’s vast diversity in languages, cultures, and populations means that a one-size-fits-all approach would not work here. Instead, open source allows for the creation of customised versions tailored to specific user groups, locations, regions, religions, etc., without the need to start from scratch for every individual use case.

Sarvam AI is building models such as OpenHathi on top of Llama. Another notable mention is the Indian agri-tech startup KissanAI, which unveiled Dhenu Vision LLMs for crop disease detection.

BharatGPT unveiled Hanooman, a new suite of Indic GenAI models. The makers said, “We don’t want it to be like ChatGPT, which suffers from the ‘I’m God and I know everything’ syndrome.” The primary focus areas are healthcare and education. Tech Mahindra’s foundational model, Project Indus, is an initiative to challenge OpenAI.

Recent developments in other parts of the world also paint a promising picture, such as South Korean AI company Kakao Brain’s projects KoGPT, a large-scale Korean language model, and Karlo, an image generation model. The company aims to contribute to the AI community with open-source projects.

Tokyo-based Sakana AI, reported to be Japan’s first AI startup, is another such example.

All these developments from different regions of the world, involving different languages and cultures, lend weight to LeCun’s suggestion that virtual assistants and AI platforms must be open source: “Otherwise our culture will be controlled by a few companies on the West Coast of the US or in China.”

“What’s important now is that a lot of governments are thinking about the benefits and dangers of AI. Some of them are thinking that AI is too dangerous to put in the hands of everyone and they’re trying to regulate it and basically make open source AI illegal; regulate it out of existence. I think that’s extremely dangerous for the future of humanity,” LeCun said.

He emphasised that “it’s too dangerous to have AI controlled by a small number of people”.

Still a Lot to Improve

While moving towards such a future, as envisioned by LeCun, we should remember that LLMs and AI assistants can also become harbingers of chaos and massively increase the amount of misinformation on the internet.

The GPT-4 paper reads, “Novel capabilities often emerge in more powerful models”, and highlights how the model can become “agentic”, meaning it can independently develop and pursue goals not originally programmed during its training.

The paper also says, “The model isn’t accurate in admitting its limitations,” a crucial point for every user to keep in mind.

Talking about the scope of misinformation, Air Canada’s chatbot goof-up serves as a warning sign. According to a passenger’s screenshot of a conversation with Air Canada’s chatbot, the passenger was told he could apply for a bereavement fare refund “within 90 days of the date your ticket was issued” by completing an online form.


However, when he applied for a refund, Air Canada said bereavement rates did not apply to completed travel and pointed to the bereavement section of the company’s website. Finally, the company was found liable for its chatbot’s misleading advice.

So, while envisioning a future in which, as Vinod Khosla posted, most consumer access to the internet will be through agents acting for consumers, doing tasks and fending off marketers and bots, and tens of billions of agents on the internet will be normal, we should also ensure that these agents are intelligent and reliable: built by diverse companies, based on diverse data, and catering to the needs of a diverse population.

The post ‘AI Platforms will Control What Everybody Sees,’ Says Meta’s AI Chief Yann LeCun appeared first on Analytics India Magazine.

Ola Krutrim to Launch AI Cloud Next Week

Ola chief Bhavish Aggarwal has announced that Krutrim is set to open its AI cloud, operating from India, to developers starting next week. Aggarwal said that the chat app, Krutrim, relies on this proprietary AI cloud infrastructure.

Our @Krutrim chat app runs on our own AI cloud based in India. Planning to open the AI cloud up for developers next week.
If you’re an AI developer, do let us know in comments if you want some free credits! Tell us about your company and what your AI use case is.

— Bhavish Aggarwal (@bhash) April 18, 2024

Aggarwal also invited AI developers to express their interest in the comments section, offering free credits as an incentive. Developers are encouraged to share details about their companies and the specific AI use cases they are working on.

“If you’re an AI developer, do let us know in the comments if you want some free credits! Tell us about your company and what your AI use case is,” wrote Aggarwal.

This comes after Aggarwal recently revealed that Krutrim has achieved a major breakthrough by operating on its independent cloud infrastructure, signifying a move away from reliance on external cloud providers like AWS or Azure. He emphasised ongoing efforts by the Krutrim team to enhance both the model itself and its infrastructure.

Ola Krutrim recently announced that it is set to launch its standalone mobile app. The app comes with significant advancements, including a notable decrease in time to first token (TTFT), the time it takes to begin generating a response, from 22 seconds at the initial launch to 0.3 seconds at present, with further improvements anticipated.

Aggarwal also hinted at forthcoming improvements in an upcoming detailed blog post.

Intel recently disclosed that Ola Krutrim is leveraging Intel Gaudi 2 clusters for pre-training and fine-tuning its foundational models across ten languages, with what Intel describes as industry-leading price/performance ratios.

Moreover, Krutrim is actively pre-training an expanded foundational model on Intel Gaudi 2 clusters, further elevating its AI capabilities.

A few days ago, Krutrim announced its partnership with Databricks to improve its foundational language model, particularly for Indian languages, aiming to enhance AI solutions in India.

“The Krutrim model was launched using our platform,” said Naveen Rao, VP of generative AI at Databricks, during an exclusive interview with AIM.

Ola Krutrim has been insistent on developing its own foundational model from scratch, despite rumours that it is built by fine-tuning existing models such as Llama 2, Mistral, Claude 3 or even the most recent, DBRX.

Launched in December last year, Krutrim has been lauded as “India’s first full-stack AI” solution, showcasing prowess in understanding and generating content across multiple Indian languages, including Marathi, Hindi, Bengali, Tamil, Kannada, Telugu, Odia, Gujarati, and Malayalam, with claims of superiority over GPT-4 in Indic languages.

The post Ola Krutrim to Launch AI Cloud Next Week appeared first on Analytics India Magazine.

Infosys Acquires German R&D Services Provider, in-tech


Infosys has announced its acquisition of in-tech, a top Engineering R&D services provider focused on the German automotive industry. This strategic move enhances Infosys’ capabilities in the engineering R&D space and strengthens its commitment to clients navigating digital engineering.

Based in Germany, in-tech specialises in digitisation in the automotive, rail transport, and smart industry sectors. It offers solutions in e-mobility, autonomous driving, and electric vehicles. in-tech’s client portfolio includes marquee German original equipment manufacturers (OEMs), and the company has a multidisciplinary team of 2,200 people across multiple countries.

Read: Infosys Feels Good About Its Work with Generative AI

Dinesh Rao, EVP and Co-Delivery Head at Infosys, said the collaboration bolsters Infosys’ capabilities in automotive innovation and software-defined vehicles. “Infosys continues to strengthen its Engineering R&D leadership with decades of experience in digital engineering. Together with in-tech, Infosys Topaz, an AI-first set of services, solutions and platforms, and the recently acquired InSemi’s semiconductor expertise, we have successfully created deeper capabilities for the next phase of automotive innovation in the arena of software-defined vehicles. We are excited to welcome in-tech and its leadership team into the Infosys family.”

Jasmeet Singh, EVP and Global Head of Manufacturing, emphasised the partnership’s potential to bring high-quality, innovative products to market quickly.

in-tech CEO Tobias Wagner noted the partnership’s potential for unprecedented growth and value for clients. “Over the past 22 years, we have created an impressive company history, characterised by organic growth, strategic acquisitions and high profitability. This strategic partnership with Infosys represents a decisive turning point for us: It opens up unprecedented growth opportunities, and also adds tremendous value to our offering for our clients. Together we now cover the entire end-to-end process, a step that is crucial to fully meet our customers’ needs. With access to more talent and expertise, we gain incredible strength and scale in our delivery capability, enabling us to successfully implement even more ambitious projects.”

The post Infosys Acquires German R&D Services Provider, in-tech appeared first on Analytics India Magazine.

Meta Releases Llama 3, Beats Claude 3 Sonnet and Gemini Pro 1.5


After teasing the world with a glimpse on Microsoft Azure, Meta has finally dropped Llama 3, the latest generation of its LLM that offers SOTA performance and efficiency.

The model is available to check out on GitHub.

The model is available in 8B and 70B parameter versions and has been trained on over 15 trillion tokens, a dataset seven times larger than Llama 2’s. Llama 3 provides enhanced reasoning and coding capabilities, and its training process was three times more efficient than its predecessor’s.

The models are now also available on Hugging Face.

Meta is also training a model with more than 400 billion parameters, which Mark Zuckerberg said in an Instagram Reel is going to be the top-performing model out there.

The 8B model outperforms Gemma 7B and Mistral 7B on all benchmarks, and the 70B model outperforms Gemini Pro 1.5 and Claude 3 Sonnet.

Llama 3 models will soon be accessible on AWS, Databricks, Google Cloud, Hugging Face, Kaggle, IBM WatsonX, Microsoft Azure, NVIDIA NIM, and Snowflake. Additionally, the models will be compatible with hardware platforms provided by AMD, AWS, Dell, Intel, NVIDIA, and Qualcomm.

Meta has also incorporated Llama 3 into its Meta AI assistant and expanded its availability across more countries. Meta AI is accessible through Facebook, Instagram, WhatsApp, Messenger, and the web, enabling users to accomplish tasks, learn, create, and engage with their interests.

Additionally, users will soon have the opportunity to experience multimodal Meta AI on Ray-Ban Meta smart glasses.

Meta AI is powered by Llama 3 and is now available in 13 new countries. It includes improved search capabilities and innovative web experiences. The latest updates in image generation on Meta AI allow users to create, animate, and share images with a simple text prompt.

The model uses a 128K-token vocabulary for more efficient language encoding, leading to significantly improved performance. To boost inference efficiency, grouped query attention (GQA) is implemented in both the 8B and 70B parameter models. The models were trained on sequences of 8,192 tokens, with masking to maintain document boundaries.
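For readers who want to try the instruction-tuned 8B model, below is a minimal sketch using Hugging Face transformers. The model id and chat-template usage follow Meta's Hugging Face release as we understand it; the checkpoint is gated behind Meta's licence, and the prompt is purely illustrative.

```python
# Minimal sketch: load the instruction-tuned Llama 3 8B checkpoint and generate a short reply.
# Assumes the gated repo licence has been accepted and transformers + accelerate are installed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # model id as listed on Hugging Face
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

print(len(tokenizer))  # vocabulary size, roughly the 128K tokens Meta describes

messages = [{"role": "user", "content": "Summarise grouped query attention in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```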

Llama 3’s training data consists of over 15 trillion tokens sourced from publicly available data, seven times larger than Llama 2’s dataset. The model was trained on two custom-built 24,000-GPU clusters.

It includes four times more code and over 5% high-quality non-English data spanning 30+ languages, though the model remains most proficient in English. Advanced data-filtering methods, including heuristic filters and semantic deduplication, ensure top-quality training data.

Meta also shared a sneak preview of the upcoming 400-billion-parameter Llama 3 model’s benchmark performance.

The post Meta Releases Llama 3, Beats Claude 3 Sonnet and Gemini Pro 1.5 appeared first on Analytics India Magazine.

Meta releases Llama 3, claims it’s among the best open models available


Meta has released the latest entry in its Llama series of open source generative AI models: Llama 3. Or, more accurately, the company has open sourced two models in its new Llama 3 family, with the rest to come at an unspecified future date.

Meta describes the new models — Llama 3 8B, which contains 8 billion parameters, and Llama 3 70B, which contains 70 billion parameters — as a “major leap” compared to the previous-gen Llama models, Llama 2 7B and Llama 2 70B, performance-wise. (Parameters essentially define the skill of an AI model on a problem, like analyzing and generating text; higher-parameter-count models are, generally speaking, more capable than lower-parameter-count models.) In fact, Meta says that, for their respective parameter counts, Llama 3 8B and Llama 3 70B — trained on two custom-built 24,000 GPU clusters — are among the best-performing generative AI models available today.

That’s quite a claim to make. So how is Meta supporting it? Well, the company points to the Llama 3 models’ scores on popular AI benchmarks like MMLU (which attempts to measure knowledge), ARC (which attempts to measure skill acquisition) and DROP (which tests a model’s reasoning over chunks of text). As we’ve written about before, the usefulness — and validity — of these benchmarks is up for debate. But for better or worse, they remain one of the few standardized ways by which AI players like Meta evaluate their models.

Llama 3 8B bests other open source models like Mistral’s Mistral 7B and Google’s Gemma 7B, both of which contain 7 billion parameters, on at least nine benchmarks: MMLU, ARC, DROP, GPQA (a set of biology-, physics- and chemistry-related questions), HumanEval (a code generation test), GSM-8K (math word problems), MATH (another mathematics benchmark), AGIEval (a problem-solving test set) and BIG-Bench Hard (a commonsense reasoning evaluation).

Now, Mistral 7B and Gemma 7B aren’t exactly on the bleeding edge (Mistral 7B was released last September), and in a few of the benchmarks Meta cites, Llama 3 8B scores only a few percentage points higher than either. But Meta also makes the claim that the larger-parameter-count Llama 3 model, Llama 3 70B, is competitive with flagship generative AI models including Gemini 1.5 Pro, the latest in Google’s Gemini series.


Llama 3 70B beats Gemini 1.5 Pro on MMLU, HumanEval and GSM-8K, and — while it doesn’t rival Anthropic’s most performant model, Claude 3 Opus — Llama 3 70B scores better than the weakest model in the Claude 3 series, Claude 3 Sonnet, on five benchmarks (MMLU, GPQA, HumanEval, GSM-8K and MATH).


For what it’s worth, Meta also developed its own test set covering use cases ranging from coding and creative writing to reasoning to summarization, and — surprise! — Llama 3 70B came out on top against Mistral’s Mistral Medium model, OpenAI’s GPT-3.5 and Claude Sonnet. Meta says that it gated its modeling teams from accessing the set to maintain objectivity, but obviously — given that Meta itself devised the test — the results have to be taken with a grain of salt.


More qualitatively, Meta says that users of the new Llama models should expect more “steerability,” a lower likelihood of refusing to answer questions, and higher accuracy on trivia questions, questions pertaining to history and STEM fields such as engineering and science, and general coding recommendations. That’s in part thanks to a much larger data set: a collection of 15 trillion tokens, or roughly 11 trillion words — seven times the size of the Llama 2 training set. (In the AI field, “tokens” refers to subdivided bits of raw data, like the pieces “fan,” “tas” and “tic” in the word “fantastic.”)
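To see how tokens relate to words in practice, here is a tiny, hedged illustration using GPT-2's openly available tokenizer (any modern BPE tokenizer behaves similarly); the exact splits and counts will differ for Llama 3's own, much larger vocabulary.

```python
# Minimal illustration of tokenization: words are split into subword pieces, not exact syllables.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")

print(tok.tokenize("fantastic"))  # subword pieces; the exact split depends on the tokenizer
ids = tok("The quick brown fox jumps over the lazy dog")["input_ids"]
print(len(ids))  # token count for a 9-word sentence (roughly 1 to 1.3 tokens per English word)
```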

Where did this data come from? Good question. Meta wouldn’t say, revealing only that it drew from “publicly available sources,” included four times more code than in the Llama 2 training data set, and that 5% of that set has non-English data (in ~30 languages) to improve performance on languages other than English. Meta also said it used synthetic data — i.e. AI-generated data — to create longer documents for the Llama 3 models to train on, a somewhat controversial approach due to the potential performance drawbacks.

“While the models we’re releasing today are only fine tuned for English outputs, the increased data diversity helps the models better recognize nuances and patterns, and perform strongly across a variety of tasks,” Meta writes in a blog post shared with TechCrunch.

Many generative AI vendors see training data as a competitive advantage and thus keep it and info pertaining to it close to the chest. But training data details are also a potential source of IP-related lawsuits, another disincentive to reveal much. Recent reporting revealed that Meta, in its quest to maintain pace with AI rivals, at one point used copyrighted ebooks for AI training despite the company’s own lawyers’ warnings; Meta and OpenAI are the subject of an ongoing lawsuit brought by authors including comedian Sarah Silverman over the vendors’ alleged unauthorized use of copyrighted data for training.

So what about toxicity and bias, two other common problems with generative AI models (including Llama 2)? Does Llama 3 improve in those areas? Yes, claims Meta.

Meta says that it developed new data-filtering pipelines to boost the quality of its model training data, and that it’s updated its pair of generative AI safety suites, Llama Guard and CybersecEval, to attempt to prevent the misuse of and unwanted text generations from Llama 3 models and others. The company’s also releasing a new tool, Code Shield, designed to detect code from generative AI models that might introduce security vulnerabilities.

Filtering isn’t foolproof, though — and tools like Llama Guard, CybersecEval and Code Shield only go so far. (See: Llama 2’s tendency to make up answers to questions and leak private health and financial information.) We’ll have to wait and see how the Llama 3 models perform in the wild, inclusive of testing from academics on alternative benchmarks.

Meta says that the Llama 3 models — which are available for download now, and powering Meta’s Meta AI assistant on Facebook, Instagram, WhatsApp, Messenger and the web — will soon be hosted in managed form across a wide range of cloud platforms including AWS, Databricks, Google Cloud, Hugging Face, Kaggle, IBM’s WatsonX, Microsoft Azure, Nvidia’s NIM and Snowflake. In the future, versions of the models optimized for hardware from AMD, AWS, Dell, Intel, Nvidia and Qualcomm will also be made available.

And more capable models are on the horizon.

Meta says that it’s currently training Llama 3 models over 400 billion parameters in size — models with the ability to “converse in multiple languages,” take more data in and understand images and other modalities as well as text, which would bring the Llama 3 series in line with open releases like Hugging Face’s Idefics2.


“Our goal in the near future is to make Llama 3 multilingual and multimodal, have longer context and continue to improve overall performance across core [large language model] capabilities such as reasoning and coding,” Meta writes in a blog post. “There’s a lot more to come.”

Indeed.

LoReFT: Representation Finetuning for Language Models


Parameter-efficient fine-tuning (PEFT) methods seek to adapt large language models by updating only a small number of weights. However, much of the existing interpretability work has demonstrated that representations encode semantically rich information, suggesting that editing these representations might be a better and more powerful alternative. Pre-trained large models are often fine-tuned for new domains or tasks, and during fine-tuning a single base model can be adapted to a wide variety of tasks even when only small amounts of in-domain data are available. However, fine-tuning an entire model is resource-intensive and expensive, especially for language models with a very large number of parameters.

PEFT methods tackle the high cost of fine-tuning the whole model by updating only a small fraction of the total weights, which reduces both training time and memory usage. More importantly, PEFT methods have demonstrated performance similar to full fine-tuning in several practical settings. Adapters, a common family of PEFT methods, learn an edit through an additional set of weights that operates alongside the frozen base model; recent adapters like LoRA reduce the number of trainable parameters in the learned weight updates by using low-rank approximations instead of full-weight matrices.

With previous work suggesting that editing representations might be a better alternative to weight-based PEFT methods, this article covers Representation Fine-tuning (ReFT) methods, which operate on a frozen model and learn task-specific interventions on hidden representations. We explore the mechanism, methodology, and architecture of the ReFT framework, along with its comparison against state-of-the-art frameworks. So let’s get started.

ReFT: Representation Fine-tuning for Language Models

To adapt pre-trained language models to new domains and tasks, current frameworks frequently fine-tune them: with fine-tuning, a single base model can be adapted to a variety of tasks even with only a small amount of in-domain data. Although fine-tuning does boost overall performance, it is expensive, especially if the language model has a very large number of parameters. To reduce these costs, PEFT frameworks update only a small fraction of the total weights, which cuts both training time and memory usage while achieving performance similar to full fine-tuning in practical scenarios. Adapters, a common family of PEFT methods, work by learning an edit through an additional set of weights that operates in unison with the frozen base model. Recent adapter frameworks like LoRA and QLoRA have demonstrated that it is possible to train full-precision adapters on top of reduced-precision models without hurting performance. Adapters are usually more efficient and effective than other methods that introduce new model components.

A notable feature of current state-of-the-art PEFT frameworks is that they modify weights rather than representations. However, interpretability research has demonstrated that representations encode rich semantic information, suggesting that editing representations might be a better and more powerful approach than weight updates. This assumption forms the foundation of the ReFT framework, which trains interventions instead of adapting model weights, manipulating a small fraction of all representations to steer model behavior on downstream tasks at inference time. ReFT methods are drop-in replacements for weight-based PEFT frameworks. The approach draws inspiration from recent work in large-model interpretability that intervenes on representations to find faithful causal mechanisms and to steer model behavior during inference, and it can therefore be seen as a generalization of representation-editing models. Building on this, LoReFT (Low-Rank Subspace ReFT) is a strong and effective instance of ReFT: a parameterization that intervenes on hidden representations in the linear space spanned by a low-rank projection matrix, building directly on the DAS (Distributed Alignment Search) framework.

In contrast to full fine-tuning, PEFT frameworks train only a small fraction of the model’s parameters while still adapting the model to downstream tasks. PEFT methods can be classified into three main categories:

  • Adapter-based methods: These train additional modules, such as fully connected layers, on top of the frozen pre-trained model. Series adapters insert components between the multilayer perceptron (MLP) and attention layers, whereas parallel adapters add modules alongside existing components. Since these new components cannot easily be folded into existing model weights, adapters impose an additional burden during inference.
  • LoRA: LoRA and its recent variants approximate the additive weight updates during training using low-rank matrices. Because these updates can be merged back into the model, they add no overhead during inference, which is why LoRA-style methods are considered the strongest current PEFT frameworks (a minimal sketch follows this list).
  • Prompt-based methods: These add randomly initialized soft tokens to the input and train their embeddings while keeping the language model’s weights frozen. Their performance is often unsatisfactory compared with other PEFT approaches, and they carry a significant inference overhead.
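As an illustration of the adapter/LoRA idea above, the sketch below uses the Hugging Face peft library to wrap a small base model with low-rank adapters. The base model, rank, and target modules are arbitrary choices made for the example, not a prescription from the ReFT paper.

```python
# Minimal LoRA sketch: freeze the base model and train only low-rank adapter matrices.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("gpt2")  # small, ungated model used purely for illustration

lora_cfg = LoraConfig(
    r=8,                       # rank of the low-rank update matrices
    lora_alpha=16,             # scaling factor applied to the update
    target_modules=["c_attn"], # GPT-2's fused attention projection; differs per architecture
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()  # typically well under 1% of the weights are trainable
# After training, the low-rank updates can be merged back into the base weights,
# so no extra modules remain at inference time (unlike series or parallel adapters).
```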

Instead of updating weights, the ReFT framework learns interventions that modify a small fraction of the representations. Recent work on representation engineering and activation steering has demonstrated that adding fixed steering vectors to the residual stream can provide a degree of control over pre-trained model generations without resource-intensive fine-tuning. Other frameworks have shown that editing representations with a learned scaling and translation operation can match, but not surpass, the performance of LoRA adapters on a wide array of tasks with fewer learned parameters. The success of these approaches across tasks demonstrates that representations in pre-trained language models carry rich semantics, although their performance remains sub-optimal, which is why weight-based PEFTs have so far remained the state-of-the-art approach with no additional inference burden.

ReFT: Methodology and Architecture

The ReFT framework assumes a transformer-based large model as its target: a model that produces contextualized representations of a sequence of tokens. For a given sequence of n input tokens, the framework first embeds the tokens into a list of representations, after which the m layers successively compute lists of hidden representations, each a function of the previous list. Each hidden representation is a vector, and the language model uses the final hidden representations to produce its predictions. The framework considers both masked and autoregressive language models. According to the linear representation hypothesis, concepts in neural networks are encoded within linear subspaces of representations, and recent work has found this claim to hold in models trained on natural language and other input distributions.

In interpretability studies, the causal abstraction framework uses interchange interventions to establish the causal role of neural network components in implementing particular behaviors. The logic behind an interchange intervention is that if one fixes a representation to what it would have been for a counterfactual input, and this intervention affects the model’s output in a way consistent with the claims made about the component responsible for producing that representation, then the component plays a causal role in the behavior. Among the available methods, distributed interchange intervention is the natural way to test whether a concept is encoded in a linear subspace of a representation, as the linear representation hypothesis claims. The DAS method has previously been used to find linear representations of entity attributes, sentiment, linguistic features, and mathematical reasoning in language models. However, several experiments have indicated that DAS is highly expressive and can find causally efficacious subspaces even when the transformer language model has been randomly initialized and has therefore not yet learned any task-specific representations, prompting debate over whether DAS is reliable enough for interpretability work.

The expressivity of DAS suggests it could be an ideal tool for controlling the behavior of a language model, alongside work on controllable generation and responsible editing. Therefore, to adapt language models for downstream tasks, the ReFT framework uses the distributed interchange intervention operation to construct a new parameter-efficient method. A ReFT method is a set of interventions, and the framework enforces that any two interventions operating on the same layer must target disjoint positions, with the parameters of all intervention functions remaining independent. ReFT is thus a generic framework that encompasses interventions on hidden representations during the model’s forward pass.
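To make the intervention concrete, below is a small PyTorch sketch of the low-rank edit that LoReFT applies to a hidden representation h, following the description above: the component of h lying in a learned low-rank subspace R is replaced with a learned target Wh + b, i.e. phi(h) = h + R^T(Wh + b - Rh). This is an illustrative re-implementation based on the paper's description, not the authors' official pyreft code, and the dimensions are arbitrary.

```python
# Illustrative LoReFT-style intervention: edit only the projection of h onto a low-rank subspace.
import torch
import torch.nn as nn

class LoReFTIntervention(nn.Module):
    def __init__(self, hidden_dim: int, rank: int):
        super().__init__()
        # R: rows span the low-rank subspace being edited (the paper keeps these orthonormal)
        self.R = nn.Parameter(torch.empty(rank, hidden_dim))
        nn.init.orthogonal_(self.R)
        # W, b: learned linear map producing the target values inside the subspace
        self.proj = nn.Linear(hidden_dim, rank)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # phi(h) = h + R^T (W h + b - R h): outside the subspace h is untouched,
        # inside the subspace its coordinates are replaced by the learned target.
        return h + (self.proj(h) - h @ self.R.T) @ self.R

# Intervene on a batch of hidden states from a frozen transformer layer (base weights stay frozen).
h = torch.randn(2, 5, 768)                  # (batch, positions, hidden_dim)
phi = LoReFTIntervention(hidden_dim=768, rank=4)
print(phi(h).shape)                         # torch.Size([2, 5, 768])
```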

ReFT: Experiments and Results

To evaluate performance against existing PEFT frameworks, experiments are run across four diverse natural language processing benchmarks covering over 20 datasets, with the primary goal of giving a rich picture of how LoReFT performs in different scenarios. In practice, developers also need to decide how many interventions to learn and which input positions and layers to apply them to. To do this, the ReFT framework tunes four hyperparameters:

  1. The number of prefix positions to intervene on.
  2. The number of suffix positions to intervene on.
  3. What set of layers to intervene on.
  4. Whether or not to tie intervention parameters across different positions in the same layer.

By doing this, the ReFT framework simplifies the hyperparameter search space and ensures a fixed additional inference cost that does not scale with prompt length.
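As a concrete (and purely hypothetical) illustration of these four knobs, a single LoReFT run might be configured along the following lines. The field names are invented for readability and do not correspond to the authors' pyreft API.

```python
# Hypothetical LoReFT run configuration mirroring the four hyperparameters listed above.
loreft_run = {
    "num_prefix_positions": 3,        # intervene on the first 3 token positions of the prompt
    "num_suffix_positions": 3,        # and on the last 3 token positions
    "layers": [4, 8, 12, 16],         # transformer layers that receive interventions
    "tie_params_within_layer": True,  # share intervention parameters across positions in a layer
}
```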

Comparing the accuracy of LLaMA-7B and LLaMA-13B against existing PEFT methods across eight commonsense reasoning datasets, the LoReFT model outperforms existing PEFT approaches by a decent margin despite having far fewer trainable parameters; the reported figures are the average of three runs with distinct random seeds. The param(%) metric is calculated by dividing the number of trainable parameters by the total number of parameters of the base model.

On four arithmetic reasoning datasets, again comparing LLaMA-7B and LLaMA-13B against existing PEFT methods (averaged over three runs with distinct random seeds), the LoReFT framework remains competitive despite having a much smaller param(%).

On the GLUE benchmark, comparing RoBERTa-base and RoBERTa-large against existing PEFT methods (averaged over five runs with distinct random seeds), the LoReFT framework again outperforms existing PEFT frameworks despite its much smaller param(%).

Final Thoughts

In this article, we discussed LoReFT, a powerful alternative to existing PEFT frameworks that achieves strong performance across benchmarks from four different domains while being up to 50 times more parameter-efficient than previous state-of-the-art PEFT methods. Fine-tuning an entire pre-trained model for a new domain or task is resource-intensive and expensive, especially for language models with a very large number of parameters; PEFT methods tackle this by updating only a small fraction of the total weights, reducing both training time and memory usage. Notably, LoReFT establishes new state-of-the-art performance on commonsense reasoning, instruction-following, and natural language understanding against the strongest PEFTs.

Save $165 on This Speech to Text Toolkit

TL;DR: Get a special price drop on Jott Pro AI Text & Speech Toolkit — now just $34.97 through April 21.

Artificial intelligence can help improve your productivity and save you time in a lot of ways. For instance, when you’re working with audio recordings or PDFs and you need to transcribe data or take notes, the Jott Pro AI Text & Speech Toolkit will come in handy. It’s currently on sale for a special limited-time price drop of just $34.97 for a lifetime license.

Features

Jott uses neural AI to extract text from images and PDF files or convert audio into written text quickly. Regardless of the format, Jott helps you save time on data entry, transcription and more tedious tasks with superior image-to-text, text-to-speech and speech-to-text conversion. It features built-in translation features that allow you to convert text into dozens of languages on the fly, including creating audio with voices by local speakers. That way, you can connect with more customers and potential clients all over the world.

Jott offers an intuitive user experience, making it easy to extract and edit text from any image format or work effortlessly with a range of file types — all while reducing the risk of human error in transcription. Plus, it’s AI so it’s always getting better. With a lifetime license, you’ll have access to all updates and improvements so you’re always working with the best tools possible.

Considering the time-saving and resource-saving potential this tool offers, the license could make a worthy investment in your career or business. Save time and scale your productivity with this powerful AI tool.

From April 15 through 11:59 pm PT on April 21, you can get a lifetime license to the Jott Pro AI Text & Speech Toolkit for just $34.97, more than 80% off the regular $199 price.

Get Jott Pro

Prices and availability subject to change.

Nothing Becomes the First Smartphone Company to Integrate OpenAI’s ChatGPT

Nothing has announced its plans to integrate ChatGPT into its smartphones and earbuds. The integration involves Nothing earbuds and Nothing OS being equipped with ChatGPT capabilities, allowing users to interact with the AI assistant effortlessly.

Guided by our mission to advance consumer tech products’ transition to AI, we've integrated Nothing earbuds and Nothing OS with ChatGPT to offer users instant access to knowledge directly from the devices they use most, earbuds and smartphones. pic.twitter.com/aUqrqxTChL

— Nothing (@nothing) April 18, 2024

This integration is exclusive to Nothing earbuds when connected to Nothing phones, with support for Phone (2) available now and planned updates for Phone (1) and Phone (2a) in the near future.

Nothing has also launched two new earbuds, Ear (a), and Ear.

Ear (a) comes in a tacky yellow color, giving it retro vibes. Interestingly, it is Nothing’s first audio product in a color outside of black and white.

Now let’s introduce Ear (a).
The first of its kind. Ear (a) yellow is Nothing’s first audio product in a colour outside of black and white.
"A fresh, colourful new addition to our audio family. With a head-turning design and our smartest noise cancelling powers – it's the… pic.twitter.com/nM5euTQGAv

— Nothing (@nothing) April 18, 2024

Users can now utilise a “pinch-to-speak” feature to access ChatGPT directly from their Ear (a) model or other compatible devices. This system-level integration via Nothing OS includes features like screenshot sharing and bespoke Nothing widgets, ensuring a smooth and intuitive user experience.

The new integration enables users to ask questions, listen to responses, and learn from ChatGPT on-the-go, marking a significant step in AI integration within consumer tech products.

The Ear (a) model boasts signature Nothing design, 45 dB active noise cancellation, a dynamic 11 mm driver for superior sound quality, LDAC support for high-quality audio streaming, and a Bass Enhance algorithm.

With up to 42.5 hours of listening time, users can enjoy uninterrupted audio experiences. The advanced noise cancellation technology ensures a seamless transition from quiet environments to noisy surroundings, enhancing the overall listening experience.

Moreover, Ear (a) features smart active noise cancellation that automatically adjusts based on noise leakage between the earbud and the ear canal, ensuring optimal noise cancellation performance every time they are worn.

The integration with ChatGPT includes voice control capabilities, allowing users to access ChatGPT using voice commands, with support coming soon to other Nothing earbud models.

Additionally, the integration introduces bespoke Nothing widgets, providing users with faster access to ChatGPT and enabling text, voice, or image searches directly from the home screen of compatible devices.

The Ear will be available for purchase at Rs 11,999, while the more affordable Ear (a) is priced at Rs 7,999. Both products will debut in Indian stores by late April, initially exclusively through Flipkart with launch discounts.

The post Nothing Becomes the First Smartphone Company to Integrate OpenAI’s ChatGPT appeared first on Analytics India Magazine.

Meta Llama 3 Now Available on Microsoft Azure

Meta has released a family of its latest LLMs, Llama 3. These models range from 8 billion to 70 billion parameters, include pre-trained as well as fine-tuned versions optimised specifically for dialogue applications, and are available on Microsoft Azure.

BREAKING!!
LLaMA-3 On Azure:https://t.co/r2epz2aBxY pic.twitter.com/R8vp7E8yXa

— Yam Peleg (@Yampeleg) April 18, 2024

The fine-tuned Llama 3 models, designed for dialogue use cases, have demonstrated impressive performance across various benchmarks. In human evaluations assessing their helpfulness and safety, these models have proven to be on par with popular closed-source counterparts, according to Microsoft’s blog.

Meta is offering the Llama-3-8B inference APIs alongside hosted fine-tuning capabilities through Azure AI Studio. Azure AI Studio is a robust platform designed for developing Generative AI applications, offering features like a playground for model exploration, Prompt Flow for prompt engineering, and RAG (Retrieval Augmented Generation) for seamless integration of data into applications.

Under this offering, users can leverage the Llama-3-8B inference APIs on a pay-as-you-go basis, where billing is based on the input and output tokens utilised during model scoring.

Additionally, for models supporting fine-tuning, fine-tuning jobs are billed hourly, with inference for fine-tuned models incurring charges based on token usage along with an hourly hosting fee.

Integration with Azure AI Studio simplifies the subscription process for accessing and utilising Meta’s Llama 3 models, offering a comprehensive environment for AI development and deployment.

Earlier this year, Meta chief Mark Zuckerberg announced that Meta trained Llama 3 using a massive compute infrastructure. The company plans to procure 350k H100s by the end of this year, for an overall total of almost 600k H100-equivalents of compute if other resources are included.

The post Meta Llama 3 Now Available on Microsoft Azure appeared first on Analytics India Magazine.