Open Source Community’s Swift Response Thwarts Massive Linux Backdoor Attack

It was an ordinary Wednesday for Andres Freund, a Microsoft software engineer, as he ran routine tests on his Linux machine. But as he investigated a curious slowdown traced to the “xz” compression library, a little-known piece of Linux software, he stumbled upon something far more nefarious: a sophisticated backdoor that could have given attackers unfettered access to millions of computers worldwide.

“The upstream xz repository and the xz tarballs have been back doored,” Freund posted on the OSS-Security mailing list on March 29th, 2024. The news ricocheted through the Linux world as the implications came into focus.

It soon became clear that the attack was a deliberate and sophisticated subversion of a critical piece of the open-source ecosystem. The malicious code, cunningly disguised, granted attackers a foothold on countless Linux systems worldwide.

“This could have been the most widespread and effective backdoor ever planted in any software product,” warned Alex Stamos, chief trust officer at SentinelOne, about the severity of the breach.

As the Linux community scrambled to contain the damage, a clearer picture of the attack emerged. Under the alias “Jia Tan,” the attacker had spent years patiently worming their way into the xz project, first with innocuous contributions, then with subtle malicious changes.

Lasse Collin, who had maintained the project for over a decade, was pressured into handing it over to “Jia Tan.” When Collin could not keep up with the workload, a handful of phantom online accounts badgered him to relinquish control of the project. He has since taken back the project and is cleaning up the code.

The Power of the Open Source Community

Once Freund alerted the community, security experts and system administrators worked to understand the scope of the problem and develop a fix. One of the first steps was to identify which versions of xz were compromised.

The community quickly determined that versions 5.6.0 and 5.6.1 contained the malicious code. Linux distributions that had included these versions in their testing or unstable branches swiftly replaced them with safe versions.

When the backdoor was discovered, Linux distributions immediately acted to protect their users. Debian, for example, replaced the compromised version of xz with an earlier, safe version. They kept the new version number to avoid breaking any dependencies but added a note to clarify that it was actually the older, secure version.

Meanwhile, security experts dug into the malicious code to determine exactly how it worked. They found that the attackers had used a clever trick to hijack certain functions in the xz library, allowing them to run their own code and gain control over affected systems.

Andres Freund, the Microsoft engineer who first discovered the backdoor, described it as “a very mysterious attack.” He noted that “the attackers clearly spent a lot of effort trying to hide what they were doing.”

Despite the attackers’ attempts to cover their tracks, the open-source community was able to dissect the malware and share their findings. By working together and leveraging their collective expertise, the community was able to develop and distribute fixes for the vulnerability quickly.

Within hours, a group of volunteer developers, security experts, and system administrators had mobilised to analyse the malware, patch the vulnerability, and share vital information. One person wrote on HackerNews, “We had to race last night to fix the problem after an inadvertent break of the embargo.”

The Godot Foundation of the open-source Godot Engine posted on X, “As an open-source project ourselves, we try our best to guard the product and our contributors against malicious actors. We consider ourselves really fortunate to have co-maintainers in many areas, even whole teams, to be able to scrutinise PRs closely.”

“Open source (community) caught it and reacted quickly. Like, good job random people on the Internet, open source worked,” Darren Shepherd, Chief Architect & Co-Founder of Acorn Labs, posted on X. In the face of an unprecedented attack on one of the most widely used compression libraries, the open source community’s swift response and collaborative problem-solving showed what open source is all about: working together to fix problems.


5 Free Resources to Master Your Data Science Job Search


Being on the job hunt is tough, there’s no two ways about it. Sending out resumes, rewriting cover letters, the interminable wait to hear back after you apply (or just getting ghosted) – it’s not fun.

The good news is it’s a lot easier than it used to be. You don’t have to physically mail or drop off letters anymore; you can submit a lot of applications with a few clicks. There are plenty of specialized job boards, interview prep tools, and other resources to make it more likely you’ll find, apply for, and actually land your dream data science job.

Let’s talk about the best free resources at your fingertips to get that data science job.

Kaggle – Real World Skills

It doesn’t matter how shiny your resume is if you don’t have the skills to back up your credentials. One of the best ways to get data science skills is by doing your own projects.

It’s sometimes tricky to come up with ideas for data science projects, which is where Kaggle comes in. Kaggle hosts a huge catalog of datasets and machine learning competitions, and includes solutions and different approaches for tackling various projects.

Source: https://www.kaggle.com/datasets

It's an excellent resource because it allows you to apply your data science skills in practical scenarios, receive feedback, and learn from the solutions of others. Not only that, but if you actually win a Kaggle competition, that can serve as a bit of a flex to any employers. Most data scientists know of Kaggle and will be suitably impressed that you can tackle those problems.

In short, the most valuable asset Kaggle provides is real-world data and real-world problems. It offers valuable exposure to industry-level problems – and the opportunity to be noticed by top companies.

StrataScratch – Interview Prep

I may be slightly biased here as the founder of StrataScratch, but I founded the company because I noticed a real problem: it’s hard to prep for data science interviews. So I started collecting interview questions from as many different companies as I could and categorizing them by difficulty, type of question, and company. The result is a database of over a thousand real-life interview questions – both coding and non-coding – plus the solutions if you’re really stumped.

In my experience interviewing for data science jobs, it’s not just about having the skills, it’s also about being able to stay calm and think through whatever they throw your way. As you might imagine, it’s a lot easier to do that if you’ve seen the interview question – or some variation of it – before.

It’s a good idea to practice interview questions at every stage in your data science job hunt, too, not just when you have an interview lined up. Practicing IRL interview questions gives you a sense of what problems data science companies are interested in solving, as well as the skills you should focus on learning or honing.

edX and Coursera (In Audit Mode) – Gain Knowledge

Fun fact: while edX and Coursera have very expensive data science courses, you can get all the same knowledge absolutely free simply by auditing the courses. Now, this means you don’t get a certificate (which can definitely be valuable to have), but you do get world-class lessons, tutorials, and guides for free.

Source: https://www.edx.org/verified-certificate

Just find the course with the information you’re interested in, and sign up under audit mode. You can use this to shore up weak points on your resume, learn skills to do projects for your portfolio, or just explore a topic you’re passionate about.

KDnuggets and Towards Data Science – Read Data Science Blogs

You’re reading this on KDnuggets, so you should already know it’s a useful resource for getting a data science job. KDnuggets doesn’t just offer blog posts, though. There are datasets (again, useful for projects), live and virtual events (great for networking), programming cheat sheets, and curated tool recommendations.

I’m throwing in Towards Data Science, too, since it’s another blog packed with tutorials, guides, how-tos, personal stories and experiences, and more. While some stories are paywalled, many are left free. You can easily browse the TDS homepage and look for free stories that don’t have a little star next to the author's name.

In short, one of the best ways to get a data science job is to learn from other data scientists. Many of them are kind enough to post content online for free for you to read and enjoy.

Wellfound – Job Board

Not sure where to start your data science job search? Classic contenders like LinkedIn and Indeed definitely win in terms of volume, but I love Wellfound for finding data science jobs because of its curated approach.

Wellfound has a few advantages over other job boards. One, the filtering options are powerful. You can easily find jobs based on investment round, salary, equity, markets, company size, and more.

Two, it’s primarily startups. If you’ve tried and failed to get a FAANG job, it might be time to turn your sights to a different scene. Startups are hungry for data science talent, and if you can broaden your horizons to consider a slightly less conventional employer, you might have better luck.

Three, it’s just a bit newer and fresher, so I find it a better job-hunting experience. Features include showing who invested in the company, how recently the recruiter reviewed applicants, and Glassdoor stats on leadership and work/life balance ratings.

Source: https://wellfound.com/

Final Thoughts

Job hunting is never fun, and this year feels worse: more companies ghosting candidates, making you sit through multiple rounds of interviews only to say the position was filled internally, or just straight up posting non-existent jobs to look better in front of potential investors. Perhaps you’ve even run into a scam job posting.

Hopefully, this list of free resources makes your life a little easier. With these five free tools, you’ll be better equipped to find and get your ideal data science job.

Nate Rosidi is a data scientist who works in product strategy. He’s also an adjunct professor teaching analytics, and is the founder of StrataScratch, a platform helping data scientists prepare for their interviews with real interview questions from top companies. Nate writes on the latest trends in the career market, gives interview advice, shares data science projects, and covers everything SQL.


Can AI Learn from Newborn Babies?


In an intriguing study, data scientists at New York University have shown that AI models can learn language from what an infant sees and hears. While humans have long been thought to rely on specialised traits for language acquisition, the study suggests that AI, too, can learn effectively from minimal data.

“We ran exactly this experiment. We trained a neural net (which we call CVCL, related to CLIP by its use of a contrastive objective) on headcam video, which captured slices of what a child saw and heard from 6 to 25 months. It’s an unprecedented look at one child’s experience, but still, the data is limited: just 61 hours (transcribed) or about 1% of their waking hours,” said Wai Keen Vong, one of the researchers behind the ‘Grounded language acquisition through the eyes and ears of a single child’ study.
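The “contrastive objective” that links CVCL to CLIP works by embedding each video frame and the co-occurring utterance into a shared space, pulling matching pairs together and pushing mismatched pairs apart. The snippet below is a minimal, hypothetical sketch of such an InfoNCE-style loss in PyTorch; the encoders, dimensions, and batch are placeholders, not the study’s actual model.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(frame_emb: torch.Tensor, text_emb: torch.Tensor, temperature: float = 0.07):
    """InfoNCE-style loss over a batch of (frame, utterance) embedding pairs.

    frame_emb, text_emb: (batch, dim) outputs of a vision and a language encoder.
    Matching rows are treated as positives; all other rows act as negatives.
    """
    frame_emb = F.normalize(frame_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = frame_emb @ text_emb.t() / temperature   # (batch, batch) similarity matrix
    targets = torch.arange(frame_emb.size(0))         # positives sit on the diagonal
    # Symmetric loss: frame -> utterance and utterance -> frame
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

# Toy usage with random embeddings standing in for encoder outputs
loss = contrastive_loss(torch.randn(8, 512), torch.randn(8, 512))
```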

This is similar to Meta AI chief Yann LeCun’s vision of autonomous machine intelligence. The Turing Award winner has long argued that teaching AI systems to observe the world the way children do might be the way forward to more intelligent systems, and has predicted that his ‘world model’ approach, which mirrors how the human brain works, could be the ideal path for AI systems to become intelligent.

Learning from a child’s experiences

Despite working with limited data, the research study has shown that the AI model can effectively learn word-referent associations with tens to hundreds of examples. It can generalise seamlessly to new visual datasets and demonstrates the ability to achieve multi-modal alignment.

“Our findings address a classic long-standing debate in philosophy and cognitive science: What ingredients do children need to learn words? Given their everyday experience, do they (or any learner) need language-specific inductive biases or innate knowledge to get going? Or can joint representation and associative learning suffice? Our work shows that we can get more with just learning than commonly thought,” Vong added.

Despite its advancements, the current model, Child’s View for Contrastive Learning (CVCL), falls short compared to a typical 2-year-old’s vocabulary and word-learning abilities.

Several factors contribute to this gap, including CVCL’s lack of sensory experiences such as taste, touch, and smell, its passive learning approach compared to a child’s active engagement, and its absence of social cognition.

Unlike children, CVCL doesn’t perceive desires, goals, or social cues, nor does it grasp that language serves as a means of fulfilling wants.

Child’s Play, A Way Forward to More Intelligent Systems

Observing children has proven invaluable in advancing AI’s understanding of the physical world. Researchers at Google DeepMind noted that developmental psychologists had identified key physical concepts by studying infants’ innate knowledge of physics, and had devised methods like the violation-of-expectation paradigm to measure them.

Inspired by developmental psychology, the team created PLATO (Physics Learning through Auto-encoding and Tracking Objects). This model represents the world as evolving objects and makes predictions based on their interactions.

When trained on simple physical interactions, PLATO surpassed other models lacking object-based representations, indicating the importance of this framework in intuitive physics learning.

PLATO demonstrated the ability to learn with as little as 28 hours of visual experience and could generalise to new stimuli without re-training. This work highlights the potential of child development research to inform the development of AI systems capable of understanding and navigating the complexities of the physical world.

AI Can Help a Child, Too!

In another groundbreaking innovation, researchers at the University of California, Los Angeles developed a new AI application, Chatterbaby, to interpret babies’ cries and provide insights into what they are trying to communicate.

Dr. Ariana Anderson and her team collected around 2,000 audio samples of infant cries and used AI algorithms to distinguish between cries induced by hunger, pain, and irritation, predicting why babies are crying with an accuracy of 90%.


POKELLMON: A Human-Parity Agent for Pokemon Battles with LLMs


Large language models and generative AI have demonstrated unprecedented success on a wide array of natural language processing tasks. After conquering the NLP field, the next challenge for GenAI and LLM researchers is to explore how large language models can act autonomously in the real world, extending generation from text to action – a significant step in the pursuit of artificial general intelligence. Online games are considered a suitable test bed for developing LLM-embodied agents that interact with a visual environment the way a human would.

For example, in the popular simulation game Minecraft, decision-making agents can assist players in exploring the world and in developing skills for crafting tools and solving tasks. LLM agents interacting with a visual environment can also be seen in The Sims, where agents have demonstrated remarkable success in social interactions and exhibited human-like behaviour. Compared to these games, however, tactical battle games may be a better benchmark for a large language model’s ability to play virtual games: the win rate can be measured directly, and consistent opponents, both human players and AI, are always available.

Building on this, POKELLMON aims to be the world’s first embodied agent to achieve human-parity performance in tactical games, as demonstrated in Pokemon battles. At its core, the POKELLMON framework incorporates three main strategies.

  1. In-context reinforcement learning, which consumes text-based feedback from battles instantly to refine the policy iteratively.
  2. Knowledge-augmented generation, which retrieves external knowledge to counter hallucinations, enabling the agent to act timely and properly.
  3. Consistent action generation, which mitigates panic switching when the agent encounters a powerful opponent and wants to avoid facing it.

This article covers the POKELLMON framework in depth, exploring the mechanism, methodology, and architecture of the framework along with its comparison against state-of-the-art frameworks. We will also look at how the POKELLMON framework demonstrates remarkable human-like battle strategies and just-in-time decision-making, achieving a respectable win rate of almost 50%. So let’s get started.

POKELLMON: A Human Parity Agent with LLM for Pokemon Battles

The growth in the capabilities and efficiency of large language models and generative AI frameworks over the past few years has been nothing short of marvelous, especially on NLP tasks. Recently, developers and AI researchers have been working on ways to make generative AI and LLMs more prominent in real-world scenarios, with the ability to act autonomously in the physical world. To achieve this autonomy in physical and real-world situations, researchers and developers consider games a suitable test bed for developing LLM-embodied agents that can interact with a virtual environment in a manner that resembles human behavior.

Previously, developers have tried to build LLM-embodied agents in virtual simulation games like Minecraft and The Sims, although tactical games like Pokemon may be a better choice for developing these agents. Pokemon battles let developers evaluate a trainer’s ability to battle in a well-known game and offer several advantages over other tactical games. Since the action and state spaces are discrete, they can be translated into text without any loss. The following figure illustrates a typical Pokemon battle: at each turn, the player is asked to generate an action given the current state of the Pokemon on each side, either picking one of the active Pokemon’s four moves or switching to one of the five other Pokemon on the team.

Furthermore, the game alleviates inference-time and inference-cost pressure on LLMs, since the turn-based format eliminates the need for intensive real-time gameplay. As a result, performance depends primarily on the reasoning ability of the large language model. Finally, although Pokemon battles appear simple, they are in reality complex and highly strategic. An experienced player does not pick a Pokemon at random but weighs many factors, including the type, stats, abilities, species, items, and moves of the Pokemon both on and off the battlefield. And in a random battle, the Pokemon are drawn randomly from a pool of over a thousand characters, each with its own distinct characteristics, which demands both reasoning ability and Pokemon knowledge from the player.

POKELLMON: Methodology and Architecture

The overall framework and architecture of the POKELLMON framework is illustrated in the following image.

During each turn, the POKELLMON framework uses previous actions and their corresponding text-based feedback to iteratively refine the policy, while augmenting the current state information with external knowledge such as ability/move effects or advantage/weakness relationships. Given this input, the framework generates multiple actions independently and then selects the most consistent one as the final output.
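A rough outline of that per-turn loop might look like the sketch below. Everything here is assumed for illustration – `env`, `knowledge_base`, and `llm_propose_action` are hypothetical stand-ins, not the authors’ code – but it shows how translated state, accumulated feedback, and retrieved knowledge are folded into a single prompt each turn.

```python
def play_battle(env, knowledge_base, llm_propose_action, history_window=4):
    """Skeleton of one POKELLMON-style battle loop.

    Each turn: translate the state, attach recent text feedback and retrieved
    knowledge, ask the LLM for an action, then record what actually happened
    so the next prompt can condition on it (the in-context 'reward').
    """
    feedback_log = []
    state = env.reset()
    while not env.battle_over():
        prompt = (
            f"Battle state:\n{state}\n\n"
            "Feedback from previous turns:\n" + "\n".join(feedback_log[-history_window:]) + "\n\n"
            f"Relevant knowledge:\n{knowledge_base.lookup(state)}\n\n"
            "Choose one action: a move of the active Pokemon or a switch."
        )
        # The full framework samples several candidates here and keeps the most
        # consistent one (see Consistent Action Generation below).
        action = llm_propose_action(prompt)
        state, result_text = env.step(action)
        feedback_log.append(result_text)
```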

In-Context Reinforcement Learning

Human players often make decisions not only on the basis of the current state, but also by reflecting on the feedback from previous actions as well as the experience of other players. Such feedback is what helps a player learn from their mistakes and keeps them from making the same mistake over and over again. Without proper feedback, a POKELLMON agent might stick with the same erroneous action, as demonstrated in the following figure.

As can be observed, the in-game agent uses a water-based move against a Pokemon with the “Dry Skin” ability, which nullifies the damage from water-based attacks. The game tries to alert the player by flashing the message “Immune” on the screen, which might prompt a human player to reconsider and change their action even without knowing about “Dry Skin.” That message, however, is not included in the agent’s state description, so the agent makes the same mistake again.

To ensure that the POKELLMON agent learns from its prior mistakes, the framework implements an in-context reinforcement learning approach. Reinforcement learning is a popular machine-learning technique for refining a policy, but it normally requires numeric rewards to evaluate actions. Since large language models can interpret and understand language, text-based descriptions have emerged as a new form of reward. By including text-based feedback from previous actions in the prompt, the POKELLMON agent iteratively and instantly refines its policy – hence in-context reinforcement learning. The framework uses four types of feedback (a brief sketch of how such feedback might be rendered into text appears after the list):

  1. The actual damage caused by an attack move, based on the difference in HP across two consecutive turns.
  2. The effectiveness of attack moves: whether the attack had no effect (the target is immune), was ineffective, or was super effective due to ability/move effects or type advantage.
  3. The priority order for executing a move. Since the precise stats of the opposing Pokemon are not available, the priority-order feedback provides a rough estimate of its speed.
  4. The actual effect of the executed moves on the opponent. Both attack and status moves can produce outcomes such as HP recovery, stat boosts or debuffs, and inflicted conditions like freezing, burns, or poison.
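As promised above, here is a minimal sketch of how the first two feedback types might be rendered into text. The function names, fields, and wording are assumptions for illustration, not the paper’s actual implementation.

```python
def damage_feedback(move: str, hp_before: int, hp_after: int, max_hp: int) -> str:
    """Feedback type 1: actual damage, from the HP difference across two turns."""
    dealt = hp_before - hp_after
    return f"{move} dealt {dealt} damage ({dealt / max_hp:.0%} of the opponent's max HP)."

def effectiveness_feedback(move: str, multiplier: float) -> str:
    """Feedback type 2: effectiveness bucket derived from the type/ability multiplier."""
    if multiplier == 0:
        label = "had no effect (the target is immune)"
    elif multiplier < 1:
        label = "was not very effective"
    elif multiplier > 1:
        label = "was super effective"
    else:
        label = "was normally effective"
    return f"{move} {label}."

print(damage_feedback("Surf", hp_before=180, hp_after=120, max_hp=200))
print(effectiveness_feedback("Surf", multiplier=0.0))
```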

Furthermore, the use of the in-context reinforcement learning approach results in a significant boost in performance, as demonstrated in the following figure.

Compared with the original GPT-4 performance, the win rate rises by nearly 10%, along with a nearly 13% boost in the battle score. Furthermore, as demonstrated in the following figure, the agent begins to analyse and change its action if the moves executed in previous turns did not match expectations.

Knowledge-Augmented Generation or KAG

Although in-context reinforcement learning mitigates hallucinations to an extent, a poor decision can still have fatal consequences before the agent receives any feedback. For example, if the agent sends a grass-type Pokemon into battle against a fire-type Pokemon, the grass-type is likely to be knocked out within a single turn. To reduce hallucinations further and improve the agent’s decision-making, the POKELLMON framework implements knowledge-augmented generation (KAG), a technique that employs external knowledge to augment generation.

When the model generates the four types of feedback discussed above, it annotates Pokemon moves and information so that the agent can infer the type-advantage relationship on its own. To reduce the hallucination in reasoning further, the POKELLMON framework also explicitly annotates the type advantages and weaknesses of both the opposing Pokemon and the agent’s Pokemon with adequate descriptions. Memorising the moves and abilities of Pokemon, each with distinct effects, is challenging, especially since there are so many of them. The following table shows the results of knowledge-augmented generation: the approach raises the win rate by roughly 20 percentage points, from 36% to 55%.
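A toy version of that annotation step is sketched below: a tiny, hypothetical slice of the type chart is used to append explicit advantage/weakness notes to the prompt, so the model does not have to recall the full chart from memory. The chart entries and wording are illustrative assumptions, not the paper’s data.

```python
# Hypothetical, heavily truncated type chart: attack type -> (strong against, resisted by)
TYPE_CHART = {
    "fire":  (["grass", "ice", "bug"], ["water", "rock", "fire"]),
    "water": (["fire", "ground", "rock"], ["grass", "electric", "water"]),
    "grass": (["water", "ground", "rock"], ["fire", "flying", "bug"]),
}

def annotate_matchup(my_type: str, opponent_type: str) -> str:
    """Build an explicit type advantage/weakness note to append to the prompt."""
    strong, resisted = TYPE_CHART[my_type]
    if opponent_type in strong:
        effect = "super effective"
    elif opponent_type in resisted:
        effect = "not very effective"
    else:
        effect = "normally effective"
    notes = [f"Your {my_type}-type moves are {effect} against the opposing {opponent_type}-type Pokemon."]
    opp_strong, _ = TYPE_CHART.get(opponent_type, ([], []))
    if my_type in opp_strong:
        notes.append(f"The opposing {opponent_type}-type hits your {my_type}-type Pokemon super effectively.")
    return " ".join(notes)

print(annotate_matchup("grass", "fire"))
```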

Furthermore, developers observed that when the agent was provided with external knowledge of Pokemons, it started to use special moves at the right time, as demonstrated in the following image.

Consistent Action Generation

Existing work demonstrates that prompting and reasoning approaches can enhance an LLM’s ability to solve complex tasks. Instead of generating a one-shot action, the POKELLMON framework evaluates existing prompting strategies, including Chain of Thought (CoT), Tree of Thought (ToT), and Self-Consistency. For Chain of Thought, the agent first generates a thought that analyses the current battle scenario and then outputs an action conditioned on that thought. For Self-Consistency, the agent generates actions three times and selects the output that receives the most votes. Finally, for Tree of Thought, the framework generates three actions just as in the self-consistency approach, but picks the one it judges best after evaluating them all. The following table summarises the performance of these prompting approaches.
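The Self-Consistency variant boils down to sampling the action several times and keeping the majority answer, roughly as in the sketch below; `llm_generate` and the stub generator are hypothetical stand-ins for the actual model call.

```python
import random
from collections import Counter

def self_consistent_action(prompt: str, llm_generate, k: int = 3) -> str:
    """Sample k independent actions and return the one with the most votes."""
    votes = Counter(llm_generate(prompt) for _ in range(k))
    action, _count = votes.most_common(1)[0]
    return action

# Example with a stub generator standing in for the LLM
stub = lambda _: random.choice(["use Thunderbolt", "use Thunderbolt", "switch to Gyarados"])
print(self_consistent_action("...battle state...", stub))
```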

There is only a single action per turn, which means that if the agent decides to switch while the opponent attacks, the switched-in Pokémon takes the damage. Normally the agent switches because it wants to bring in an off-the-field Pokémon with a type advantage that can sustain the damage, since it resists the opposing Pokémon’s moves. With CoT reasoning, however, a powerful opposing Pokémon can force the agent to keep rotating: it switches between several Pokémon and back, acting inconsistently with its overall goal – behaviour the authors term panic switching. Panic switching wastes turns that could have been spent attacking, and thus leads to defeat.

POKELLMON: Results and Experiments

Before we discuss the results, it is essential to understand the battle environment. At the beginning of a turn, the environment receives an action-request message from the server – which also contains the execution result from the last turn – and responds to it by the end of the turn.

The environment first parses the message and updates local state variables, then translates the state variables into text. The text description has four main parts:

  1. Own team information, which contains the attributes of the Pokémon in the field and off the field (unused).
  2. Opponent team information, which contains the attributes of the opponent’s Pokémon in the field and off the field (some information is unknown).
  3. Battlefield information, which includes the weather, entry hazards, and terrain.
  4. Historical turn log information, which contains the previous actions of both Pokémon and is stored in a log queue.

The LLM takes the translated state as input and outputs an action for the next step. The action is then sent to the server and executed at the same time as the human player’s action. A rough sketch of this translation step follows.
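The sketch below assembles the four parts into a single prompt string. All field names and formatting are illustrative assumptions rather than the authors’ actual schema.

```python
def state_to_text(own_team, opp_team, field, turn_log, max_log=4):
    """Translate parsed state variables into the four-part text description."""
    parts = [
        "Own team:\n" + "\n".join(f"- {p['name']} (HP {p['hp']}%, type {p['type']})" for p in own_team),
        "Opponent team (known information):\n" + "\n".join(
            f"- {p['name']} (HP {p.get('hp', '?')}%, type {p.get('type', 'unknown')})" for p in opp_team),
        f"Battlefield: weather={field['weather']}, hazards={field['hazards']}, terrain={field['terrain']}",
        "Recent turns:\n" + "\n".join(turn_log[-max_log:]),
    ]
    return "\n\n".join(parts)

example = state_to_text(
    own_team=[{"name": "Venusaur", "hp": 100, "type": "grass/poison"}],
    opp_team=[{"name": "Charizard", "hp": 73, "type": "fire/flying"}],
    field={"weather": "sun", "hazards": "none", "terrain": "none"},
    turn_log=["Turn 3: Charizard used Flamethrower; Venusaur lost 48% HP."],
)
print(example)
```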

Battle Against Human Players

The following table illustrates the performance of the POKELLMON agent against human players.

As can be observed, the POKELLMON agent delivers performance comparable to that of ladder players, who have higher win rates than the invited players along with extensive battle experience.

Battle Skill Analysis

Owing to the knowledge-augmented generation strategy, the POKELLMON framework rarely makes a mistake when choosing the most effective move or switching to another suitable Pokemon.

As shown in the example above, the agent uses only one Pokemon to defeat the entire opposing team, since it is able to pick the attack moves that are most effective against the opponent in each situation. The POKELLMON framework also exhibits a human-like attrition strategy: some Pokemon have the “Toxic” move, which inflicts additional damage each turn, while the “Recover” move restores HP. Taking advantage of this, the agent first poisons the opposing Pokemon and then uses Recover to keep itself from fainting.

Final Thoughts

In this article, we have talked about POKELLMON, an approach that enables large language models to play Pokemon battles against humans autonomously. POKELLMON aims to be the world’s first embodied agent to achieve human-parity performance in tactical games, as demonstrated in Pokemon battles. The POKELLMON framework introduces three key strategies: in-context reinforcement learning, which consumes text-based feedback as a “reward” to iteratively refine the action-generation policy without training; knowledge-augmented generation, which retrieves external knowledge to combat hallucination and ensures the agent acts timely and properly; and consistent action generation, which prevents the panic-switching issue when encountering powerful opponents.

DeepLearning.AI Comes Up with New Course on Unstructured Data Handling for LLMs

Andrew Ng has rolled out a new course called “Preprocessing Unstructured Data for LLM Applications,” this time in collaboration with San Francisco-based startup Unstructured. Unstructured essentially captures unstructured data wherever it is stored and transforms it into AI-friendly JSON files for companies eager to incorporate AI into their business.

Taught by Matt Robinson, head of product at Unstructured, it’s free for a limited time and takes about an hour to complete.

You’ll learn to extract and standardise content from various document types – such as PDFs, PowerPoint, Word, and HTML files, as well as tables and images – into a common JSON format. This will broaden the range of information available for your LLM applications. Enriching your content with metadata will improve retrieval augmented generation (RAG) results and enable more nuanced search capabilities.
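As a rough illustration of the workflow the course teaches, the open-source `unstructured` Python library can partition a document into typed elements and serialize them to JSON. The snippet below is a minimal sketch (the file names are placeholders, and the exact APIs covered in the course may differ):

```python
# pip install "unstructured[pdf]"  -- the extras you need depend on the file types you process
from unstructured.partition.auto import partition
from unstructured.staging.base import elements_to_json

# Partition a supported document (PDF, PPTX, DOCX, HTML, ...) into typed elements
elements = partition(filename="quarterly_report.pdf")   # placeholder path

for el in elements[:5]:
    print(el.category, "->", el.text[:60])               # e.g. Title, NarrativeText, Table

# Serialize everything to a common JSON format for downstream RAG ingestion
elements_to_json(elements, filename="quarterly_report.json", indent=2)
```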

The course covers techniques for document image analysis, including layout detection and vision and table transformers. You’ll discover how to apply these methods to preprocess PDFs, images, and tables. It is suitable for anyone interested in effectively processing diverse data types and formats to build high-performing LLM RAG systems.


Meet Ferret-UI, Apple’s AI-Powered Answer to Mobile UI Challenges


Ahead of Apple’s flagship event, WWDC 2024, in June, the tech giant is going all in on bringing generative AI to its products. Enter Ferret-UI, a specialised LLM tailored specifically to the nuanced demands of mobile user-interface comprehension and interaction.

In a paper titled “Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs”, the authors present Ferret-UI as a solution to the limitations of existing LLMs in handling UI screens.

While general-purpose LLMs like GPT-3 have garnered attention for their versatility, they often struggle to understand and effectively interact with UI screens, especially in the mobile domain. The core focus of Ferret-UI lies in its multimodal capabilities, combining advanced language understanding with visual comprehension tailored specifically for mobile UI screens, incorporating referring, grounding, and reasoning capabilities.

Under the Hood

One of the key challenges in adapting LLMs to UI screens is the unique characteristics of these screens compared to natural images. UI screens often have elongated aspect ratios and contain smaller objects of interest, such as icons and texts, which are not typically encountered in natural images. To address this challenge, Ferret-UI integrates a mechanism called “any resolution,” allowing it to handle screens of varying aspect ratios and magnifying details for enhanced visual feature extraction. By encoding each sub-image separately before feeding them to the LLM, Ferret-UI ensures that no critical visual information is lost during processing.
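Conceptually, “any resolution” means choosing a sub-image grid that respects the screen’s aspect ratio, cropping the screenshot into those sub-images, and encoding each one separately alongside a downscaled global view. The sketch below is a simplified guess at such preprocessing using PIL; the grid choice and tile size are assumptions, not Apple’s implementation.

```python
from PIL import Image

def split_screen(path: str, tile: int = 336):
    """Cut a UI screenshot into aspect-ratio-aware sub-images plus a global view."""
    img = Image.open(path)
    w, h = img.size
    # Portrait phone screens get a 1x2 vertical grid, landscape screens 2x1 (a simplification)
    cols, rows = (1, 2) if h >= w else (2, 1)
    sub_images = []
    for r in range(rows):
        for c in range(cols):
            box = (c * w // cols, r * h // rows, (c + 1) * w // cols, (r + 1) * h // rows)
            sub_images.append(img.crop(box).resize((tile, tile)))
    global_view = img.resize((tile, tile))
    return [global_view] + sub_images   # each image would be encoded separately by the vision encoder
```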

Moreover, Ferret-UI employs a new approach to data curation, gathering training samples from a wide range of elementary UI tasks. These tasks include icon recognition, finding text, and widget listing, among others. By training on such diverse tasks, Ferret-UI learns to understand UI elements’ semantics and spatial positioning, enabling it to make distinctions at both broad and detailed levels.

In addition to elementary tasks, Ferret-UI is also trained on specialised tasks, such as detailed description generation, perception-conversation understanding, and function inference. These tasks prepare the model to engage in intricate discussions about visual components, formulate action plans based on specific goals, and interpret the overall purpose of a UI screen.

To evaluate the effectiveness of Ferret-UI, the authors establish a comprehensive benchmark encompassing various UI tasks. Comparative evaluations with other existing models, including open-source LLMs and GPT-4V, demonstrate Ferret-UI’s superiority, particularly in elementary UI tasks and advanced reasoning capabilities.

If Apple integrates Ferret-UI into Siri, it could be a game-changing experience for Apple users. It could also improve accessibility features, enable seamless app integration, offer personalised assistance, facilitate natural-language UI navigation, and enhance integration with voice assistive technologies, benefiting users with special needs and improving the overall user experience on iOS devices.

This update comes soon after Apple released the MM1 model last month and ReALM (Reference Resolution As Language Modeling) two weeks ago. The company has also forged a $50M licensing deal with Shutterstock to acquire AI training data.


Google Demonstrates Method to Scale Language Model to Infinitely Long Inputs


Google researchers have introduced a method for scaling Transformer-based large language models (LLMs) to handle infinitely long inputs with bounded memory and computation.

The paper, titled “Leave No Context Behind”, devises the approach, known as Infini-attention, which incorporates compressive memory into the vanilla attention mechanism and combines masked local attention and long-term linear attention mechanisms in a single Transformer block.

This modification to the Transformer attention layer supports continual pre-training and fine-tuning, facilitating the natural extension of existing LLMs to process infinitely long contexts.

Infini-attention reuses key, value, and query states from standard attention computations for long-term memory consolidation and retrieval. Instead of discarding old key-value (KV) states, the approach stores them in compressive memory and retrieves values using attention query states for processing subsequent sequences. The final contextual output is computed by combining long-term memory-retrieved values with local attention contexts.
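Read literally, that means each attention layer carries a small associative memory (a d×d matrix plus a normalization vector) that is read with the current segment’s queries and then updated with its keys and values, with a learned gate mixing the memory readout into the usual local attention output. The single-head PyTorch sketch below is a simplified reconstruction from the paper’s description, not Google’s code; the feature map, gating, and update rule are abbreviated.

```python
import torch
import torch.nn.functional as F

def infini_attention_segment(q, k, v, memory, norm, beta):
    """Simplified single-head Infini-attention over one segment.

    q, k, v : (seg_len, d) query/key/value states for the current segment
    memory  : (d, d) compressive memory carried over from earlier segments
    norm    : (d,)  running normalization term for the memory
    beta    : scalar controlling the gate between long-term and local context
    """
    sigma_q, sigma_k = F.elu(q) + 1, F.elu(k) + 1                  # positive feature maps

    # 1) Retrieve long-term context from the compressive memory with the mapped queries
    mem_out = (sigma_q @ memory) / (sigma_q @ norm).clamp(min=1e-6).unsqueeze(-1)

    # 2) Standard masked (causal) dot-product attention within the segment
    scores = (q @ k.t()) / q.size(-1) ** 0.5
    causal_mask = torch.triu(torch.ones_like(scores, dtype=torch.bool), diagonal=1)
    local_out = torch.softmax(scores.masked_fill(causal_mask, float("-inf")), dim=-1) @ v

    # 3) Gate the two streams, then fold this segment's KV states into the memory
    gate = torch.sigmoid(torch.as_tensor(beta, dtype=q.dtype))
    out = gate * mem_out + (1 - gate) * local_out
    new_memory = memory + sigma_k.t() @ v
    new_norm = norm + sigma_k.sum(dim=0)
    return out, new_memory, new_norm

# Toy usage: two consecutive segments sharing one memory
d, seg = 16, 8
memory, norm = torch.zeros(d, d), torch.zeros(d)
for _ in range(2):
    q = k = v = torch.randn(seg, d)
    out, memory, norm = infini_attention_segment(q, k, v, memory, norm, beta=0.0)
```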

Experimental results demonstrate that this approach surpasses baseline models on long-context language modelling benchmarks, achieving a 114x compression ratio in terms of memory size.

The model achieves improved perplexity when trained with a 100K sequence length. A 1B LLM scales naturally to a 1M sequence length, successfully completing the passkey retrieval task when equipped with Infini-attention.

The researchers demonstrated the approach’s effectiveness using long-context language modelling benchmarks, including 1M sequence length passkey context block retrieval and 500K length book summarization tasks with 1B and 8B LLMs. The method maintains minimal bounded memory parameters and allows for fast streaming inference for LLMs.

The contributions of this work include:

  • Infini-attention, which combines long-term compressive memory and local causal attention to efficiently model both long- and short-range contextual dependencies.
  • Minimal changes to the standard scaled dot-product attention, allowing for plug-and-play continual pre-training and long-context adaptation.
  • Enabling Transformer LLMs to process extremely long inputs in a streaming fashion with bounded memory and compute resources.

Infini-Transformer is compared with Transformer-XL, showing that Infini-Transformer operates on sequences of segments, computing standard causal dot-product attention context within each segment.

Unlike Transformer-XL, Infini-Transformers reuse the KV attention states of previous segments to maintain the entire context history with compressive memory, achieving efficient memory and computation usage.

The conclusion of the study emphasises the importance of an effective memory system for comprehending long contexts with LLMs, reasoning, planning, continual adaptation, and learning. The work integrates a compressive memory module into the vanilla dot-product attention layer, enabling LLMs to process infinitely long contexts with bounded memory and computation resources.

The approach scales naturally to handle million-length input sequences and outperforms baselines on long-context language modelling benchmarks and book summarization tasks. The 1B model, fine-tuned on up to 5K sequence length passkey instances, successfully solved the 1M length problem.


India is a Sweet Spot for Intel


Indian companies are firmly committed to building their own AI models, and Intel has long been a partner in delivering technology solutions within the country across domains. Now, Intel is taking another step forward by partnering with Indian companies to deliver its AI hardware.

“Boy, was I startled when I learned that India is very convinced they need their own models for their environment,” said Patrick Gelsinger, CEO of Intel, at Intel Vision 2024. “They are excited to train and be able to deliver that using Gaudi clusters,” he added.

Apart from Xeon 6 and Gaudi 3, there were various new collaborations announced at the event, many within India.

The buzzing partner ecosystem

Bharti Airtel aims to harness its extensive telecom data to enhance AI capabilities, thereby enriching customer experiences and exploring new revenue avenues in the digital realm.

Infosys has announced a strategic partnership with Intel, integrating Intel technologies such as 4th and 5th Gen Intel Xeon processors, Intel Gaudi 2 AI accelerators, and Intel Core Ultra into Infosys Topaz. This collaboration aims to offer AI-first services, solutions, and platforms to accelerate business value through generative AI technologies.

Infosys also plans to utilise Intel’s AI training resources to educate its employees about Intel’s product offerings, enabling them to offer generative AI expertise to the company’s extensive international customer base across various industries.

Ola Krutrim is utilising Intel Gaudi 2 clusters to pre-train and fine-tune its foundational models with generative capabilities in ten languages, achieving industry-leading price/performance ratios compared to existing market solutions. Additionally, Krutrim is currently pre-training a larger foundational model on an Intel Gaudi 2 cluster, further advancing its AI capabilities.

CtrlS, one of the largest and fastest-growing data centre operators in the world, which hosts most of the providers in India, is also using Gaudi 2 and Xeon processors, Gelsinger revealed in his keynote.

In March, L&T also announced a collaboration with Intel to deploy scalable edge-AI solutions across various domains, including Cellular Vehicle-to-Everything (CV2X) applications, leveraging its expertise in connected vehicles and smart transportation systems alongside Intel’s Edge Platform.

Just this month, Zoho also collaborated with Intel for optimising AI workloads within the company. Santosh Viswanathan, vice president and managing director at Intel, said that Zoho has witnessed significant performance improvements in AI workloads with 4th Gen Intel Xeon processors.

Though most of these partnerships involve the previous generation of Gaudi and Xeon processors, the leadership has been quite vocal about its expansion plans in the country.

India is set as a distinct entity

Intel betting big on India is not new. By providing cheaper alternatives for data centres and powering enterprise solutions, Intel has long been the go-to choice for Indian companies. While NVIDIA is increasingly expanding its partnerships in India with entities like Yotta to establish data centres, Intel remains a viable option for its already established customers in the country.

“AI does not just require big GPUs to solve the problem. There are a lot of different models that can run on Xeon. Innovation at scale can happen with Xeon. We are working with several large customers. Gaudi 2 is available, Gaudi 3 comes in the second half. You will see some of those products coming into India through these customers as well,” Viswanathan said earlier.

Christoph Schell, executive vice president of Intel, said that the company is betting big on India when it comes to AI by carving it out as a separate geographic region. The American chip manufacturer ushered in a new era of computing with the release of its AI-powered PCs in late 2023.

These systems, featuring Intel’s Core Ultra processors tailored for AI tasks, improve user productivity and experience. Intel’s AI PCs are currently available in the market, and numerous retailers have started distributing them in India. By 2025, Intel aims to supply core processors for as many as 100 million AI-enabled PCs, much of which would be through India.

Though Viswanathan has said that the company currently has no plans to set up a fab in the country, it is still betting big on AI in India in other ways.

“World needs a balanced supply chain. You cannot have 80% of servers being made in one place and 90% of all laptops made in one place. I think that’s the key change where India can really step and help build a balanced electronics supply chain for the world,” he said.

Viswanathan said India has about 20% of the world’s datasets that can be used for training AI models.

“We are very frugal. 16 or 20% of the world’s AI talent is in India. We kind of lead the world and not follow in this path. That’s another piece that makes me bullish about India. For me India is most exciting. AI is not just artificial intelligence, it is also amazing in India. No other country has digital infrastructure at the scale that we have. India stack is a game changer,” Viswanathan said.


Meta Unveils Details of its Latest AI Chip, MTIA


In a race with competitors that are also building custom chips for AI models, Meta’s AI chip, the Meta Training and Inference Accelerator (MTIA), is poised to revolutionise the training of ranking and recommendation models.

Key highlights of the new MTIA chip include a significant boost in on-chip memory capacity, with 256MB compared to its predecessor’s 128MB, and a higher clock speed of 1.3GHz, up from 800MHz. Early tests conducted by Meta have shown a remarkable threefold performance improvement across multiple models evaluated.

Dubbed internally as “Artemis,” the MTIA v2 project underscores Meta’s commitment to advancing AI capabilities, extending beyond inference to encompass training tasks. This move aligns with a broader trend in the industry, with major players like Google, Microsoft, and Amazon investing in custom AI chips to meet the escalating demand for compute power.

This week, Google announced the general availability of its fifth-generation custom chip, TPU v5p, for training AI models to Google Cloud users. Additionally, Google introduced its inaugural chip designed specifically for model execution, named Axion. Amazon has developed multiple families of custom AI chips, while Microsoft entered the field last year with the Azure Maia AI Accelerator and the Azure Cobalt 100 CPU.

Originally slated for release in 2025, Meta surprised the industry by announcing that both MTIA versions, including the upcoming iteration, are already in production. While MTIA currently focuses on training ranking and recommendation algorithms, Meta aims to broaden its scope to include training generative AI models like its Llama language models in the future.

The forthcoming MTIA chip is part of Meta’s broader full-stack development program for custom silicon, catering specifically to its distinctive workloads and systems. Notable improvements in the next-gen MTIA include more than doubling the compute and memory bandwidth compared to its predecessor, while maintaining close alignment with Meta’s workload requirements, particularly for ranking and recommendation models.

At the heart of the new MTIA design lies a focused architecture aimed at striking the right balance between compute, memory bandwidth, and capacity, crucial for serving ranking and recommendation models efficiently. The chip boasts an 8×8 grid of processing elements (PEs), delivering significantly enhanced dense and sparse compute performance compared to MTIA v1.

Furthermore, the new MTIA iteration features an upgraded network on chip (NoC) architecture, doubling the bandwidth and facilitating low-latency coordination between different PEs, essential for scaling MTIA to a wider range of challenging workloads.

In terms of hardware, Meta has developed a large rack-based system capable of accommodating up to 72 accelerators, each housing two chips. The system, meticulously designed to support the next-generation silicon, enables higher compute, memory bandwidth, and capacity, thereby accommodating a broad spectrum of model complexities and sizes.

Software integration has been a key focus for Meta, with the MTIA stack seamlessly integrating with PyTorch 2.0, leveraging features like TorchDynamo and TorchInductor. The Triton-MTIA compiler backend further optimises the software stack, enhancing developer productivity and expanding support for PyTorch operators.
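In practice, a PyTorch 2.0 integration of this kind lets models target a custom backend through the standard compile path. The snippet below shows the generic mechanism with the stock Inductor backend; the exact name and availability of a Triton-MTIA backend in public PyTorch builds is an assumption not confirmed by the post.

```python
import torch

class RankingHead(torch.nn.Module):
    """Tiny stand-in for a ranking/recommendation model."""
    def __init__(self, dim: int = 64):
        super().__init__()
        self.mlp = torch.nn.Sequential(
            torch.nn.Linear(dim, dim), torch.nn.ReLU(), torch.nn.Linear(dim, 1)
        )

    def forward(self, x):
        return torch.sigmoid(self.mlp(x))

model = RankingHead()
# TorchDynamo captures the graph; the chosen backend (here Inductor, which emits Triton
# kernels on supported hardware) lowers and optimizes it. A vendor backend such as a
# Triton-MTIA compiler would plug in at this same point.
compiled = torch.compile(model, backend="inductor")
scores = compiled(torch.randn(8, 64))
```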

Performance results indicate a significant leap in efficiency, with early tests showcasing a 3x improvement over the first-generation chip across key models. Meta’s ongoing investment in custom silicon underscores its commitment to building the most powerful and efficient infrastructure for its AI workloads, with MTIA set to play a pivotal role in this long-term roadmap.


Meta unveils its newest custom AI chip as it races to catch up

By Kyle Wiggers

Meta, hell-bent on catching up to rivals in the generative AI space, is spending billions on its own AI efforts. A portion of those billions is going toward recruiting AI researchers. But an even larger chunk is being spent developing hardware, specifically chips to run and train Meta’s AI models.

Meta unveiled the newest fruit of its chip dev efforts today, conspicuously a day after Intel announced its latest AI accelerator hardware. Called the “next-gen” Meta Training and Inference Accelerator (MTIA), the successor to last year’s MTIA v1, the chip runs models including for ranking and recommending display ads on Meta’s properties (e.g. Facebook).

Compared to MTIA v1, which was built on a 7nm process, the next-gen MTIA is 5nm. (In chip manufacturing, “process” refers to the size of the smallest component that can be built on the chip.) The next-gen MTIA is a physically larger design, packed with more processing cores than its predecessor. And while it consumes more power — 90W versus 25W — it also boasts more internal memory (128MB versus 64MB) and runs at a higher average clock speed (1.35GHz up from 800MHz).

Meta says the next-gen MTIA is currently live in 16 of its data center regions and delivering up to 3x overall better performance compared to MTIA v1. If that “3x” claim sounds a bit vague, you’re not wrong — we thought so too. But Meta would only volunteer that the figure came from testing the performance of “four key models” across both chips.

“Because we control the whole stack, we can achieve greater efficiency compared to commercially available GPUs,” Meta writes in a blog post shared with TechCrunch.

Meta’s hardware showcase — which comes a mere 24 hours after a press briefing on the company’s various ongoing generative AI initiatives — is unusual for several reasons.

One, Meta reveals in the blog post that it’s not using the next-gen MTIA for generative AI training workloads at the moment, although the company claims it has “several programs underway” exploring this. Two, Meta admits that the next-gen MTIA won’t replace GPUs for running or training models — but instead will complement them.

Reading between the lines, Meta is moving slowly — perhaps more slowly than it’d like.

Meta’s AI teams are almost certainly under pressure to cut costs. The company’s set to spend an estimated $18 billion by the end of 2024 on GPUs for training and running generative AI models, and — with training costs for cutting-edge generative models ranging in the tens of millions of dollars — in-house hardware presents an attractive alternative.

And while Meta’s hardware drags, rivals are pulling ahead, much to the consternation of Meta’s leadership, I’d suspect.

Google this week made its fifth-generation custom chip for training AI models, TPU v5p, generally available to Google Cloud customers, and revealed its first dedicated chip for running models, Axion. Amazon has several custom AI chip families under its belt. And Microsoft last year jumped into the fray with the Azure Maia AI Accelerator and the Azure Cobalt 100 CPU.

In the blog post, Meta says it took fewer than nine months to “go from first silicon to production models” of the next-gen MTIA, which to be fair is shorter than the typical window between Google TPUs. But Meta has a lot of catching up to do if it hopes to achieve a measure of independence from third-party GPUs — and match its stiff competition.