OpenAI inks deal to train AI on Reddit data

OpenAI has reached a deal with Reddit to use the social news site’s data for training AI models. In a blog post on OpenAI’s press relations site, the company said that the Reddit partnership will provide it access to “real-time, structured and unique content” — e.g. posts and replies — from Reddit, allowing its tools […]


U.K.’s AI Safety Institute Launches Open-Source Testing Platform

The U.K.’s AI Safety Institute has released a free, open-source testing platform that evaluates the safety of new AI models. Dubbed Inspect, the toolset should provide a “consistent approach” towards the creation of secure AI applications around the world.

Inspect is the first AI safety testing platform created by a state-backed body to be made freely available to the public. Ultimately, the aim is to accelerate the development of secure AI models and improve the efficacy of safety testing.

How to use the Inspect software library

The Inspect software library, launched on May 10, can be used to assess the safety of an AI model’s features, including its core knowledge, ability to reason and autonomous capabilities, in a standardised way. Inspect provides a score based on its findings, revealing both how safe the model is and how effective the evaluation was.

As the Inspect source code is openly accessible, the global AI testing community — including businesses, research facilities and governments — can integrate it with their models and get essential safety information more quickly and easily.


AI Safety Institute Chair Ian Hogarth said the Inspect team was inspired by leading open-source AI developers to create a building block towards a “shared, accessible approach to evaluations.”

He said in the press release, “We hope to see the global AI community using Inspect to not only carry out their own model safety tests, but to help adapt and build upon the open source platform so we can produce high-quality evaluations across the board.”

Secretary of State for Science, Innovation and Technology Michelle Donelan added that safe AI will improve various sectors in the U.K. from “our NHS to our transport network.”

The AISI, along with the expert group Incubator for Artificial Intelligence and Prime Minister Rishi Sunak’s office, is also recruiting AI talent to test and develop new open-source AI safety tools.

Inspect: What developers need to know

A guide on how to use the Inspect toolkit in its base form can be found on the U.K. government’s GitHub. The software is released under an MIT License, which allows it to be copied, modified, merged, published, distributed, sold and sub-licensed; this means anyone can amend the script or add new testing methods via third-party Python packages to extend its capabilities.

Developers looking to use Inspect first need to install it and ensure they have access to an AI model. They can then build an evaluation script using the Inspect framework and run it on their model of choice.

Inspect evaluates the safety of AI models using three main components (a minimal example follows the list):

  1. Datasets of sample test scenarios for evaluation, including prompts and target outputs.
  2. Solvers that execute the test scenarios using the prompt.
  3. Scorers that analyse the output of the solvers and generate a score.
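
As a rough illustration of how these components fit together, here is a minimal evaluation sketch in Python modelled on Inspect’s documentation (the library is installable via pip install inspect-ai). The dataset contents and model name are placeholders, and parameter names may differ between Inspect versions:

    from inspect_ai import Task, task
    from inspect_ai.dataset import Sample
    from inspect_ai.scorer import includes
    from inspect_ai.solver import generate

    @task
    def capital_check():
        # 1. Dataset: sample test scenarios with a prompt and a target output
        dataset = [Sample(input="What is the capital of France?", target="Paris")]
        # 2. Solver: generate() runs each prompt against the model under test
        # 3. Scorer: includes() checks whether the target appears in the output
        return Task(dataset=dataset, solver=generate(), scorer=includes())

Saved as capital_check.py, the evaluation can then be run against a model of choice from the command line, e.g. inspect eval capital_check.py --model openai/gpt-4.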

The source code can be accessed through the U.K. government’s GitHub repository.

What the experts are saying about Inspect

The overall response to the U.K.’s announcement of Inspect has been positive. Clément Delangue, CEO of the community AI platform Hugging Face, posted on X that he is interested in creating a “public leaderboard with results of the evals” of different models. Such a leaderboard could both showcase the safest models and encourage developers to use Inspect so their models can be ranked.

Linux Foundation Europe also posted that the open-sourcing of Inspect “aligns perfectly with our call for more open source innovation by the public sector.” Deborah Raji, a research fellow at Mozilla and AI ethicist, called it a “testament to the power of public investment in open source tooling for AI accountability” on X.

The U.K.’s moves towards safer AI

The U.K.’s AISI was launched at the AI Safety Summit in November 2023 with the three primary goals of evaluating existing AI systems, performing foundational AI safety research and sharing information with other national and international actors. Shortly after the summit, the U.K.’s National Cyber Security Centre published guidelines on the security of AI systems along with 17 other international agencies.

With the explosion in AI technologies over the past two years, there is a dire need to establish and enforce robust AI safety standards to prevent issues including bias, hallucinations, privacy infringements, IP violations and intentional misuse, which could have wider social and economic consequences.


In October 2023, the G7 countries, including the U.K., released the ‘Hiroshima’ AI code of conduct, which is a risk-based approach that intends “to promote safe, secure and trustworthy AI worldwide and will provide voluntary guidance for actions by organizations developing the most advanced AI systems.”

This March, G7 nations signed an agreement committing to explore how artificial intelligence can improve public services and boost economic growth. It also covered the joint development of an AI toolkit to inform policy-making and to ensure the AI used by public sector services is safe and trustworthy.

The following month, the U.K. government formally agreed to work with the U.S. on developing tests for advanced artificial intelligence models. Both countries agreed to “align their scientific approaches” and work together to “accelerate and rapidly iterate robust suites of evaluations for AI models, systems, and agents.”

This action was taken to uphold the commitments established at the first global AI Safety Summit, where governments from around the world accepted their role in safety testing the next generation of AI models.

SkyServe’s STORM Ushers in “Smartphone Moment” for Satellite Imaging


Bengaluru-based space tech company SkyServe announced today that it has successfully achieved Smart Earth Imaging in orbit, marking an important step forward in Earth observation. By uplinking and testing its edge computing software stack STORM on a satellite, SkyServe demonstrated the ability to generate actionable insights from space in a fraction of the usual time.

In mid-April, SkyServe collaborated with space logistics company D-Orbit to deploy STORM on a SpaceX-launched satellite. Within seconds of capturing imagery over Egypt’s Sinai Peninsula, STORM performed intelligent tasks onboard, including error correction, cloud and water removal, and vegetation identification. The optimised data, compressed five-fold, was then transmitted back to Earth.
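
To give a concrete sense of the kind of lightweight analysis that can run onboard, vegetation identification is commonly done with the normalized difference vegetation index (NDVI). The sketch below is a generic Python illustration, not SkyServe’s actual STORM pipeline:

    import numpy as np

    def vegetation_mask(nir: np.ndarray, red: np.ndarray, threshold: float = 0.3) -> np.ndarray:
        """Flag likely-vegetated pixels using NDVI = (NIR - Red) / (NIR + Red)."""
        ndvi = (nir - red) / np.clip(nir + red, 1e-6, None)  # avoid divide-by-zero
        return ndvi > threshold  # boolean mask: True where vegetation is likely

Downlinking a compact product like this mask, rather than full multispectral frames, is one way onboard processing can shrink the volume of data transmitted back to Earth.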

“We’re essentially creating the iPhone moment for Earth observation,” said Vinay Simha, SkyServe’s Co-founder and CEO. “Just like Smartphones revolutionised data accessibility and user engagement, STORM enables geospatial applications with edge tasking and processing, unlocking vast new use cases from space.”

STORM signifies a paradigm shift beyond traditional Earth observation, empowering satellites to optimise for specific customer applications and deliver near-real-time insights dynamically. “The collaboration with SkyServe facilitates their in-orbit STORM platform and aligns with our mission to provide comprehensive in-orbit services,” noted Viney Jean-Francois Dhiri, D-Orbit’s Head of Business Development.

SkyServe is now leveraging Loft Orbital’s YAM-6 satellite for Mission Denali, showcasing automated tasking and hosting of geospatial models for various use cases. “We can program the satellites to identify wildfires while flying over regions like Australia and monitor water resources over Bengaluru,” explained Vishesh Vatsal, SkyServe’s Co-founder and CTO.

Later this year, SkyServe’s Mission K2 is scheduled to launch aboard ISRO’s PSLV C59. These missions are unlocking a new paradigm of real-time space-based insights by empowering analytics companies with onboard data processing capabilities.


92% of Indian Knowledge Workers Embrace AI at Work: Microsoft & LinkedIn Report

A staggering 92% of knowledge workers in India are using artificial intelligence (AI) at work, compared to the global average of 75%, according to the 2024 Work Trend Index released today by Microsoft and LinkedIn. The report, titled “AI at Work Is Here. Now Comes the Hard Part,” highlights the rapid adoption of AI in the workplace and its impact on how people work, lead, and hire.

The survey, which included 31,000 people across 31 countries, also found that 91% of Indian leaders believe they need to adopt AI to stay competitive. However, 54% worry that their organisation lacks a clear plan and vision for AI implementation. Despite this, employees are eagerly embracing AI tools, with 72% of Indian AI users bringing their own AI tools to work (BYOAI).

“Data from the Work Trend Index shows that AI is now a reality at work, with India having one of the highest AI adoption rates among knowledge workers, at 92%,” said Irina Ghose, managing director of Microsoft India and South Asia. “This AI optimism presents a tremendous opportunity for organisations to invest in the right tools and training, to unlock efficiencies for employees and ultimately drive long-term business impact.”

The report also highlights the growing importance of AI skills in the job market. 75% of Indian leaders stated they wouldn’t hire someone lacking AI skills, outpacing the global average of 66%. Furthermore, 80% of leaders in India prefer to hire a less experienced candidate with AI skills over a more experienced candidate without them. LinkedIn data shows a 142x increase globally in members adding AI skills to their profiles and a 160% increase in non-technical professionals taking AI courses.

The study identified four types of AI users, ranging from skeptics to power users. AI power users in India are fundamentally reorienting their workdays, with 90% beginning their day with AI and 91% relying on it to prepare for the next day. They are also more likely to receive AI training and communication from senior leadership compared to other employees.

“AI is transforming the world of work, reshaping the talent landscape and nudging both individuals and organisations to embrace change,” said Ruchee Anand, head of Talent & Learning Solutions at LinkedIn. “As the workforce looks to tap into the benefits of AI, it’s crucial for leaders to boost their organisation’s AI capabilities through thoughtful investment in both technology and talent.”

Microsoft announced new capabilities in Copilot for Microsoft 365 to help people get started with AI, including more conversational features, proactive recommendations, and improved prompt experiences. LinkedIn also announced over 50 free learning courses to empower professionals at all levels to advance their AI aptitude.

As the adoption of generative AI at work has nearly doubled globally in the last six months, Indian leaders face the challenge of moving from experimentation to tangible business impact. The 2024 Work Trend Index provides insights into how AI is shaping the future of work and the steps organisations can take to harness its potential.


Hitachi Vantara & Veeam Partner to Provide Data Protection for Hybrid Cloud Environments

Hitachi Vantara, a subsidiary of Hitachi, Ltd., and Veeam Software, a data protection and ransomware recovery company, have announced a global strategic partnership to deliver comprehensive data protection solutions for hybrid cloud environments. The collaboration aims to safeguard businesses against the growing threat of ransomware attacks and minimise downtime.

The partnership integrates Hitachi Vantara’s infrastructure portfolio with Veeam’s software, providing advanced cyber resiliency features such as ransomware detection, rapid recovery, and immutable storage. This integration enables businesses to achieve improved recovery point objectives (RPOs) and ensure data integrity across hybrid cloud environments.

Hitachi Vantara’s seven-layer, defence-in-depth strategy, combined with Veeam’s expertise, offers customers and partners more tools to address data protection and cyber resiliency challenges while reducing the cost and complexity of multiple vendors. The partnership also allows Hitachi Vantara to provide consumption models such as data protection as a service (DPaaS) that seamlessly integrate with existing infrastructure and software solutions.

According to the Veeam 2024 Data Protection Trends Report, 76% of organisations suffered at least one ransomware attack in the past year, underscoring the need for effective strategies and partnerships to combat growing cybersecurity risks. The collaboration between Hitachi Vantara and Veeam comes at a crucial time for enterprises, equipping them with essential tools to navigate the complexities of modern data protection and providing a reliable defence against evolving security challenges.

Hitachi Vantara’s data protection capabilities ensure 100% data availability for operational resilience and regulatory compliance. With an in-depth cyber defence strategy and a customisable storage portfolio, the company modernises data protection from edge to core, seamlessly integrating with existing solutions. Features like immutable backups and ransomware protection ensure critical data remains accessible and secure, even against sophisticated cyber threats.


Can AI Interpret Dreams?

While researchers have taken the first steps toward artificial intelligence dream interpretation, the technology is still largely unproven. It might take years for high-end applications to reach the consumer market. Is there a way to use AI to interpret dreams today?

Why Would You Need AI to Interpret Dreams?

There are a few prevailing theories on why dreams happen. Some argue it’s random neuronal activity, others say it’s to process the day’s events and a few claim it’s your unconscious needs and desires surfacing. Realistically, it’s probably a combination of multiple ideas. However, none can help explain the specific meaning behind each of your nighttime visions.

Dreams are complex, incoherent and baffling for reasons unknown. You could find yourself in your grandmother’s living room speaking to Elvis Presley about dog astronauts, and everything would seem normal — understandably, you’d want to make sense of things with AI.

Even if you can comprehend your dream at face value, it’s generally accepted that a more profound meaning exists. Symbols, themes and events recur across cultures and generations, which lends them significance.

For example, dreaming about losing your teeth could mean you’re dealing with stress, uncertainty or insecurities in your waking life. Alternatively, a nightmare about falling could mean you don’t feel in control of your life or supported by your loved ones. Seemingly random, nonsensical events might be significant — this is why AI interpretation is a big deal.

Can You Use AI for Dream Interpretation?

Technically, you could use AI to interpret your dreams today if you get a generative model and word your prompt right. However, accuracy is an issue — if you can’t decipher your dream’s meaning, how is an algorithm supposed to? While it may guess or output nonsense to appease you, would you be satisfied with its generic responses?

Even if you don’t feel connected to your dreams, they’re incredibly personal experiences. Each is a jumbled collection of your memories, emotions, relationships and subconscious thoughts. While you can technically use a large language model (LLM) to decipher them, its output would only be partially accurate at best.

That said, relatively accurate AI interpretations aren’t impossible. Some researchers have already uncovered the technology needed to make it work — multiple studies conducted in 2023 suggest it is feasible. At this point, testing, prototyping and commercializing these discoveries is just a matter of time, resources and funding.

The Technology Behind AI Dream Interpretation

Training data is fundamental to any AI-powered dream interpretation technology. What information can you feed an algorithm to return consistent, accurate output? Theoretically, you could use text-based descriptions, statistics on commonly dreamed themes or artists’ renditions. However, sourcing enough would be an issue.

Some researchers overcame this obstacle by providing machine learning (ML) models with dozens of hours of brain activity scans. This approach is interesting for a few reasons. For one, it relies on evidence-based information instead of the dreamer’s commentary — which also drastically increases data availability.

It also identifies the underlying drivers of rapid eye movement (REM) sleep, targeting the language or image-processing areas of the brain rather than attempting to make sense of the dream itself. As a result, AI isn’t as affected by the dreamer’s bias — meaning its chance of outputting a relatively objective, accurate interpretation is higher.

Aside from training data, you need a generative model to reconstruct, interpret or translate information. This technology’s popularity is rapidly increasing — its market size will have a compound annual growth rate of 36.5% from 2024 to 2030 — so sourcing an out-of-the-box solution would be easy. However, building one from the ground up would be wise.

Most AI-powered dream interpretation solutions need natural language processing (NLP) and image recognition technology to some extent. After all, most REM sleep is a combination of images and words. Beyond that, you could use anything from deep learning models to neural networks to make your tool work.

Ways You Can Use AI to Interpret Dreams

While generative models can produce text, images, audio and music, only a few proven methods of AI-driven dream interpretation currently exist.

1. Text-to-Text Generation

The simplest method is text-to-text generation, where an LLM, NLP or ML model analyzes your typed prompts. You enter what you remember about your dream or follow a decision-tree format to get answers. On the one hand, it’s fast and straightforward. On the other, it’s inaccurate — you forget most of the REM stage upon waking, so the AI works off a fragmented narrative.
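
As a concrete example, here is a minimal text-to-text sketch in Python using the OpenAI client. The model name and prompt framing are illustrative assumptions, not a validated methodology:

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    dream = "I was in my grandmother's living room talking to Elvis about dog astronauts."

    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder; any capable chat model works
        messages=[
            {"role": "system", "content": (
                "You interpret dreams using common symbolic associations. "
                "Hedge your interpretations and note they are not clinical advice."
            )},
            {"role": "user", "content": f"Interpret this dream: {dream}"},
        ],
    )
    print(response.choices[0].message.content)

Whatever comes back is still a guess built from the fragments you typed in, which is exactly the accuracy problem described above.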

2. EEG-to-Text Generation

Paired with an electroencephalogram (EEG) that records the brain's electrical signals, an LLM can turn thoughts into words. For this to work, you read while wearing a soft cap filled with sensors, and the model converts the recorded activity into text.

Your brain sends a specific signal when you think of a word or phrase. An algorithm can find patterns in this activity, making translation possible. You could use this EEG-to-text generation model to develop a transcript of your REM sleep.

Peer-reviewed research showed this model can achieve 60% accuracy, which is impressive for a proof of concept. The soft cap is portable and relatively cheap to produce, making it one of the few inventions that might see mass-market applications.

3. fMRI-to-Image Generation

A research group developed a deep learning model that can analyze functional magnetic resonance imaging (fMRI) scans — images of the brain’s blood flow — to accurately recreate images people see. It was trained on 10,000 photos to interpret what people were viewing.

As the study’s participants stared at an image, their temporal lobe registered its content, and their occipital lobe cataloged its scale and layout. The AI tracked this activity to reconstruct what they were seeing. While its recreations started as noise, they slowly became recognizable.

4. fMRI-to-Text Generation

Researchers used fMRI scans and an LLM in an encoding and decoding system to reconstruct brain activity in a text-based format. The leading neuroscientist on the project said the team was shocked it worked as well as it did.

As people read text or watched silent videos, the AI described the content — and usually got the gist. For instance, one person read, “I didn't know whether to scream, cry or run away. Instead, I said leave me alone, I don't need your help.” The model outputted, “Started to scream and cry and then she just said I told you to leave me alone, you can't hurt me anymore.”

Interestingly, when the researchers tailored the tool for one of the study’s participants, it could only reconstruct unintelligible gibberish when used on another. There might be potential for personalized algorithm-based dream interpreters.

Why You Should Be Wary of an AI Interpreter

While using algorithms for dream interpretation sounds promising, there are a few drawbacks to be aware of. The most significant is hallucination. According to one survey, 89% of machine learning engineers working with generative AI say their models make things up — and 93% see it happen daily or weekly.

Until AI engineers iron out the hallucination issue, this technology’s application in REM sleep is a gray area. While using it for fun is harmless, some people — those who would typically go to therapists or psychologists for dream interpretations — might get an output that damages their mental health or sets back their treatment progress.

It might subconsciously influence you even if you’re skeptical or indifferent to an algorithm’s output. For example, you might grow distant from your partner after the model tells you your cheating dream signifies a failing relationship.

Being at the other end of the spectrum can be just as damaging. Fully believing in the AI’s output — despite potential bias or hallucinations — could negatively affect you. This overconfidence might make you misinterpret your emotions, interactions with others or past trauma, leading to unwanted situations in your waking life.

There’s also the issue of the sticker price. Text-to-text generation is the most accessible and affordable but is inaccurate. If you want something better, prepare to pay up. Considering that a single MRI scan can cost up to $4,000 — and one machine can be a multimillion-dollar investment — accurate AI dream interpreters are probably years away.

What Does the Future Hold for This Technology?

Having a personal AI dream interpreter could be exciting and helpful. Even if this technology doesn’t enter the consumer market soon, it will likely find a place in therapy, psychology and medical practices. One day, you might use it to work through past trauma, identify sleep issues or uncover hidden emotions.

Intel is Bullish on India with its Xeon Processors

Intel Says that India Doesn’t Need Big GPUs

According to a recent report by IDC, unveiled at Intel’s AI for India Conference in Delhi, India’s spending on AI may reach $5.1 billion by 2027. This surge is attributed largely to AI infrastructure provisioning. This includes spending on hardware such as servers and chips, as well as software components like frameworks and libraries.

Santhosh Viswanathan, vice president and managing director, India region, Intel, said, “With an unmatched talent pool, frugal innovation, and data at scale, India stands poised to lead the global AI revolution.” He added that when it comes to building AI capabilities within India, the country does not necessarily need to rely on big GPUs.

Viswanathan said that when it comes to most of the solutions being built in India, Intel’s Xeon processors are enough to deliver the AI needs. “If you are an enterprise running a model with say 15 to 30 billion parameters, Xeon is enough to run these models effectively,” he said.

Viswanathan also highlighted that if companies are building models for RAG on personal data inference, Xeon becomes a powerhouse. “If you have small datasets that are very local and do not have many parameters, Xeon is available everywhere for you to test and try out,” he added, saying that customers can already test out the current models available in the market on the existing Xeon-powered data centres across the country.
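
As a rough sketch of what CPU-only inference looks like in practice, the following Python snippet uses the Hugging Face transformers library, which runs on CPU by default. The model choice is a placeholder, and Intel-specific optimisations (such as the intel-extension-for-pytorch package) are optional extras not shown here:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "mistralai/Mistral-7B-Instruct-v0.2"  # placeholder mid-size model

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # bf16 is well supported on recent Xeons
    )

    prompt = "Summarise the benefits of retrieval-augmented generation."
    inputs = tokenizer(prompt, return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=128)
    print(tokenizer.decode(output[0], skip_special_tokens=True))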

Xeon is omnipresent

“Not everybody is trying to build the next largest LLM and needs a trillion parameters,” said Viswanathan. Another use case he highlighted is edge AI, for which Intel’s CPUs and NPUs are well positioned on privacy, at a significantly lower cost.

“AI is not everywhere yet, it’s in one place and you need a lot of GPUs and massive data centres [for building AI]. But over time this is going to change and the costs will come down,” he added.

“You do not need to go back and build massive infrastructures. AI can start today with the infrastructure that you have,” he said. Viswanathan explained that Intel’s go-to-market strategy is about making customers in India realise that they can use existing infrastructure that already runs on Xeon processors.

Viswanathan said that the reason Intel is bullish on India is the country’s ability to solve big problems with frugality, as in the case of UPI. He recounted how Intel was the first company to bring WiFi to India, and said that, just as with the internet, Intel wants to bring AI everywhere in India.

“Intel’s goal is to democratise access, and the architecture is open,” said Viswanathan. He added that today, people are waiting for compute and this is where Intel comes in with its Xeon processors. Apart from running high-end AI models, Xeon is also effective and scalable for other workloads, and does not cost as much.

“That is why I am bullish on Xeon as it is already available across all data centres. It is omnipresent,” he added.

Intel also offers its Developer Cloud where customers can test out its offerings while running them in a secure environment.

For Intel, AI stands for ‘Amazing India’

“When you really need to build something big and test the performance, Gaudi is always there,” Viswanathan said, adding that the company is working with several partners in India to test and benchmark its AI hardware. This is alongside making AI PCs in partnership with OEMs such as HPE, Dell, and Lenovo.

Furthermore, Gaudi 3, the accelerator recently announced at Intel Vision, is expected to outperform the NVIDIA H100 by 50% in inference throughput on average and to deliver a 40% improvement in inference power efficiency across models of different parameter counts.

Both Gaudi 3 and the newer Xeon 6 processor are also heavily optimised for RAG.

Intel is positioning itself in the market as a low-cost alternative to competitors like NVIDIA and AMD. Viswanathan said that Intel is always an alternative for a company struggling to acquire compute because costs are too high. He explained that Xeon is a workhorse for many use cases that do not need an accelerator.

Intel has indeed been bullish on India. Several collaborations were announced at Intel Vision 2024, including with Bharti Airtel, Infosys, and Ola Krutrim. Moreover, Zoho is also leveraging Intel’s processors for its generative AI offerings.

Infosys’ partnership with Intel is about integrating 4th and 5th Gen Intel Xeon processors, Intel Gaudi 2 AI accelerators, and Intel Core Ultra into Infosys Topaz. This collaboration aims to offer AI-first services, solutions, and platforms to accelerate business value through generative AI technologies.

Ola Krutrim recently launched its open-source model on the Databricks platform. The company utilised Intel Gaudi 2 clusters to pre-train and fine-tune its foundational models with generative capabilities in ten languages, achieving industry-leading price/performance ratios compared to existing market solutions.

Additionally, Krutrim is currently pre-training a larger foundational model on an Intel Gaudi 2 cluster, further advancing its AI capabilities.

Intel also has Make in India partners and is in talks with the government to build systems locally that are fully designed in India. “Anybody who is keen on reducing the carbon footprint while also reducing the cost on their wallet, we are absolutely there,” added Viswanathan.


Scale Your Social Channels With This $50 App

TL;DR: Give your social media channels a boost with UNUM Pro, an Apple App of the Day winner that’s now on sale for just $49.99 (reg. $719) for a lifetime subscription.

Social media is a crucial channel for many businesses. However, growing your social channels while your business grows requires a lot of hands-on management and invested dollars and hours. Or does it?

With UNUM Pro, it’s a whole lot easier. This all-in-one social media tool has more than 20 million users and has been named an Apple App of the Day because it’s an intuitive, automated way of creating and editing posts, scheduling content and gathering insights into how to scale all of your channels.

Features

UNUM has integrations for Instagram, TikTok, LinkedIn, Pinterest, Facebook and many more social media platforms, giving you all the tools you need to grow your channels no matter where they are.

UNUM’s AI tools can help you streamline your workflows by writing captions, creating optimized hashtags, scheduling posts for the optimal time and auto-posting to stay on the right cadence for your channels. It provides more than 500 overlays, filters and more creative tools to amplify your content and help it stand out from the masses. It allows you to link everything in a customized BioBar to drive traffic where you want it.

Most importantly, it provides detailed analytics to help you better understand how individual posts perform, the best times to post on each platform and much more. It’s truly an all-in-one tool to help content creators and businesses grow their channels.

Give your social channels the automated help they need to grow. Right now, you can get a lifetime subscription to UNUM Pro for just $49.99 (reg. $719).


Prices and availability are subject to change.

xLSTM: A Comprehensive Guide to Extended Long Short-Term Memory

For over two decades, Sepp Hochreiter's pioneering Long Short-Term Memory (LSTM) architecture has been instrumental in numerous deep learning breakthroughs and real-world applications. From generating natural language to powering speech recognition systems, LSTMs have been a driving force behind the AI revolution.

However, even the creator of LSTMs recognized the inherent limitations that kept them from realizing their full potential. Shortcomings like an inability to revise stored information, constrained memory capacity and a lack of parallelization paved the way for transformers and other models to surpass LSTMs on more complex language tasks.

But in a recent development, Hochreiter and his team at NXAI have introduced a new variant called extended LSTM (xLSTM) that addresses these long-standing issues. Presented in a recent research paper, xLSTM builds upon the foundational ideas that made LSTMs so powerful, while overcoming their key weaknesses through architectural innovations.

At the core of xLSTM are two novel components: exponential gating and enhanced memory structures. Exponential gating allows for more flexible control over the flow of information, enabling xLSTMs to effectively revise decisions as new context is encountered. Meanwhile, the introduction of matrix memory vastly increases storage capacity compared to traditional scalar LSTMs.

But the enhancements don't stop there. By leveraging techniques borrowed from large language models like parallelizability and residual stacking of blocks, xLSTMs can efficiently scale to billions of parameters. This unlocks their potential for modeling extremely long sequences and context windows – a capability critical for complex language understanding.

The implications of Hochreiter's latest creation are monumental. Imagine virtual assistants that can reliably track context over hours-long conversations. Or language models that generalize more robustly to new domains after training on broad data. Applications span everywhere LSTMs made an impact – chatbots, translation, speech interfaces, program analysis and more – but now turbocharged with xLSTM's breakthrough capabilities.

In this deep technical guide, we'll dive into the architectural details of xLSTM, evaluating its novel components like scalar and matrix LSTMs, exponential gating mechanisms, memory structures and more. You'll gain insights from experimental results showcasing xLSTM's impressive performance gains over state-of-the-art architectures like transformers and the latest recurrent models.

Understanding the Origins: The Limitations of LSTM

Before we dive into the world of xLSTM, it's essential to understand the limitations that traditional LSTM architectures have faced. These limitations have been the driving force behind the development of xLSTM and other alternative approaches.

  1. Inability to Revise Storage Decisions: One of the primary limitations of LSTM is its struggle to revise stored values when a more similar vector is encountered. This can lead to suboptimal performance in tasks that require dynamic updates to stored information.
  2. Limited Storage Capacities: LSTMs compress information into scalar cell states, which can limit their ability to effectively store and retrieve complex data patterns, particularly when dealing with rare tokens or long-range dependencies.
  3. Lack of Parallelizability: The memory mixing mechanism in LSTMs, which involves hidden-hidden connections between time steps, enforces sequential processing, hindering the parallelization of computations and limiting scalability.

These limitations have paved the way for the emergence of Transformers and other architectures that have surpassed LSTMs in certain aspects, particularly when scaling to larger models.

The xLSTM Architecture

Extended LSTM (xLSTM) family

At the core of xLSTM lie two main modifications to the traditional LSTM framework: exponential gating and novel memory structures. These enhancements introduce two new variants of LSTM, known as sLSTM (scalar LSTM) and mLSTM (matrix LSTM).

  1. sLSTM: The Scalar LSTM with Exponential Gating and Memory Mixing
    • Exponential Gating: sLSTM incorporates exponential activation functions for input and forget gates, enabling more flexible control over information flow.
    • Normalization and Stabilization: To prevent numerical instabilities, sLSTM introduces a normalizer state that keeps track of the product of input gates and future forget gates.
    • Memory Mixing: sLSTM supports multiple memory cells and allows for memory mixing via recurrent connections, enabling the extraction of complex patterns and state tracking capabilities.
  2. mLSTM: The Matrix LSTM with Enhanced Storage Capacities
    • Matrix Memory: Instead of a scalar memory cell, mLSTM utilizes a matrix memory, increasing its storage capacity and enabling more efficient retrieval of information.
    • Covariance Update Rule: mLSTM employs a covariance update rule, inspired by Bidirectional Associative Memories (BAMs), to store and retrieve key-value pairs efficiently.
    • Parallelizability: By abandoning memory mixing, mLSTM achieves full parallelizability, enabling efficient computations on modern hardware accelerators.

These two variants, sLSTM and mLSTM, can be integrated into residual block architectures, forming xLSTM blocks. By residually stacking these xLSTM blocks, researchers can construct powerful xLSTM architectures tailored for specific tasks and application domains.
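
In PyTorch-style pseudocode, the residual stacking pattern looks roughly like this (block internals are omitted; this is a sketch of the pattern, not the paper's reference implementation):

    import torch.nn as nn

    class XLSTMStack(nn.Module):
        """Residually stacked xLSTM blocks (each an sLSTM or mLSTM block)."""
        def __init__(self, blocks: list[nn.Module]):
            super().__init__()
            self.blocks = nn.ModuleList(blocks)

        def forward(self, x):
            for block in self.blocks:
                x = x + block(x)  # residual connection around each xLSTM block
            return x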

The Math

Traditional LSTM:

The original LSTM architecture introduced the constant error carousel and gating mechanisms to overcome the vanishing gradient problem in recurrent neural networks.

The repeating module in an LSTM

The LSTM memory cell updates are governed by the following equations:

Cell State Update: ct = ft ⊙ ct-1 + it ⊙ zt

Hidden State Update: ht = ot ⊙ tanh(ct)

Where:

  • ct is the cell state vector at time t
  • ft is the forget gate vector
  • it is the input gate vector
  • ot is the output gate vector
  • zt is the input modulated by the input gate
  • ⊙ denotes element-wise multiplication

The gates ft, it, and ot control what information gets stored, forgotten, and outputted from the cell state ct, mitigating the vanishing gradient issue.
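
For reference, one traditional LSTM step can be written directly from these equations; the NumPy sketch below assumes the weight matrices W, R and biases b are given per gate:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def lstm_step(x_t, h_prev, c_prev, W, R, b):
        """One traditional LSTM step matching the equations above.
        W, R, b map gate names ('i', 'f', 'o', 'z') to parameters."""
        f_t = sigmoid(W['f'] @ x_t + R['f'] @ h_prev + b['f'])  # forget gate
        i_t = sigmoid(W['i'] @ x_t + R['i'] @ h_prev + b['i'])  # input gate
        o_t = sigmoid(W['o'] @ x_t + R['o'] @ h_prev + b['o'])  # output gate
        z_t = np.tanh(W['z'] @ x_t + R['z'] @ h_prev + b['z'])  # cell input
        c_t = f_t * c_prev + i_t * z_t  # cell state update
        h_t = o_t * np.tanh(c_t)        # hidden state update
        return h_t, c_t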

xLSTM with Exponential Gating:

The xLSTM architecture introduces exponential gating to allow more flexible control over the information flow. For the scalar xLSTM (sLSTM) variant:

Cell State Update: ct = ft ⊙ ct-1 + it ⊙ zt

Normalizer State Update: nt = ft ⊙ nt-1 + it

Hidden State Update: ht = ot ⊙ (ct / nt)

Input Gate: it = exp(W_i xt + R_i ht-1 + b_i)

Forget Gate: ft = σ(W_f xt + R_f ht-1 + b_f) or ft = exp(W_f xt + R_f ht-1 + b_f)

The exponential activation functions for the input (it) and forget (ft) gates, along with the normalizer state nt, enable more effective control over memory updates and revising stored information.
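
A minimal NumPy sketch of one sLSTM step following these equations (weights are assumed given, and the paper's additional stabiliser state for numerical safety is omitted here for brevity):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def slstm_step(x_t, h_prev, c_prev, n_prev, W, R, b):
        """One sLSTM step with an exponential input gate and a normalizer state.
        W, R, b map gate names ('i', 'f', 'o', 'z') to parameters."""
        pre = lambda g: W[g] @ x_t + R[g] @ h_prev + b[g]
        i_t = np.exp(pre('i'))          # exponential input gate
        f_t = sigmoid(pre('f'))         # sigmoid forget gate (exp is the alternative)
        o_t = sigmoid(pre('o'))         # output gate
        z_t = np.tanh(pre('z'))         # cell input
        c_t = f_t * c_prev + i_t * z_t  # cell state update
        n_t = f_t * n_prev + i_t        # normalizer state update
        h_t = o_t * (c_t / n_t)         # normalized hidden state
        return h_t, c_t, n_t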

xLSTM with Matrix Memory:

For the matrix xLSTM (mLSTM) variant with enhanced storage capacity:

Cell State Update: Ct = ft ⊙ Ct-1 + it ⊙ (vt kt^T)

Normalizer State Update: nt = ft ⊙ nt-1 + it ⊙ kt

Hidden State Update: ht = ot ⊙ (Ct qt / max(qt^T nt, 1))

Where:

  • Ct is the matrix cell state
  • vt and kt are the value and key vectors
  • qt is the query vector used for retrieval

These key equations highlight how xLSTM extends the original LSTM formulation with exponential gating for more flexible memory control and matrix memory for enhanced storage capabilities. The combination of these innovations allows xLSTM to overcome limitations of traditional LSTMs.
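
A matching NumPy sketch of one mLSTM step (the gates are assumed precomputed from the input, and scaling the keys by the square root of their dimension follows the attention convention):

    import numpy as np

    def mlstm_step(C_prev, n_prev, q_t, k_t, v_t, i_t, f_t, o_t):
        """One mLSTM step: matrix memory with a covariance (outer-product) update.
        i_t and f_t are scalar gates; o_t is an output gate vector."""
        k_t = k_t / np.sqrt(k_t.shape[0])              # scale keys, as in attention
        C_t = f_t * C_prev + i_t * np.outer(v_t, k_t)  # store the key-value pair
        n_t = f_t * n_prev + i_t * k_t                 # normalizer tracks key mass
        h_raw = C_t @ q_t / max(n_t @ q_t, 1.0)        # retrieve with the query
        return o_t * h_raw, C_t, n_t

Because this update has no hidden-to-hidden mixing between cells, every time step's contribution can be computed in parallel, which is the source of mLSTM's parallelizability.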

Key Features and Advantages of xLSTM

  1. Ability to Revise Storage Decisions: Thanks to exponential gating, xLSTM can effectively revise stored values when encountering more relevant information, overcoming a significant limitation of traditional LSTMs.
  2. Enhanced Storage Capacities: The matrix memory in mLSTM provides increased storage capacity, enabling xLSTM to handle rare tokens, long-range dependencies, and complex data patterns more effectively.
  3. Parallelizability: The mLSTM variant of xLSTM is fully parallelizable, allowing for efficient computations on modern hardware accelerators, such as GPUs, and enabling scalability to larger models.
  4. Memory Mixing and State Tracking: The sLSTM variant of xLSTM retains the memory mixing capabilities of traditional LSTMs, enabling state tracking and making xLSTM more expressive than Transformers and State Space Models for certain tasks.
  5. Scalability: By leveraging the latest techniques from modern Large Language Models (LLMs), xLSTM can be scaled to billions of parameters, unlocking new possibilities in language modeling and sequence processing tasks.

Experimental Evaluation: Showcasing xLSTM's Capabilities

The research paper presents a comprehensive experimental evaluation of xLSTM, highlighting its performance across various tasks and benchmarks. Here are some key findings:

  1. Synthetic Tasks and Long Range Arena:
    • xLSTM excels at solving formal language tasks that require state tracking, outperforming Transformers, State Space Models, and other RNN architectures.
    • In the Multi-Query Associative Recall task, xLSTM demonstrates enhanced memory capacities, surpassing non-Transformer models and rivaling the performance of Transformers.
    • On the Long Range Arena benchmark, xLSTM exhibits consistent strong performance, showcasing its efficiency in handling long-context problems.
  2. Language Modeling and Downstream Tasks:
    • When trained on 15B tokens from the SlimPajama dataset, xLSTM outperforms existing methods, including Transformers, State Space Models, and other RNN variants, in terms of validation perplexity.
    • As the models are scaled to larger sizes, xLSTM continues to maintain its performance advantage, demonstrating favorable scaling behavior.
    • In downstream tasks such as common sense reasoning and question answering, xLSTM emerges as the best method across various model sizes, surpassing state-of-the-art approaches.
  3. Performance on PALOMA Language Tasks:
    • Evaluated on 571 text domains from the PALOMA language benchmark, xLSTM[1:0] (the mLSTM-only variant) achieves lower perplexity than Mamba in 99.5% of domains, than Llama in 85.1% and than RWKV-4 in 99.8%.
  4. Scaling Laws and Length Extrapolation:
    • When trained on 300B tokens from SlimPajama, xLSTM exhibits favorable scaling laws, indicating its potential for further performance improvements as model sizes increase.
    • In sequence length extrapolation experiments, xLSTM models maintain low perplexities even for contexts significantly longer than those seen during training, outperforming other methods.

These experimental results highlight the remarkable capabilities of xLSTM, positioning it as a promising contender for language modeling tasks, sequence processing, and a wide range of other applications.

Real-World Applications and Future Directions

The potential applications of xLSTM span a wide range of domains, from natural language processing and generation to sequence modeling, time series analysis, and beyond. Here are some exciting areas where xLSTM could make a significant impact:

  1. Language Modeling and Text Generation: With its enhanced storage capacities and ability to revise stored information, xLSTM could revolutionize language modeling and text generation tasks, enabling more coherent, context-aware, and fluent text generation.
  2. Machine Translation: The state tracking capabilities of xLSTM could prove invaluable in machine translation tasks, where maintaining contextual information and understanding long-range dependencies is crucial for accurate translations.
  3. Speech Recognition and Generation: The parallelizability and scalability of xLSTM make it well-suited for speech recognition and generation applications, where efficient processing of long sequences is essential.
  4. Time Series Analysis and Forecasting: xLSTM's ability to handle long-range dependencies and effectively store and retrieve complex patterns could lead to significant improvements in time series analysis and forecasting tasks across various domains, such as finance, weather prediction, and industrial applications.
  5. Reinforcement Learning and Control Systems: The potential of xLSTM in reinforcement learning and control systems is promising, as its enhanced memory capabilities and state tracking abilities could enable more intelligent decision-making and control in complex environments.

Architectural Optimizations and Hyperparameter Tuning

While the current results are promising, there is still room for optimizing the xLSTM architecture and fine-tuning its hyperparameters. Researchers could explore different combinations of sLSTM and mLSTM blocks, varying the ratios and placements within the overall architecture. Additionally, a systematic hyperparameter search could lead to further performance improvements, particularly for larger models.

Hardware-Aware Optimizations: To fully leverage the parallelizability of xLSTM, especially the mLSTM variant, researchers could investigate hardware-aware optimizations tailored for specific GPU architectures or other accelerators. This could involve optimizing the CUDA kernels, memory management strategies, and leveraging specialized instructions or libraries for efficient matrix operations.

Integration with Other Neural Network Components: Exploring the integration of xLSTM with other neural network components, such as attention mechanisms, convolutions, or self-supervised learning techniques, could lead to hybrid architectures that combine the strengths of different approaches. These hybrid models could potentially unlock new capabilities and improve performance on a wider range of tasks.

Few-Shot and Transfer Learning: Exploring the use of xLSTM in few-shot and transfer learning scenarios could be an exciting avenue for future research. By leveraging its enhanced memory capabilities and state tracking abilities, xLSTM could potentially enable more efficient knowledge transfer and rapid adaptation to new tasks or domains with limited training data.

Interpretability and Explainability: As with many deep learning models, the inner workings of xLSTM can be opaque and difficult to interpret. Developing techniques for interpreting and explaining the decisions made by xLSTM could lead to more transparent and trustworthy models, facilitating their adoption in critical applications and promoting accountability.

Efficient and Scalable Training Strategies: As models continue to grow in size and complexity, efficient and scalable training strategies become increasingly important. Researchers could explore techniques such as model parallelism, data parallelism, and distributed training approaches specifically tailored for xLSTM architectures, enabling the training of even larger models and potentially reducing computational costs.

These are a few potential future research directions and areas for further exploration with xLSTM.

Conclusion

The introduction of xLSTM marks a significant milestone in the pursuit of more powerful and efficient language modeling and sequence processing architectures. By addressing the limitations of traditional LSTMs and leveraging novel techniques such as exponential gating and matrix memory structures, xLSTM has demonstrated remarkable performance across a wide range of tasks and benchmarks.

However, the journey does not end here. As with any groundbreaking technology, xLSTM presents exciting opportunities for further exploration, refinement, and application in real-world scenarios. As researchers continue to push the boundaries of what is possible, we can expect to witness even more impressive advancements in the field of natural language processing and artificial intelligence.

Transform Your Career with Praxis’s Top-Ranked PGP in Data Science with Generative AI and ML


In today’s data-driven world, the need for skilled data scientists is more critical than ever. Praxis Tech School is at the forefront of this transformation, offering a nine-month, full-time Post Graduate Program in Data Science (PGPDS) with generative AI and machine learning, in Kolkata and Bengaluru.

Through this course, Praxis aims to create resources that will drive India’s digital transformation. The program prepares students for exciting careers in data science and AI by offering a comprehensive curriculum aligned with industry needs and a robust placement program.


Analytics India Magazine has ranked it India’s No.1 Data Science Program for three consecutive years (2021, 2022, and 2023).

As a pioneer in data science education in India, Praxis has been instrumental in shaping the careers of many data professionals. The course offers impressive career outcomes with the highest CTC of ₹21 LPA and an average of ₹13.5 LPA.

Praxis is the only institute in India to have placed over 35 batches of data science students with key recruiters like Accenture, Chubb, EY, Fractal, Genpact, HSBC, ICICI Bank, Landmark Group, L&T, NielsenIQ, Poonawalla Fincorp, PwC, Subex, Tata Metaliks, and Vedanta.

Program Highlights

  • Comprehensive Curriculum: Over 550 hours of impactful learning through lectures, case studies, hands-on labs, assignments, and projects.
  • Wide Range of Tools and Techniques: Master GenAI, Python, R, SQL, Spark, Tableau, PowerBI, along with statistical, machine learning, and deep learning techniques.
  • Industry-Relevant Subjects: Covering marketing, finance, human resources, operations, and more.
  • Trimester-Based Learning: The program is structured into three trimesters, each focusing on different aspects of data science.

Learning Outcomes

  • Data Management Skills: Learn to source, clean, and transform data.
  • Modelling Techniques: Apply these techniques to solve real-world business problems.
  • Analytical Skills: Analyse and interpret data, and present results using advanced visualisation techniques.
  • Technical Proficiency: Use cutting-edge tools like Python, R, SQL, and Spark, as well as platforms such as Hadoop, TensorFlow, and PyTorch.
  • Ethical Awareness: Understand data security, integrity, and privacy issues.
  • Teamwork and Leadership: Develop skills to work effectively in teams and lead projects.

Alumni Testimonials

Shivani Sinha, data scientist at Shaadi.com, shares, “Praxis has not only enhanced my analytical skills, but also made me competitive in the field of data science. The support and opportunities provided by Praxis were instrumental in my success.”

Saurabh Sharma, senior data scientist at Valiance Solutions, states, “The curriculum at Praxis is a perfect blend of theoretical and practical approaches, which has helped me solve real-life business problems.”

Sagar Patil, assistant manager, analytics, at Tata Capital, says, “With no prior programming experience, I received perfect guidance and resources at Praxis. The faculty and the placement program are its main strengths.”

Eligibility

Undergraduate eligibility: Graduates with at least 60% in BE/BTech/BSc (with economics/statistics) or any other stream with mathematics/statistics from a recognised or deemed university.

Postgraduate eligibility: Postgraduates with at least 60% in ME/MTech/MSc (with economics/statistics) from a recognised or deemed university or college.

School Eligibility: At least 60% in Class X and XII.

Application Process

The selection process involves the Praxis Aptitude Test (PAT), essay writing, and a personal interview, with due weightage given to the candidate’s academic and corporate performance.

Application Deadline: June 22, 2024.

Apply for the programme
