
AMD’s ROCm is Ready To Challenge NVIDIA’s CUDA

While the world wants more NVIDIA GPUs, AMD has released the MI300X, which is arguably a lot faster than NVIDIA's comparable GPUs. AMD aims to challenge NVIDIA not only on the hardware side but also on the software side with its open source ROCm, a direct competitor to NVIDIA’s CUDA.

“As important as the hardware is, software is what really drives innovation,” Lisa Su said, talking about ROCm, which is due for release in the coming week.

At Advancing AI, it was clear that AMD’s focus on software has paved the way for success. Victor Peng, President of AMD, showed how building a strong ecosystem has enabled the company to create a successful open source framework in ROCm.

Peng introduced ROCm 6, the latest iteration of AMD's parallel computing framework, which provides a comprehensive software stack optimised for AMD Instinct accelerators and caters particularly to large language models in generative AI.

Everyone loves open source

“We architectured ROCm to be modular and open source for broad user accessibility and rapid contribution from the open source AI community,” Peng said, contrasting this software strategy with CUDA, which is proprietary and closed source.

Furthermore, ROCm is now also supported on Radeon GPUs, alongside Ryzen AI 1.0 software for AI at the edge, making it more accessible to AI researchers and developers.

During the presentation, Peng also showed a testimonial from Philippe Tillet of OpenAI, who wrote, “OpenAI is working with AMD in support of an open ecosystem. We plan to support AMD’s GPUs including MI300 in the standard Triton distribution starting with the upcoming 3.0 release.” Tillet is the creator of Triton.

In a collaborative effort with three emerging AI startups (Databricks, Essential AI, and Lamini), AMD showcased how these companies leverage the AMD Instinct MI300X accelerators and the open ROCm 6 software stack to deliver differentiated AI solutions for enterprise customers. All three startups have been using ROCm with the MI250X and have praised its performance on various occasions.

*From left to right: Victor Peng, Ion Stoica, Ashish Vaswani, Sharon Zhou*

Ion Stoica, co-founder of Databricks; Ashish Vaswani, co-founder of Essential AI; and Sharon Zhou, co-founder of Lamini, discussed how they have been leveraging AMD hardware and software all this while, and how the open nature of the technology has helped them fully own it.

“ROCm runs out of the box from day one,” Stoica said, highlighting that it was very easy to integrate into the Databricks stack after the acquisition of MosaicML, with just a little optimisation. He added that Databricks is using the MI250X for almost all its software workflows and is eagerly waiting for the MI300X.

ROCm vs CUDA: An Apples-to-Apples Comparison

“We have reached beyond CUDA,” said Zhou. Lamini has previously highlighted in its blog how it has found its moat with AMD, and how ROCm is production-ready. Lamini's whole mission is to make small language models within enterprises easily accessible and easy to use, and AMD with ROCm has been helping it do that.

AMD continues its strategic investments in software through companies such as Mipsology and Nod.AI, which have helped it massively improve the software side of its AI offering.

Many open source tools such as PyTorch can already be used with ROCm on the MI300X, which makes it easily accessible to most developers. The features of this CUDA alternative include support for new data types, advanced graph and kernel optimisations, optimised libraries, and state-of-the-art attention algorithms.

Notably, the performance boost is remarkable, with an approximately 8x improvement in overall latency for text generation compared to ROCm 5 running on the MI250.

Peng showcased how the MI300X with ROCm 6 is eight times faster than the MI250X with ROCm 5 when running inference on Llama 2 70B.

On smaller models such as Llama 2 13B, ROCm with MI300X showcased 1.2 times better performance than NVIDIA coupled with CUDA on a single GPU.

ROCm 6 now supports Dynamic FP16, BF16, and FP8 for higher performance and reduced memory usage. The new release also comes with open-art libraries and supports various key features for generative AI, including FlashAttention, HIPGraph, and vLLM, with speedups of 1.3x, 1.4x, and 2.6x respectively.
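To see why narrower data types reduce memory usage, here is a rough back-of-the-envelope sketch. It counts only the bytes needed to store a model's weights and ignores activations, the KV cache, and framework overhead, so treat the numbers as lower bounds. The function name and the 70-billion-parameter figure (standing in for Llama 2 70B) are illustrative assumptions, not AMD's methodology.

```python
# Bytes needed to store one parameter in each of the data types named above.
BYTES_PER_PARAM = {"FP16": 2, "BF16": 2, "FP8": 1}


def weight_memory_gb(num_params, dtype):
    """Return the memory needed for the weights alone, in gigabytes (1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    # Assumed model size: 70 billion parameters (weights only).
    for dtype in ("FP16", "BF16", "FP8"):
        print(f"{dtype}: {weight_memory_gb(70e9, dtype):.0f} GB")
```

Halving the bytes per parameter halves the weight footprint, which is why FP8 is attractive for fitting large models on fewer accelerators.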

“ROCm 6 and MI300X will drive an inflection point in developer adoption, I’m confident of that. We are empowering innovators to realise the profound benefits of pervasive AI, faster on AMD,” concluded Peng.

The post AMD’s ROCm is Ready To Challenge NVIDIA’s CUDA appeared first on Analytics India Magazine.

Liquid AI, a new MIT spinoff, wants to build an entirely new type of AI

Kyle Wiggers

An MIT spinoff co-founded by robotics luminary Daniela Rus aims to build general-purpose AI systems powered by a relatively new type of AI model called a liquid neural network.

The spinoff, aptly named Liquid AI, emerged from stealth this morning and announced that it has raised $37.5 million — substantial for a two-stage seed round — from VCs and organizations including OSS Capital, PagsGroup, WordPress parent company Automattic, Samsung Next, Bold Capital Partners and ISAI Cap Venture, as well as angel investors like GitHub co-founder Tom Preston Werner, Shopify co-founder Tobias Lütke and Red Hat co-founder Bob Young.

The tranche values Liquid AI at $303 million post-money.

Joining Rus on the founding Liquid AI team are Ramin Hasani (CEO), Mathias Lechner (CTO) and Alexander Amini (chief scientific officer). Hasani was previously the principal AI scientist at Vanguard before joining MIT as a postdoctoral associate and research associate, while Lechner and Amini are longtime MIT researchers, having contributed — along with Hasani and Rus — to the invention of liquid neural networks.

What are liquid neural networks, you might be wondering? My colleague Brian Heater has written about them extensively, and I strongly encourage you to read his recent interview with Rus on the topic. But I’ll do my best to cover the salient points.

A research paper titled “Liquid Time-constant Networks,” published at the tail end of 2020 by Hasani, Rus, Lechner, Amini and others, put liquid neural networks on the map following several years of fits and starts; liquid neural networks as a concept have been around since 2018.


“The idea was invented originally at the Vienna University of Technology, Austria at professor Radu Grosu’s lab, where I completed my Ph.D. and Mathias Lechner his master’s degree,” Hasani told TechCrunch in an email interview. “The work then got refined and scaled at Rus’ lab at MIT CSAIL, where Amini and Rus joined Mathias and I.”

Liquid neural networks consist of “neurons” governed by equations that predict each individual neuron’s behavior over time, like most other modern model architectures. The “liquid” bit in the term “liquid neural networks” refers to the architecture’s flexibility; inspired by the “brains” of roundworms, not only are liquid neural networks much smaller than traditional AI models, but they require far less compute power to run.
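As a rough illustration of what "neurons governed by equations" can mean, the sketch below simulates a single neuron using the general form given in the "Liquid Time-constant Networks" paper, dx/dt = -(1/tau + f) * x + f * A, where f is a gate that depends on the input. Two simplifications are assumptions made here for brevity: the gate is a plain sigmoid of the input only (in the paper it also depends on the neuron's state), and the equation is integrated with explicit Euler steps rather than the solver the authors use. All names and default values are hypothetical.

```python
import math


def ltc_step(x, inp, dt, tau=1.0, A=1.0, w=1.0, b=0.0):
    """Advance one neuron's state x by one explicit-Euler step of size dt."""
    # Input-dependent gate in (0, 1); a simplifying assumption of this sketch.
    f = 1.0 / (1.0 + math.exp(-(w * inp + b)))
    # The gate appears in the decay term, so the effective time constant
    # tau / (1 + tau * f) changes with the input: the "liquid" part.
    dxdt = -(1.0 / tau + f) * x + f * A
    return x + dt * dxdt


def simulate(inputs, dt=0.05, **params):
    """Run the neuron over a sequence of inputs and return the state trace."""
    x = 0.0
    trace = []
    for inp in inputs:
        x = ltc_step(x, inp, dt, **params)
        trace.append(x)
    return trace


if __name__ == "__main__":
    # A weak input followed by a strong one: the neuron settles at a
    # different level and at a different speed in each phase.
    trace = simulate([-2.0] * 100 + [5.0] * 100)
    print(round(trace[99], 4), round(trace[-1], 4))
```

Under a constant input, the state settles at f * A / (1/tau + f), so it stays bounded however large the input becomes.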

It’s helpful, I think, to compare a liquid neural network to a typical generative AI model.

GPT-3, the predecessor to OpenAI’s text-generating, image-analyzing model GPT-4, contains about 175 billion parameters and ~50,000 neurons — “parameters” being the parts of the model learned from training data that essentially define the skill of the model on a problem (in GPT-3’s case generating text). By contrast, a liquid neural network trained for a task like navigating a drone through an outdoor environment can contain as few as 20,000 parameters and fewer than 20 neurons.

Generally speaking, fewer parameters and neurons translate to less compute needed to train and run the model, an attractive prospect at a time when AI compute capacity is at a premium. A liquid neural network designed to drive a car autonomously could in theory run on a Raspberry Pi, to give a concrete example.

Liquid neural networks’ small size and straightforward architecture afford the added advantage of interpretability. It makes intuitive sense — figuring out the function of every neuron inside a liquid neural network is a more manageable task than figuring out the function of the 50,000-or-so neurons in GPT-3 (although there have been reasonably successful efforts to do this).

Now, few-parameter models capable of autonomous driving, text generation and more already exist. But low overhead isn’t the only thing that liquid neural networks have going for them.

Liquid neural networks’ other appealing — and arguably more unique — feature is their ability to adapt their parameters for “success” over time. The networks consider sequences of data as opposed to the isolated slices or snapshots most models process and adjust the exchange of signals between their neurons dynamically. These qualities let liquid neural networks deal with shifts in their surroundings and circumstances even if they weren’t trained to anticipate these shifts, such as changing weather conditions in the context of self-driving.

In tests, liquid neural networks have edged out other state-of-the-art algorithms in predicting future values in datasets spanning atmospheric chemistry to car traffic. But more impressive — at least to this writer — is what they’ve achieved in autonomous navigation.

Earlier this year, Rus and the rest of Liquid AI’s team trained a liquid neural network on data collected by a professional human drone pilot. They then deployed the algorithm on a fleet of quadrotors, which underwent long-distance, target-tracking and other tests in a range of outdoor environments, including a forest and dense city neighborhood.

According to the team, the liquid neural network beat other models trained for navigation — managing to make decisions that led the drones to targets in previously unexplored spaces even in the presence of noise and other challenges. Moreover, the liquid neural network was the only model that could reliably generalize to scenarios it hadn’t seen without any fine-tuning.

Drone search and rescue, wildlife monitoring and delivery are among the more obvious applications of liquid neural networks. But Rus and the rest of the Liquid AI team assert that the architecture is suited to analyzing any phenomena that fluctuate over time, including electric power grids, medical readouts, financial transactions and severe weather patterns. As long as there’s a dataset with sequential data, like video, liquid neural networks can train on it.

So what exactly does Liquid AI the startup hope to achieve with this powerful new(ish) architecture? Plain and simple, commercialization.

“[We compete] with foundation model companies building GPTs,” Hasani said — not naming names but not-so-subtly gesturing toward OpenAI and its many rivals (e.g. Anthropic, Stability AI, Cohere, AI21 Labs, etc.) in the generative AI space. “[The seed funding] will allow us to build the best-in-class new Liquid foundation models beyond GPTs.”

One presumes work will continue on the liquid neural network architecture, as well. Just in 2022, Rus’ lab devised a way to scale liquid neural networks far beyond what was once computationally practical; other breakthroughs could be lurking on the horizon with any luck.

Beyond designing and training new models, Liquid AI plans to provide on-premises and private AI infrastructure for customers and a platform that’ll enable these customers to build their own models for whatever use cases they conjure up — subject to Liquid AI’s terms, of course.

“Accountability and safety of large AI models is of paramount importance,” Hasani added. “Liquid AI offers more capital efficient, reliable, explainable and capable machine learning models for both domain-specific and generative AI applications.”

Liquid AI, which has a presence in Palo Alto in addition to Boston, has a 12-person team. Hasani expects that number to grow to 20 by early next year.

9 Best Small Language Models Released in 2023

In an era of language models, small language models (SLMs) represent a pivotal advancement in natural language processing, offering a compact yet powerful solution to various linguistic tasks. Many companies are developing SLMs for their accessibility, computational efficiency, and adaptability, which make them ideal for deployment on edge devices and in cloud environments, fostering a new era of natural and intuitive human-computer interaction.

Satya Nadella, the CEO of Microsoft, said at Ignite that “Microsoft loves SLMs,” which is quite an endorsement for SLMs in general.

Here is a list of small language models that were introduced in 2023.

Llama 2 7B

Llama 2, Meta AI’s second-generation open-source large language model, was released in July in sizes of up to 70 billion parameters, and the smaller 7 billion model was made specially for research purposes. It significantly improves on its predecessor in performance, efficiency, and accessibility.

With demonstrated text generation, translation, and code generation improvements, Llama 2 caters to a wide array of NLP tasks. The model’s multilingual capabilities and availability of fine-tuned versions for specific tasks, such as Code Llama, broaden its applications, from machine translation to chatbots and content creation.

Many of the current open-source models are built on top of the Llama family of models.

Phi-2 and Orca

At Ignite 2023, Microsoft announced its latest innovations in small language models, introducing Phi-2 and Orca. Phi-2, the newest iteration in the Phi Small Language Model (SLM) series, has 2.7 billion parameters and is tailored for enhanced efficiency and scalability.

Phi-2, tailored for edge devices and the cloud, excels in text generation, language translation, and informative question-answering. Trained on GPT-4 signals, Orca stands out in reasoning tasks, offering clear explanations. Phi-2 and Orca are a step towards epitomising Microsoft’s commitment to advancing small language models, promising a revolution in natural and accessible computing.

Stable Beluga 7B

Stable Beluga 7B is a 7 billion parameter language model that builds on the Llama model foundation from Meta AI and is fine-tuned on an Orca-style dataset. It exhibits robust performance across various NLP tasks, including text generation, translation, question answering, and code completion. Stable Beluga 7B understands and responds in multiple languages, enhancing its global reach and applicability. The model’s future promises further performance enhancements, increased adoption and integration, the development of specialised versions, and continued contributions to the open-source community.

XGen

XGen, a 7 billion-parameter small language model (SLM) pioneered by Salesforce AI, primarily focuses on dialogue and diverse tasks such as text generation, translation, and code completion. Its compact size offers computational efficiency, facilitating broader deployment.

Boasting multilingual capabilities and continuous development efforts by Salesforce AI, XGen emerges as a valuable tool with applications ranging from creative writing and content creation to software development and language learning.

Alibaba’s Qwen

Alibaba has recently released its Qwen series, which stands out as a formidable family of language models. With various models differing in parameter sizes and functionalities, the series caters to diverse applications such as text generation, translation, question answering, vision and language tasks, and audio processing. The key features of the models include high performance, multilingual support, and open-source availability, making them accessible for researchers and developers. Alibaba’s Qwen series includes the core Qwen language models, namely Qwen-1.8B, Qwen-7B, Qwen-14B, and Qwen-72B.

Alpaca 7B

Alpaca 7B, a fine-tuned version of Meta’s 7 billion-parameter LLaMA model, is renowned for its remarkable compactness and cost-effectiveness, having cost less than $600 to build. Despite its small size, Alpaca 7B has demonstrated noteworthy performance, rivalling that of larger models in certain tasks.

This affordability and efficiency make Alpaca 7B an accessible option for various applications, showcasing the potential for impactful advancements in natural language processing within a budget-friendly framework.

MPT

MPT, a 7-billion-parameter small language model (SLM) by MosaicML, stands at the intersection of code generation and creative text formats, delivering specialised functionalities for programmers and artists alike. Designed to enhance productivity, MPT excels in generating precise code snippets, automating tasks, and inspiring artistic expression through various creative text formats.

Its potential applications span software development, creative writing, content creation, education, and accessibility tools, showcasing MPT’s adaptability and promise in contributing to both technical and creative domains.

Falcon 7B

Falcon 7B, crafted by the Technology Innovation Institute (TII) from the UAE, represents a standout addition to the Falcon series of autoregressive language models, celebrated for their outstanding performance. Tailored for efficiency in straightforward tasks such as chatting and question answering, the 7 billion-parameter model is optimised to handle a vast corpus of text data, encompassing approximately a trillion tokens.

The Falcon models have spent a long time at the top of the Hugging Face leaderboard since their release, and the open-source community has worked with them extensively.

Zephyr

Crafted by Hugging Face, Zephyr is a 7 billion-parameter small language model (SLM), emerging as a powerhouse for engaging dialogues. It is a fine-tuned version of the Mistral 7B model and inherits robust capabilities for generating natural and captivating language.

Its focus on dialogue interactions makes it ideal for chatbots, virtual assistants, and various interactive applications. Its compact size ensures computational efficiency, making it deployable across diverse platforms. Zephyr’s training on a diverse dataset enables it to understand and respond in multiple languages, amplifying its global applicability.

The post 9 Best Small Language Models Released in 2023 appeared first on Analytics India Magazine.

Sarvam AI raises $41 million to Train Indic LLMs

Bangalore-based startup Sarvam AI has raised USD 41 million in a Series A funding round led by Lightspeed and supported by Peak XV Partners and Khosla Ventures.

Founded by Vivek Raghavan and Pratyush Kumar, Sarvam AI will focus on India’s unique needs. This includes training AI models to support the diverse Indian languages and voice-first interfaces. The company will also work with Indian enterprises to co-build domain-specific AI models on their data.

Finally, the company aims to create population-scale impact by layering generative AI on top of the highly successful India stack specifically for public-good applications.

Sarvam AI’s ambitious plan is to develop the “full-stack” for Generative AI, ranging from research-led innovations in training custom AI models to an enterprise-grade platform for authoring and deployment.

The company believes that this full-stack approach will accelerate the adoption of generative AI in India, especially given that enterprises see the potential of generative AI but are grappling with how to leverage it for their business.

“I have seen first-hand the enormous value in innovating at foundational layers and deploying at population scale. India has demonstrated that it can harness technology differently, and with generative AI we have an opportunity to reimagine how this technology can add value to people’s lives,” Vivek Raghavan, co-founder of Sarvam AI, said.

The post Sarvam AI raises $41 million to Train Indic LLMs appeared first on Analytics India Magazine.

Pimento turns creative briefs into visual mood boards using generative AI

Romain Dillet

Pimento is a new French startup that is using generative AI in an interesting way as the company focuses on the first step of creative processes — ideation, brainstorming and moodboarding. And the company recently raised a $3.2 million (€3 million) funding round from an interesting list of investors.

The best way to describe Pimento is by talking about people who could use a tool like this. Creative teams working on a brand redesign, an ad campaign, an upcoming video game or an animated movie will open Pimento on the first day of their new projects. It’s the tool that you use to start the research process.

These users want to compile a reference document with images, text and colors that will be used down the road for these projects. They will serve as the main inspiration and the first guidelines of the project for other teams working on it.

And there’s a lot of back and forth during this phase because clients or managers can be picky and often change their mind. They give you a direction for the next meeting in two weeks, but then they don’t like what the creative team brings to the table so creative workers have to start from scratch again.

Right now, many creative workers rely on Pinterest, Instagram, Behance, Canva and Figma to find images around the web and create mood boards. And in many ways, your creativity is defined by the tools that you use. The Pimento team hopes that it can foster your creativity thanks to artificial intelligence.

“It allows you to explore more directions more quickly, so that you can produce projects of higher quality,” co-founder Tomás Yany told me. AI models “bring a wealth of knowledge that no designer will ever have. They’ve seen a lot of things, because they were originally trained with data from Japan and Latin America,” he added later in the conversation as an example.

Pimento’s seed round was led by Partech and Cygni Capital. Several angel investors also participated in the round, such as Julien Chaumond (Hugging Face), Stanislas Polu (Dust), Thibaud Elzière (Hexa), Jean-Charles Samuelian (Alan), Igor Manceau (former creative director at Ubisoft), Jonathan Widawski (Maze), Alessandro Sabatelli (ex-Apple) and Nicolas Steegmann (ex-Stupeflix).


So how do you use the tool exactly? Pimento’s co-founders showed me a demo of the product. You first start by typing some instructions of what you’re looking to achieve with your project, a sort of text brief. You then add a handful of images that will serve as the basis of your project.

After that, Pimento uses your instructions with AI models to help you come up with images, text and colors. There are three buttons on the screen that you can use whenever you want to generate images, text or colors.

If some of Pimento’s propositions seem interesting, you can save them for later. When you’re done, you can generate a link and share a board with all the images, colors and text that you have saved.

What makes Pimento different from using an off-the-shelf image-generating AI model is that everything you generate in Pimento is tailored to your initial brief — it becomes a sort of creative companion. In addition to that, you can also iterate on Pimento’s output.

For instance, you can select two images that you like and merge them to iterate one step further. You can pick a color from an image. You can reuse text to generate more images. You can ask for more image variants.

“There’s a debate on how you interact with these AI models. I don’t think the future is a chat interface where you enter prompt,” co-founder Florent Facq told me. It doesn’t mean that there’s no prompting in Pimento. But the company plans to offer several ways to interact with your content.

In the future, the company plans to add more features, such as the ability to customize the board that you’re going to share with the team or the client. Right now, the company uses fine-tuned open-source models, such as Stable Diffusion, Llama and soon Mistral AI.

It’s clear that there’s a long roadmap of new features that could be interesting for a product like Pimento. And the recent funding round will definitely help when it comes to product development. It’s also going to be interesting to see how companies start using the product.


Top 10 Research Papers Published by Google in 2023

The year 2023 has witnessed some groundbreaking research, shaping the future of AI technology. Google, which has been at the forefront of the AI revolution, has announced AI models with multiple capabilities. Along with the launch of innovative products, it has also released various research papers, offering a glimpse into the underlying technology.

Most recently, Google released its latest generative AI multimodal model, Gemini, which competes directly with GPT-4 and is already being discussed on social media. But this is not the only notable paper that Google has published this year.

Here is the list of top 10 research papers published by Google in 2023.

Gemini: A Family of Highly Capable Multimodal Models

Topping the list is obviously Gemini, the paper behind the multimodal model that competes with OpenAI’s GPT-4. Recently introduced, Gemini is a highly capable system jointly trained on image, audio, video, and text data. The primary goal is to create a model with robust generalist capabilities across modalities, coupled with state-of-the-art understanding and reasoning performance within each domain.

Gemini 1.0, the inaugural version, is available in three sizes: Ultra for intricate tasks, Pro for scalable performance and deployment, and Nano for on-device applications. Each size is meticulously designed to cater to distinct computational limitations and application needs. Comprehensive evaluations of Gemini models encompass a diverse array of internal and external benchmarks, spanning language, coding, reasoning, and multimodal tasks.

PaLM 2

PaLM 2 is a language model that surpasses its predecessor, PaLM, with enhanced multilingual and reasoning capabilities while being more computationally efficient. Leveraging a Transformer-based architecture and a diverse set of training objectives, PaLM 2 demonstrates significantly improved performance on various downstream tasks, ensuring superior quality across different model sizes.

Notably, PaLM 2 exhibits accelerated and resource-efficient inference, facilitating broader deployment and faster response times for more natural interactions. Its robust reasoning capabilities are highlighted by substantial advancements over PaLM in tasks such as BIG-Bench.

PaLM-E: An Embodied Multimodal Language Model

PaLM-E represents a significant leap forward in the development of AI agents capable of interacting with the physical world. This paper describes an LLM equipped with an embodiment, allowing it to perceive and manipulate its surroundings through sensors and actuators.

PaLM-E’s capabilities extend beyond simply understanding and generating text. It can navigate through a simulated environment, manipulate objects, and engage in simple conversations. This embodiment allows PaLM-E to learn and adapt to its environment in a more nuanced and realistic way compared to traditional LLMs.

The potential applications of PaLM-E are vast and diverse. It could be used to develop more realistic and engaging virtual assistants, robots that can assist with tasks in the real world, and even educational tools that allow users to learn through interactive simulations.

MusicLM: Generating Music from Text

Google was also into making music this year. MusicLM revolutionises music creation by enabling the generation of high-quality music from simple text descriptions. This paper introduces a system capable of composing music in various styles and genres based on user input, opening up new possibilities for musicians, composers, and anyone interested in exploring musical creativity.

MusicLM’s capabilities are based on a neural network trained on a massive dataset of music and text pairs. This allows the system to learn the complex relationships between text and musical elements, enabling it to generate music that is both faithful to the user’s description and musically sound.

Structure and Content-Guided Video Synthesis with Diffusion Models

This paper introduces a novel method for synthesising realistic videos using diffusion models. This approach allows for greater control over the content and structure of the generated videos, making it a valuable tool for video editing and animation.

Traditional video synthesis methods often lacked the ability to accurately control the details and structure of the generated videos. Diffusion models address this limitation by providing a framework for gradually introducing noise into a video and then denoising it to achieve the desired result. This allows for fine-grained control over the entire video generation process.
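The "add noise, then denoise" idea can be sketched with the standard forward-noising formula used by many diffusion models, x_t = sqrt(a) * x_0 + sqrt(1 - a) * noise, where a (often written as alpha-bar) falls from 1 towards 0 as more noise is added. This is a generic, minimal illustration on a flat list of numbers, not the method from the paper above: a real model has to predict the noise with a neural network, whereas the sketch reuses the known noise purely to show that the step is invertible. All function names are hypothetical.

```python
import math
import random


def add_noise(x0, alpha_bar, rng):
    """Blend clean values x0 with Gaussian noise; return (noisy values, noise)."""
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [
        math.sqrt(alpha_bar) * x + math.sqrt(1.0 - alpha_bar) * n
        for x, n in zip(x0, noise)
    ]
    return xt, noise


def remove_noise(xt, noise_estimate, alpha_bar):
    """Invert add_noise given an estimate of the noise that was added."""
    return [
        (x - math.sqrt(1.0 - alpha_bar) * n) / math.sqrt(alpha_bar)
        for x, n in zip(xt, noise_estimate)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    frame = [0.2, 0.5, 0.9]  # stand-in for pixel values of one video frame
    noisy, noise = add_noise(frame, alpha_bar=0.5, rng=rng)
    print(remove_noise(noisy, noise, alpha_bar=0.5))
```

Control over structure and content comes from what the denoising network is conditioned on, which this sketch deliberately leaves out.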

Lion: EvoLved Sign Momentum for Training Neural Networks

Lion introduces a new and efficient optimisation algorithm for training neural networks. This algorithm significantly improves the speed and accuracy of training, leading to better performance for various AI applications.

Traditional optimisation algorithms used in training neural networks can be slow and memory-hungry. Lion addresses this by keeping track of only the momentum and applying the sign of the update, so that every parameter changes by the same magnitude at each step. This makes it more memory-efficient than adaptive optimisers such as Adam, and the paper reports faster convergence and improved generalisation.
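The Lion update is compact enough to sketch in a few lines. The version below follows the rule as described in the Lion paper: take the sign of an interpolation between the momentum and the current gradient, apply decoupled weight decay, then update the momentum with a second coefficient. The function names and the plain-list representation of parameters are illustrative assumptions, not Google's implementation.

```python
def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def lion_step(params, grads, momentum, lr=1e-4, beta1=0.9, beta2=0.99,
              weight_decay=0.0):
    """Apply one Lion update; returns (new_params, new_momentum)."""
    new_params, new_momentum = [], []
    for p, g, m in zip(params, grads, momentum):
        # Direction: sign of a blend of the old momentum and the new gradient.
        update = sign(beta1 * m + (1.0 - beta1) * g)
        # Every parameter moves by lr (plus decoupled weight decay).
        new_params.append(p - lr * (update + weight_decay * p))
        # The momentum is tracked with a separate, slower coefficient.
        new_momentum.append(beta2 * m + (1.0 - beta2) * g)
    return new_params, new_momentum


if __name__ == "__main__":
    # Minimise f(x) = x^2, whose gradient is 2x, starting from x = 1.
    params, momentum = [1.0], [0.0]
    for _ in range(50):
        grads = [2.0 * p for p in params]
        params, momentum = lion_step(params, grads, momentum, lr=0.01)
    print(params)
```

Because only one momentum value is stored per parameter, the optimiser state is about half the size of Adam's, which keeps both a first and a second moment estimate.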

InstructPix2Pix: Learning to Follow Image Editing Instructions

This paper proposes a groundbreaking method for editing images based on text instructions. InstructPix2Pix enables users to modify images in a natural and intuitive way, opening up new possibilities for image editing and manipulation.

Traditional image editing tools require users to have specific technical skills and knowledge. InstructPix2Pix removes this barrier by allowing users to edit images simply by providing textual instructions. This user-friendly approach makes image editing accessible to a wider audience and simplifies the process for experienced users.

DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation

Large text-to-image models have limitations in mimicking subjects from a reference set and generating diverse renditions. To address this, Google Research and Boston University present a personalised approach. By fine-tuning the model with a few subject images, it learns to associate a unique identifier with the subject, enabling the synthesis of photorealistic images in different contexts.

The technique preserves key features while exploring tasks like recontextualization, view synthesis, and artistic rendering. A new dataset and evaluation protocol are provided for subject-driven generation.

REVEAL: Retrieval-Augmented Visual-Language Pre-Training with Multi-Source Multimodal Knowledge Memory

The paper presents REVEAL, an end-to-end Retrieval-Augmented Visual Language Model. REVEAL encodes world knowledge into a large-scale memory and retrieves from it to answer knowledge-intensive queries. It consists of a memory, encoder, retriever, and generator. The memory encodes various multimodal knowledge sources, and the retriever finds relevant entries.

The generator combines retrieved knowledge with input queries to generate outputs. REVEAL achieves state-of-the-art performance in visual question answering and image captioning, utilising diverse multimodal knowledge sources. The paper was authored by researchers from the University of California, Los Angeles and Google Research.

On Distillation of Guided Diffusion Models

Classifier-free guided diffusion models, widely used in image generation, suffer from computational inefficiency. Google, Stability AI and LMU Munich propose distilling these models into faster sampling models. The distilled model matches the output of combined conditional and unconditional models, achieving comparable image quality with fewer sampling steps.

The approach is up to 256 times faster for pixel-space models and at least 10 times faster for latent-space models. It also proves effective in text-guided image editing and inpainting, requiring only 2-4 denoising steps for high-quality results.
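As a rough illustration of what "matching the output of combined conditional and unconditional models" means, the sketch below uses the commonly cited classifier-free guidance combination, (1 + w) * conditional - w * unconditional, and scores a student that takes the guidance weight w as an extra input. The toy linear "models", the function names, and the mean-squared-error objective are assumptions made for illustration; they are not taken from the paper's code.

```python
def guided_prediction(cond_pred, uncond_pred, w):
    """Classifier-free guidance: push the conditional prediction away
    from the unconditional one with guidance weight w."""
    return [(1.0 + w) * c - w * u for c, u in zip(cond_pred, uncond_pred)]


def teacher_output(cond_model, uncond_model, z_t, w):
    # The guided teacher needs two model evaluations per sampling step.
    return guided_prediction(cond_model(z_t), uncond_model(z_t), w)


def distillation_loss(student, cond_model, uncond_model, z_t, w):
    # The student sees the noisy input and w, and is scored on how closely
    # a single evaluation reproduces the guided teacher output (MSE, assumed).
    target = teacher_output(cond_model, uncond_model, z_t, w)
    pred = student(z_t, w)
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


# Hypothetical toy "models": simple scalings stand in for neural networks.
cond_model = lambda z: [2.0 * x for x in z]
uncond_model = lambda z: [1.0 * x for x in z]
perfect_student = lambda z, w: [(2.0 + w) * x for x in z]

z_t = [1.0, -2.0, 0.5]
loss = distillation_loss(perfect_student, cond_model, uncond_model, z_t, w=3.0)
print(loss)
```

In this toy setup the student needs one evaluation where the teacher needs two; the larger speed-ups quoted above also depend on cutting the number of sampling steps.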

The post Top 10 Research Papers Published by Google in 2023 appeared first on Analytics India Magazine.

Lisa Su Takes AMD’s AI Beyond Just GPUs

“Today it’s all about AI,” said Lisa Su, the CEO of AMD, at the Advancing AI event. “AI is not just a cool new thing, it is actually the future of computing…the only thing close is maybe just the introduction of the internet, but what’s different about AI is that the adoption rate is much much faster.”

She highlighted how AMD is well positioned to power the entire chain of the AI era. “Thinking about massive cloud server installations on-prem enterprise clusters to the next generation of AI embedded on PCs,” Su said, explaining that AMD’s strategy is focused on developing compute engines, building open software capabilities, and fostering an AI ecosystem with deep co-innovation.

“The capability and availability of GPUs is the single most important driver of AI adoption,” said Su, to which the crowd agreed. To that end, AMD has released the Instinct MI300X accelerators, boasting industry-leading memory bandwidth for generative AI, along with the Instinct MI300A accelerated processing unit (APU), which combines the latest AMD CDNA 3 architecture with Zen 4 CPUs. All of them are aimed at HPC and AI workloads.

All about collaboration

Beyond the hardware, Su stressed that these GPUs would be of little value without an ecosystem that can put them to use, and said AMD has now built one.

AMD believes that AI is a collaborative frontier, not just a competition. Su’s keynote featured Microsoft CTO Kevin Scott, Oracle senior vice president Karan Batta, and Meta AI senior director of engineering Ajit Matthews, along with the founders of AMD customers Lamini, Databricks, and Essential AI.

For instance, Scott noted that Microsoft has been working with AMD on EPYC, Xbox, and a lot of AI computers all this while. “The thing that allowed Microsoft and OpenAI to do this [ChatGPT] was the amount of infrastructure work that we have been invested in all this while,” he said, crediting AMD as a constant contributor to the success of the Microsoft-OpenAI partnership.

*Lisa Su with Microsoft CTO Kevin Scott*

Su highlighted how Microsoft has been key to AMD’s progress in AI. “We are super excited about the MI300X, and at Ignite we announced that MI300X VMs would be available on Azure,” said Scott. He added that he has been eagerly waiting to bring up GPT-4 and Llama 2 on MI300X, see how they perform, and roll them into production.

Matthews from Meta AI said Meta will deploy AMD MI300X in its data centres for AI inference workloads. He said MI300X is shaping up to be the fastest design-to-deployment solution in Meta’s history.

Batta then took the stage and highlighted how OCI has been the leading customer of AMD. Oracle will now support MI300X as a bare metal offering on its servers, giving its customers the option of using AMD GPUs for training and inference. “Customers are already seeing incredible results with the previous generation of GPUs, and the next generation is going to make it even better,” he said.

*Lisa Su with Oracle senior vice president Karan Batta*

*Ion Stoica, co-founder of Databricks;* *Ashish Vaswani, co-founder of Essential AI; and* *Sharon Zhou, co-founder of Lamini*

Winning Race via Open Source

To make MI300X easier to adopt in the industry, AMD has built the Instinct platform on an industry-standard, OCP-compliant design. This means the board on which the MI300X is built can be integrated directly into any other OCP-compliant platform. “You can take out your other board, and put in the MI300X Instinct platform,” Su highlighted, comparing it with the NVIDIA H100 HGX.

According to AMD, the training performance of the MI300X is equal to that of the NVIDIA H100. For inference, however, the MI300X is 1.6X faster on Bloom 176B and 1.4X faster on Llama 2 70B.

The Instinct platform can train and infer twice as many models as its competitors when running multiple different models, thanks to its 2.4X larger memory. It can also run LLMs twice as large on a single platform. “If you don’t have enough GPUs, this is really really helpful,” said Su.
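To make the memory claim concrete, here is a back-of-the-envelope sketch of how many accelerators are needed just to hold a model's weights. It assumes the published per-GPU capacities of 192 GB for the MI300X and 80 GB for the H100 (figures not stated in the article), FP16 weights at 2 bytes per parameter, and it ignores activations, KV cache, and framework overhead, so real deployments will need more headroom.

```python
import math

BYTES_PER_PARAM_FP16 = 2   # assumption: weights stored in FP16
MI300X_GB = 192            # assumed per-GPU HBM capacity (published spec)
H100_GB = 80               # assumed per-GPU HBM capacity (published spec)


def weight_memory_gb(params_billion, bytes_per_param=BYTES_PER_PARAM_FP16):
    # (params_billion * 1e9 params) * bytes / (1e9 bytes per GB)
    return params_billion * bytes_per_param


def min_gpus_for_weights(params_billion, gpu_memory_gb):
    # Smallest GPU count whose combined memory holds the weights alone.
    return math.ceil(weight_memory_gb(params_billion) / gpu_memory_gb)


memory_ratio = MI300X_GB / H100_GB
print(f"memory ratio: {memory_ratio:.1f}X")

for name, size_b in [("Llama 2 70B", 70), ("Bloom 176B", 176)]:
    print(
        name,
        f"weights ~{weight_memory_gb(size_b)} GB,",
        f"MI300X x{min_gpus_for_weights(size_b, MI300X_GB)},",
        f"H100 x{min_gpus_for_weights(size_b, H100_GB)}",
    )
```

Under these assumptions the 192 GB to 80 GB ratio reproduces the 2.4X figure, and Llama 2 70B's FP16 weights fit on a single MI300X but need at least two H100s.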

“As important as the hardware is, software is what really drives innovation,” Su added, talking about the new release of ROCm 6.

Victor Peng, president of AMD, highlighted how the ROCm software stack has been production-ready since last year, citing Databricks, Lamini, and Essential AI as examples. AMD’s CUDA alternative is open source, and other AI software such as ZenDNN and Vitis AI also supports the whole AI ecosystem. “Any model can run seamlessly across the ecosystem of software.”

“We wanted ROCm to be modular and open source for broad user accessibility and rapid contribution from the open source AI community,” Peng said, contrasting this software strategy with CUDA, which is proprietary and closed source. Furthermore, ROCm is now also supported on Radeon GPUs, along with Ryzen 1.0 software, to bring AI to the edge.

This signals that AMD’s open approach to AI extends its hardware and software to the whole community, and that the company is about more than just building GPUs.

The post Lisa Su Takes AMD’s AI Beyond Just GPUs appeared first on Analytics India Magazine.

Five-month-old Indian AI startup Sarvam scores $41 million funding

Manish Singh

Sarvam AI has come out of stealth mode and announced it has raised $41 million as the five-month-old Indian startup races to build a suite of full-stack generative AI offerings.

The $41 million was raised across the Seed and Series A financing rounds. Lightspeed led the Series A round and co-led the Seed round with Peak XV Partners. Peak XV and Khosla Ventures also participated in the Series A funding.

The Bengaluru-headquartered startup is building large language models that support Indian languages, Sarvam AI co-founder Vivek Raghavan told TechCrunch. The startup is also creating a platform that will allow businesses to build with LLMs — “everything from writing an app, deploying it to popular channels, observing logs, and custom evaluation,” he said.

Sarvam AI is also focusing on building LLMs that use voice as the default interface in India. This strategy, combined with its emphasis on supporting local languages, aims to cater specifically to the Indian market’s requirements.

“This requires us to change the architecture of existing open models and to train them in custom ways to teach the new language. The advantage is that the resultant models are more efficient (in terms of tokens consumed) for understanding and generating Indian language than any of the existing LLMs,” said Raghavan.

Sarvam was founded about five months ago by Raghavan and Pratyush Kumar, both of whom previously worked at AI4Bharat, an IIT Madras initiative backed by tech veteran Nandan Nilekani. Raghavan additionally spent more than a decade at UIDAI, the entity overseeing the omnipresent Indian identity system Aadhaar.

“I have seen first hand the enormous value in innovating at foundational layers and deploying at population scale,” he said. “India has demonstrated that it can harness technology differently, and with GenAI we have an opportunity to reimagine how this technology can add value to people’s lives.”

The startup plans to make its first model public over the coming weeks.

The investment in Sarvam comes at a time when investors globally are rushing to identify and back AI breakthroughs, banking on the thesis that advances in AI will make countless industries more efficient and startups at the forefront will deliver generational returns.

Despite being home to one of the world’s largest startup ecosystems, India has yet to make a material impact in the rapidly advancing AI arena. No homegrown Indian contenders have emerged to challenge the dominance of large language model titans such as OpenAI’s ChatGPT, Amazon-backed Anthropic, or Google’s Bard. (Indian powerhouse Reliance partnered with Nvidia in September, revealing plans to build a large language model that is trained on India’s diverse languages.)

“We see several countries having sovereign efforts to build GenAI models given its strategic importance. We need companies like Sarvam AI to develop deep expertise for building AI in and for India,” Khosla Ventures founder Vinod Khosla said in a statement. Khosla Ventures was the first institutional investor in OpenAI, turning a $50 million investment into a $5 billion return.

Hemant Mohapatra, Partner at Lightspeed, said Sarvam AI has a “unique approach” to combine model innovation and application development to build “population-scale” solutions for India. He added, “Lightspeed will be close partners and contribute with our deep capital stack and learnings from our global platform.”

Peak XV and Lightspeed India took less than a week to invest in Sarvam AI’s seed funding round earlier, according to a source with direct knowledge of the event.

“The Sarvam AI team led by Vivek and Pratyush is among the highest calibre AI teams we have seen emerge from India,” said Harshjit Sethi, Managing Director of Peak XV, in a statement.

“Vivek’s expertise in building large scale systems with Pratyush’s domain expertise in AI makes them uniquely positioned to build population scale AI applications. Large scale adoption of AI in India will require not only building uniquely Indian use cases but also delivering them at prices that everyone can afford and we believe the Sarvam team is best positioned to accomplish this.”

What is Meta doing with 150K GPUs?

In Silicon Valley, the talk of the town in 2023 has been about who acquired how many GPUs, given the shortage of NVIDIA H100s.

According to a recent report from Omdia Research, Meta and Microsoft have each received 150,000 H100 GPUs from NVIDIA, surpassing allocations to Google, Amazon, and Oracle.

The quantity is notably substantial for Meta, the parent company of Facebook and Instagram, especially considering it’s not a cloud service provider like Microsoft, but rather the creator of the open-source Llama models.

From Microsoft’s standpoint, it’s evident that they are supplying GPUs to support OpenAI in developing their upcoming models. Unverified leaks suggest that GPT-4 underwent training using approximately 25,000 NVIDIA A100 GPUs over a period of 90 to 100 days.
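For a sense of scale, the leaked figures can be turned into GPU-hours with simple arithmetic. The sketch below assumes every GPU runs for the full period, which is an idealisation, and it inherits the unverified status of the leak itself.

```python
def gpu_hours(num_gpus, days):
    """Total GPU-hours if every GPU runs for the whole period."""
    return num_gpus * days * 24


NUM_GPUS = 25_000  # figure quoted in the unverified leak

low = gpu_hours(NUM_GPUS, 90)    # 90-day training run
high = gpu_hours(NUM_GPUS, 100)  # 100-day training run

print(f"{low:,} to {high:,} GPU-hours")
# prints "54,000,000 to 60,000,000 GPU-hours"
```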

In addition to running its social media products, such as Instagram and Facebook, and running ads on them, Meta has been at the forefront of various AI research and open-source developments, which suggests the company has been putting its GPUs to good use.

According to a blog post, Meta aimed to acquire 25,000 or more GPUs, potentially up to 100,000 in 2023.

A Year of Meta Projects

Since the beginning of the year, Meta has been consistently rolling out new products, prompting the need for more GPUs. Moreover, Meta recently celebrated 10 years of Fundamental AI Research (FAIR), its AI lab. While others are building mass-market products in the B2C space, Meta is entirely focused on the developer ecosystem with its slew of open source models.


Starting in February, Meta released LLaMA, followed by Toolformer—a model trained to determine the optimal APIs to call, when to call them, what arguments to pass, and how to best incorporate the results into future token predictions.

In May, Meta introduced ImageBind, an AI model capable of binding data from six modalities at once, including images and video, audio, text, depth, thermal and inertial measurement units (IMUs).

In July, it introduced the multimodal model ‘CM3leon’ (pronounced like ‘chameleon’), capable of both text-to-image and image-to-text generation. During the same month, Meta unveiled LLaMA’s successor, Llama 2.

In August, Meta announced that DINOv2, its computer vision model trained through self-supervised learning to produce universal features, is now available under the Apache 2.0 license.

In September, Meta hosted Meta Connect 2023, integrating AI features across its social media apps, including WhatsApp, Instagram, and Messenger. The company introduced AI stickers, powered by Llama 2 and Emu, for real-time image generation, seamlessly converting text into diverse, high-quality stickers.

Furthermore, as part of its ongoing commitment to advancing communication technologies, Meta released Seamless Communication. This is a family of AI-powered language translation models designed to foster more natural and real-time communication across different languages. Comprising three components—SeamlessM4T v2, SeamlessStreaming, and SeamlessExpressive—this innovative release showcases Meta’s dedication to pushing the boundaries of AI applications.

Simultaneously, Meta introduced AudioCraft, an open-source generative AI framework which enables users to generate high-quality, realistic audio and music from simple text prompts. It is a single codebase for all generative audio needs, including music, sound effects, and audio compression.

Most recently, Meta introduced Emu Video, which leverages the Emu model for text-to-video generation based on diffusion models. This unified architecture for video generation tasks can respond to a variety of inputs: text only, image only, and both text and image.

Llama 3 Won’t be on Diet

Now that we know how many GPUs Meta has and where it has used them, building Llama 3 soon should not be hard. As Meta keeps deploying Llama 2 in its products, it has become adept at getting the most out of it.

Rumors are swirling that Meta has now turned its attention to Llama 3, aiming to push it beyond the capabilities of GPT-4. According to speculation, Meta will launch Llama 3 early next year, and it is expected to be open source for both research and commercial use.

However, in a podcast with Lex Fridman, Meta chief Mark Zuckerberg said Meta is still considering whether to open-source Llama 3. He also shared his intention to integrate Llama 3 into Meta products, which could further increase its demand for GPUs.

The post What is Meta doing with 150K GPUs? appeared first on Analytics India Magazine.

Top 6 AI Investors of 2023

The startup landscape has seen a notable increase in the number of companies leveraging artificial intelligence (AI). Of the ten unicorns that have emerged this year, six are AI-based startups, a trend that highlights the growing significance of AI in the business world and has caused a significant stir in the industry.

Furthermore, existing AI startups have experienced a surge in valuation, indicating the potential of AI technology to drive business growth and profitability. These developments underscore the importance of AI in the current business environment and suggest that it will continue to play a critical role in shaping the industry’s future.

The most active backers include 14 investment firms, namely Nvidia/NVentures, Google/GV, SV Angel (SVA), Salesforce Ventures, Coatue, A.Capital, Microsoft/M12, Y Combinator, Samsung NEXT, Madrona, Lux Capital, General Catalyst, b2venture, and Andreessen Horowitz, all of which have invested in multiple AI startups. Individuals such as Nat Friedman, Howie Liu, and Amjad Masad have also invested their own money in multiple AI startups.

The top six corporate investors in AI unicorns and their investments:

Nvidia/NVentures

Nvidia, a company that recently achieved a market capitalization of $1 trillion, has invested in Databricks’ Series I funding round, which raised over $500 million. This funding round has valued Databricks at $43 billion. Nvidia has emerged as a leading investor in AI-related startups, having participated in 11 funding deals in the current third quarter and eight in Q2, according to data from Crunchbase.

Nvidia has invested in companies such as Enfabrica, Imbue (injecting $200 million), AI21 Labs (with a $155 million investment), and Hugging Face ($200 million). In 2023, Adept AI secured $350 million, Cohere raised $270 million, and Skydio received $230 million in funding.

Google/GV

In 2023, Google/GV invested $2 billion in AI startup Anthropic, adding to its $550 million funding from earlier this year. GV is a venture capital firm that supports innovative founders, with Alphabet as its sole limited partner. GV’s operating partners help startups with design, equity, diversity & inclusion, talent, and engineering. GV also provides startups with unique access to Google’s technology and talent. Additionally, Google/GV has made significant investments in AI-based startups such as AI21 Labs, Runway, Synthesia, and Typeface.

AI21 Labs raised $208M in a Series C funding round, Synthesia raised $90M, Runway raised $141M, and Typeface raised $65M in funding. These AI-based companies plan to use the funds to expand their teams, invest in AI research, and develop new technologies for content creation, video generation, and film editing.

SV Angel

SV Angel (SVA) has been helping build some of the most transformative companies in AI globally, a success it attributes to its belief in unlocking the potential of everyone it works with. In 2023, SVA announced investments in Adept AI, Character AI, and Replit; these companies raised total funding of more than $2.6M over 2 rounds.

Microsoft/M12

In 2019, Microsoft made a significant commitment to OpenAI with its first billion-dollar investment. This was followed by a $10 billion investment in January 2023, after the launch of GPT-3.5. In total, Microsoft has committed $13 billion to OpenAI.

Microsoft has made other significant investments in the AI space, including Inflection AI, Adept AI, Builder.ai and Typeface AI in 2023. The software giant’s stock price has risen more than 50% this year, largely due to the value AI brings to the tech ecosystem.

According to Microsoft CFO Amy Hood, the company’s partnership with OpenAI and others in this sector will add $10 billion in revenue to Microsoft. However, she did not specify the time frame for this.

Samsung NEXT

MosaicML, an LLM infrastructure provider, has recently been acquired for a whopping $1.3 billion. Before the acquisition, the company had raised almost $64 million from investors such as DCVC, AME Cloud Ventures, Lux, Frontline, Atlas, Playground Global, and Samsung Next. Interestingly, the company’s valuation at its last funding round was only $222 million, which means the acquisition values it at roughly six times that figure. This price highlights the current frothy state of the AI market and the high demand for talent and technology in this field.

AI21 Labs has recently secured $155 million in Series C funding from investors including Samsung Next, Google, and NVIDIA. With this new funding round, the company has raised a total of $283 million at a valuation of $1.4 billion, further solidifying AI21 Labs’ position as a leader among generative AI unicorns.

Over the next three years, Samsung plans to invest more than $7 billion annually in the research and development of 5G, AI, and the Internet of Things (IoT). The Group is also doing so to encourage other companies to take action.

Lux Capital

Lux Capital, a venture capital firm with offices in New York City and Menlo Park, California, has raised $1.15 billion to invest in startups that focus on science and deep technologies such as AI, robotics, and biotechnology. The firm’s largest fund to date, Lux Ventures VIII or Lux 8, will finance investments in early-stage companies. With this raise, the firm’s total assets under management now exceed $5 billion.

MosaicML and Runway are two of the major investments made by Lux Capital. Runway AI, an AI-powered editing software provider, has raised a total of $46 million in funding. MosaicML secured $37 million in funding, propelling the company’s valuation to $1.3 billion.

The post Top 6 AI Investors of 2023 appeared first on Analytics India Magazine.