Meta’s AI image generator says language may be all you need

Using a fraction of the GPU compute of comparable programs, Meta's CM3Leon generates images with complex combinations of objects, and hard-to-render elements such as hands and text, at a level that sets a new state of the art on the benchmark FID score.

For the past several years, the world has been wowed by artificial intelligence programs that generate images when you type a phrase, programs such as Stable Diffusion and DALL-E that will output images in any style you want, images that can be subtly varied by using different prompted phrases.

Typically, those programs have relied on manipulating example images: compressing them and then de-compressing them to recover the original, a process referred to as diffusion, through which they learn the rules of image creation.

Also: Generative AI: Just don't call it an 'artist' say scholars in Science magazine

Work introduced by Meta this past week suggests something far simpler: an image can be treated as merely a set of codes like words, and can be handled much the way ChatGPT manipulates lines of text.

It might be the case that language is all you need in AI.

The result is a program that can handle complex subjects with multiple elements ("A teddy bear wearing a motorcycle helmet and cape is riding a motorcycle in Rio de Janeiro with Dois Irmãos in the background.") It can render difficult objects such as hands and text, stuff that tends to end up distorted in many image-generation programs. It can perform other tasks, like describing in detail a given image, or altering a given image with precision. And it can be done with a fraction of the computing power usually needed.

In the paper "Scaling Autoregressive Multi-Modal Models: Pre-training and Instruction Tuning," by Lili Yu and colleagues at Facebook AI Research (FAIR), posted on Meta's AI research site, the key insight is to use images as if they were words. Or, rather, text and image function together as continuous sentences, with a "codebook" used to replace the images with tokens.

"Our approach extends the scope of autoregressive models, demonstrating their potential to compete with and exceed diffusion models in terms of cost-effectiveness and performance," write Yu and team.

Also: This new technology could blow away GPT-4 and everything like it

The idea of a codebook goes back to work from 2021 by Patrick Esser and colleagues at Heidelberg University. They adapted a long-standing kind of neural network, called a convolutional neural network (or CNN), which is expert at handling image files. Using a generative adversarial network, or GAN, an AI program that can fabricate images, they trained the CNN to associate aspects of an image, such as edges, with entries in a codebook.

Those indices can then be predicted the way a language model such as ChatGPT predicts the next word. High-resolution images become sequences of index predictions rather than pixel predictions, a far less compute-intensive operation.

CM3Leon's input is a string of tokens, where images are reduced to just another token in text form, a reference to a codebook entry.
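
The codebook idea can be sketched in a few lines. This is an illustrative toy, not CM3Leon's actual tokenizer: each image-patch feature vector is replaced by the index of its nearest codebook entry, so an image becomes a short token sequence.

```python
# Toy vector quantization: map each patch feature to its nearest codebook index.

def nearest_code(vector, codebook):
    """Return the index of the codebook entry closest to `vector` (squared L2)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: sq_dist(vector, codebook[i]))

def tokenize_image(patch_features, codebook):
    """Map a list of patch feature vectors to a list of codebook indices."""
    return [nearest_code(p, codebook) for p in patch_features]

# Hypothetical 4-entry codebook and three 2-D patch features.
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
patches = [[0.1, 0.1], [0.9, 0.2], [0.8, 1.1]]
tokens = tokenize_image(patches, codebook)
print(tokens)  # [0, 1, 3]
```

Those integer tokens can then sit in the same sequence as word tokens, which is what lets a language model treat an image as just more text.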

Using the codebook approach, Meta's Yu and colleagues assembled what's called CM3Leon, pronounced "chameleon," a neural net that is a large language model able to handle an image codebook.

CM3Leon builds on a prior program that was introduced last year by FAIR — CM3, for "Causally-Masked Multimodal Modeling." It's like ChatGPT in that it is a "Transformer"-style program, trained to predict the next element in a sequence — a "decoder-only transformer architecture" — but it combines that with "masking" parts of what's typed, similar to Google's BERT program, so that it can also gain context from what might come later in a sentence.

CM3Leon builds on CM3 by adding what's called retrieval. Retrieval, which is becoming increasingly important in large language models, means the program can "phone home," if you will, reaching into a database of documents to retrieve whatever may be relevant to the output of the program. It's a way to have access to memory so that the neural net's weights, or parameters, don't have to bear the burden of carrying all the information necessary to make predictions.

Also: Microsoft, TikTok give generative AI a sort of memory

According to Yu and team, their database is a vector "data bank" that can be searched for both image and text documents: "We split the multi-modal document into a text part and an image part, encode them separately using off-the-shelf frozen CLIP text and image encoders, and then average the two as a vector representation of the document."
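
That retrieval step can be sketched roughly as follows. The tiny vectors here are made-up stand-ins for the frozen CLIP embeddings the paper describes: each document's text and image embeddings are averaged into one vector, and a query pulls out the most similar document by cosine similarity.

```python
# Rough sketch of multi-modal retrieval over a vector "data bank".
import math

def average(u, v):
    return [(a + b) / 2 for a, b in zip(u, v)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Each entry: (doc_id, text_embedding, image_embedding) -- hypothetical values.
bank = [
    ("doc_a", [1.0, 0.0], [0.8, 0.2]),
    ("doc_b", [0.0, 1.0], [0.1, 0.9]),
]
# One averaged vector per document, as described in the quote above.
index = [(doc_id, average(t, i)) for doc_id, t, i in bank]

query = [0.9, 0.1]  # stand-in embedding of the query prompt
best = max(index, key=lambda item: cosine(query, item[1]))
print(best[0])  # doc_a
```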

In a novel twist, the researchers use as the training dataset not internet images but a collection of 7 million licensed images from Shutterstock, the stock photography company. "As a result, we can avoid concerns related to image ownership and attribution, without sacrificing performance."

The Shutterstock images retrieved from the database are used in the pre-training stage of CM3Leon to develop the capabilities of the program, the same way ChatGPT and other large language models are pre-trained. But an extra stage then takes place in which the input and output of the pre-trained CM3Leon are fed back into the model to further refine it, an approach called "supervised fine-tuning," or SFT.

Also: The best AI art generators: DALL-E 2 and other fun alternatives to try

The result of all this is a program that achieves the state of the art on a variety of text-image tasks. The primary test is Microsoft COCO Captions, a dataset published in 2015 by Xinlei Chen of Carnegie Mellon University and colleagues. A program is judged by how well it replicates images in the dataset, according to what's called an FID score, a resemblance measure introduced in 2017 by Martin Heusel and colleagues at Johannes Kepler University Linz in Austria.
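
For intuition: FID measures the distance between Gaussian fits of two sets of image features. The real metric uses multivariate Inception-network features, but for one-dimensional Gaussians the formula collapses to a simple two-term sum, sketched here as a toy.

```python
# Toy 1-D version of the Frechet (FID) distance between two Gaussian fits:
# (mu1 - mu2)^2 + (sigma1 - sigma2)^2. Lower is better, as in the benchmark.

def fid_1d(mu1, sigma1, mu2, sigma2):
    """Frechet distance between two 1-D Gaussians."""
    return (mu1 - mu2) ** 2 + (sigma1 - sigma2) ** 2

print(fid_1d(0.0, 1.0, 0.0, 1.0))  # 0.0 -- identical distributions
print(fid_1d(0.0, 1.0, 1.0, 2.0))  # 2.0 -- means and spreads both differ
```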

Write Yu and team: "The CM3Leon-7B model sets a new state-of-the-art FID score of 4.88, while only using a fraction of the training data and compute of other models such as PARTI." The "7B" part refers to the CM3Leon program having 7 billion neural parameters, a common measure of the scale of the program.

A table shows how the CM3Leon model gets a better FID score (lower is better) with far less training data and fewer parameters than other models, which amounts to less compute intensity:

One chart shows how the CM3Leon reaches that superior FID score using fewer training hours on Nvidia A100 GPUs:

What's the big picture? CM3Leon, using a single prompted phrase, can not only generate images but also identify objects in a given image, generate captions from a given image, or do any number of other things juggling text and image. It's clear that the wildly popular practice of typing stuff into a prompt is becoming a new paradigm. The same gesture of typing can be broadly employed for many tasks across many "modalities," meaning different kinds of data: text, image, audio, and so on.

Also: This new AI tool transforms your doodles into high-quality images

As the authors conclude, "Our results support the value of autoregressive models for a broad range of text and image tasks, encouraging further exploration for this approach."

Unstructured, which offers tools to prep enterprise data for LLMs, raises $25M

By Kyle Wiggers

Large language models (LLMs) such as OpenAI’s GPT-4 are the building blocks for an increasing number of AI applications. But some enterprises have been reluctant to adopt them, owing to their inability to access first-party and proprietary data.

It’s not an easy problem to solve, necessarily — considering that sort of data tends to sit behind firewalls and comes in formats that can’t be tapped by LLMs. But a relatively new startup, Unstructured.io, is trying to remove the roadblocks with a platform that extracts and stages enterprise data in a way that LLMs can understand and leverage.

Brian Raymond, Matt Robinson and Crag Wolfe co-founded Unstructured in 2022 after working together at Primer AI, which was focused on building and deploying natural language processing (NLP) solutions for business customers.

“While at Primer, time and again, we encountered a bottleneck ingesting and pre-processing raw customer files containing NLP data (e.g., PDFs, emails, PPTX, XML, etc.) and transforming it into a clean, curated file that’s ready for a machine learning model or pipeline,” Raymond, who serves as Unstructured’s CEO, told TechCrunch in an email interview. “None of the data integration or intelligent document processing companies were helping to solve this problem, so we decided to form a company and tackle it head-on.”

Indeed, data processing and prep tends to be a time-consuming step of any AI development workflow. According to one survey, data scientists spend close to 80% of their time preparing and managing data for analysis. As a result, most of the data companies produce — about two-thirds — goes unused, per another poll.

“Organizations generate vast amounts of unstructured data on a daily basis, which when combined with LLMs can supercharge productivity. The problem is that this data is scattered,” Raymond continued. “The dirty secret in the NLP community is that data scientists today still must build artisanal, one-off data connectors and pre-processing pipelines completely manually. Unstructured [delivers] a comprehensive solution for connecting, transforming and staging natural language data for LLMs.”

Unstructured provides a number of tools to help clean up and transform enterprise data for LLM ingestion, including tools that remove ads and other unwanted objects from web pages, concatenate text, perform optical character recognition on scanned pages and more. The company develops processing pipelines for specific types of PDFs; HTML and Word documents, including for SEC filings; and — of all things — U.S. Army Officer evaluation reports.

To handle documents, Unstructured trained its own “file transformation” NLP model from scratch and assembled a collection of other models to extract text and around 20 discrete elements (e.g., titles, headers and footers) from raw files. Various connectors — about 15 in total — draw in documents from existing data sources, like customer relationship management software.

“Behind the scenes, we’re using a variety of different technologies to abstract away complexity,” Raymond said. “For example, for old PDFs and images, we’re using computer vision models. And for other file types, we’re using clever combinations of NLP models, Python scripts and regular expressions.”
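
The flavor of that element extraction can be illustrated with a few lines of standard-library Python. This is a toy sketch of the idea, not Unstructured's actual implementation or API: headings and paragraphs come out as labeled elements, while page clutter is dropped.

```python
# Toy element extraction from raw HTML: titles and body text become labeled
# elements; everything else (e.g. ad banners in <div>s) is ignored.
from html.parser import HTMLParser

class ElementExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.elements = []   # list of (element_type, text) pairs
        self._current = None

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2", "h3"):
            self._current = "Title"
        elif tag == "p":
            self._current = "NarrativeText"

    def handle_endtag(self, tag):
        self._current = None

    def handle_data(self, data):
        text = data.strip()
        if self._current and text:
            self.elements.append((self._current, text))

parser = ElementExtractor()
parser.feed("<h1>Q2 Report</h1><p>Revenue grew 12%.</p><div>ad banner</div>")
print(parser.elements)
# [('Title', 'Q2 Report'), ('NarrativeText', 'Revenue grew 12%.')]
```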

Downstream, Unstructured integrates with providers like LangChain, a framework for creating LLM apps, and vector databases such as Weaviate and MongoDB’s Atlas Vector Search.

Previously, Unstructured’s sole product was an open source suite of these data processing tools. Raymond claims that it’s been downloaded around 700,000 times and used by over 100 companies. But to cover development costs — and placate its investors, no doubt — the company’s launching a commercial API that’ll transform data in 25 different file formats, including PowerPoints and JPGs.

“We’ve been working with government agencies and have several million in revenue in just a very short period. . . . Since our focus is on AI, we’re focused on a sector of the market that’s not affected by the broader economic slowdown,” Raymond said.

Unstructured has unusually close ties to defense agencies, perhaps a product of Raymond’s background. Prior to Primer, he was an active member of the U.S. intelligence community, serving in the Middle East and then in the White House during the Obama administration before a stint at the CIA.

Unstructured was awarded small business contracts by the U.S. Air Force and U.S. Space Force and partnered with U.S. Special Operations Command (SOCOM) to deploy an LLM “in conjunction with mission-relevant data.” Moreover, Unstructured’s board includes Michael Groen, a former general and director of the Pentagon’s Joint Artificial Intelligence Center, and Ryan Lewis, who previously led the Department of Defense’s Defense Innovation Unit.

The defense angle — a reliable early revenue source — might’ve been the deciding factor in Unstructured’s recent financing. Today, the company announced that it raised $25 million across a Series A and previously undisclosed seed funding round. Madrona led the Series A with participation from Bain Capital Ventures, which led the seed, and M12 Ventures, Mango Capital, MongoDB Ventures and Shield Capital, as well as several angel investors.

Capgemini Strengthens Ties With Microsoft, Co Creates Azure Intelligent App Factory

Building on their existing partnership, Capgemini has co-created an Azure Intelligent App Factory, in collaboration with Microsoft, for organisations to scale responsible and sustainable generative AI capabilities. Some of the use cases the duo has identified include creating content and design as well as analysing network traffic patterns for improved cybersecurity.

This new solution will see the two companies combine their technology, including the Microsoft Cloud, the GPT-4-powered Azure OpenAI Service, and GitHub Copilot. The Azure Intelligent App Factory aims to accelerate AI investments while controlling security and the handling of, and access to, data.

Aiman Ezzat, the CEO of Capgemini, said “Both Microsoft and Capgemini are guided by strong ethical principles, which are the cornerstone of the new Azure Intelligent App Factory. By combining the Group’s global expertise in engineering and R&D services, data, and AI, with Microsoft’s market leading technology, we are committed to enabling clients to successfully implement AI solutions.”

The Azure Intelligent App Factory is strategically centred on promoting the implementation of generative AI, enabled by Azure OpenAI Service. This initiative spans across various industries, including consumer products, life sciences, financial services, manufacturing, and telecommunications.

The suite boasts an extensive array of enterprise-ready use cases designed to demonstrate the business value of generative AI. By fine tuning language models with proprietary company data and industry-specific information, it can deliver dependable and scalable outputs, empowering firms to develop solutions specific to their distinct business requirements.

Notably, Capgemini is not the first enterprise-level organisation to partner with Microsoft since generative AI became the buzzword of the tech world. Leading firms Tata Consultancy Services (TCS), HCLTech, Accenture and Moody's have recently strengthened their ties with the company behind Bing. Fascinatingly, another leading Indian IT giant, Infosys, along with a few other parties including Elon Musk and AWS, donated $1 billion to OpenAI. Two months ago, Infosys announced Topaz, an AI-first set of services, solutions and platforms using generative AI, with 12,000 use cases.

The post Capgemini Strengthens Ties With Microsoft, Co Creates Azure Intelligent App Factory appeared first on Analytics India Magazine.

DSC Weekly 18 July 2023

Announcements

  • Data management and analytics have never been more critical to defining long-term success. The Optimal Data Analytics summit explores how AI and ML are shaping the future of data analytics, and discover strategies to implement deep learning, neural networks, RPA, NLP and more. Join us to learn how to unleash the power of augmented analytics to optimize core business processes, source new revenue streams, improve customer satisfaction and drive long-term success.
  • With numerous cyber threats lurking and many available attack vectors, organizations must have a comprehensive view of what they are up against and how best to face possible attacks. Join the Enabling Threat Detection and Response summit to hear from leading experts about the most common, pervasive threats striking companies, the best monitoring and analytics strategies out there to quell them, and the most effective methods for stopping threats.

Top Stories

  • LLMs: Does human text data make generative AI an entity?
    July 17, 2023
    by David Stephen
    There is a recent interview, The Ethical Puzzle of Sentient AI, where a professor said, “But there’s also the problem that I’ve called the ‘gaming problem’ — that when the system has access to trillions of words of training data, and has been trained with the goal of mimicking human behavior, the sorts of behavior patterns…
  • Next-Gen Data Scientist: Thinking Like an Economist
    July 16, 2023
    by Bill Schmarzo
    Generative AI (GenAI) products like OpenAI ChatGPT, Microsoft Bing, and Google Bard are disrupting the roles of data engineers and data scientists. According to a recent report by McKinsey, these GenAI products could potentially automate up to 40% of the tasks performed by data science teams by 2025. And Emad Mostaque, founder and CEO of…
  • A Detailed Guide for Data Handling Techniques in Data Science
    July 13, 2023
    by Shanthababu Pandian
    Data Engineers and Data Scientists need data for their Day-to-Day job. Of course, It could be for Data Analytics, Data Prediction, Data Mining, Building Machine Learning Models Etc., All these are taken care of by the respective team members and they need to work towards identifying relevant data sources, and associated with…

In-Depth

  • Leveraging AI for smarter electronic data interchange
    July 18, 2023
    by Ovais Naseem
    Electronic Data Interchange (EDI) can be traced back to the late 1960s and early 1970s when businesses began to seek more efficient ways to exchange data electronically. Consequently, the concept of using computers to transmit and receive business documents emerged, aiming to replace manual paper-based processes. Then in the 1980s, standards organizations such as ANSI…
  • Real-time analytics
    July 17, 2023
    by Markus Buhmann
    The modern enterprise is insight-driven, or, at least, aims to be. Historically, those insights were found in a data warehouse or data lake, populated with scheduled feeds and analysts, working feverishly over them. Feeds had plenty of bandwidth, but high latency. Think an 18-wheeler loaded with hard drives, driving from London to Birmingham. Nowadays, insights…
  • AI ushers in a new era of mental health monitoring
    July 17, 2023
    by Devarati Sarkar
    AI Ushers in a New Era of Mental Health Monitoring Important Data Points: AI’s Role in Mental Healthcare Transformation – It can be safe to say that AI is driving a significant transformation in mental healthcare, promising more accessible, economical, and effective treatments. The Emerging Role of Technology and Artificial Intelligence As the modern world…
  • Data science vs web development: What’s the difference?
    July 17, 2023
    by Gregory Batchelor
    If you’ve spent any time in the tech community in the last few years, you’ll have noticed the recent explosion in interest in both data science and web development. Young people interested in a career in tech are increasingly turning to careers as data scientists or web developers. The importance of web development should be…
  • How Do Companies Use Artificial Intelligence?
    July 14, 2023
    by Yana Ihnatchyck
    By now, AI-based tools have totally changed the way companies operate across all industries, with AI used to streamline operations, make informed decisions, and enhance customer experiences. Companies utilize AI in a multitude of ways, such as automating repetitive tasks, predicting customer behavior, and optimizing supply chain management. Today, we will dive…
  • Data modeling techniques in modern data warehouse
    July 13, 2023
    by Shanthababu Pandian
    Hello, data enthusiast! In this article let’s discuss “Data Modelling” right from the traditional and classical ways, aligning to today’s digital way, especially for analytics and advanced analytics. Yes! Of course, for the last 40+ years we all worked on OLTP, and then we started focusing on OLAP. After the cloud era came into the picture…
  • DSC Weekly 11 July 2023
    July 11, 2023
    by Scott Thompson
    Read more of the top articles from the Data Science Central community.

Llama 2 vs GPT-4 vs Claude-2 

Last night Meta released Llama 2, an upgraded version of its large language model LLaMa, in a surprise partnership with Microsoft. Soon to be available on the Microsoft Azure platform catalogue and Amazon SageMaker, the model can be used for both research and commercial purposes through licensing.

The introduction of the 7B, 13B, and 70B pre-trained and fine-tuned parameter models comes with a remarkable 40% increase in pre-training data, a longer context length, and grouped-query attention (GQA) to enhance the inference capabilities of the larger models.
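
In grouped-query attention (GQA), several query heads share a single key/value head, which shrinks the KV cache and speeds up inference. A toy sketch of the head grouping follows; the head counts are illustrative, not Llama 2's actual configuration.

```python
# Toy grouped-query attention head mapping: query heads are split into groups,
# and each group shares one key/value head.

def kv_head_for(q_head, n_q_heads, n_kv_heads):
    """Map a query-head index to the KV head its group shares."""
    assert n_q_heads % n_kv_heads == 0
    group_size = n_q_heads // n_kv_heads
    return q_head // group_size

# With 8 query heads and 2 KV heads, heads 0-3 share KV head 0, heads 4-7 share KV head 1.
mapping = [kv_head_for(h, 8, 2) for h in range(8)]
print(mapping)  # [0, 0, 0, 0, 1, 1, 1, 1]

# The KV cache stores n_kv_heads K/V pairs instead of n_q_heads, a 4x saving here
# versus standard multi-head attention.
```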

Meanwhile, over the past couple of months, several companies have launched their own LLMs, including TII's Falcon, Stanford's Alpaca, LMSYS's Vicuna-13B, Anthropic's Claude 2 and more. So before your timeline gets flooded with posts like "ChatGPT is just the tip of the iceberg, Llama is here" or "Meta is Microsoft's new favourite child", let's cut to the chase and see how these models fare.

Grades Matter

Llama 2-Chat was made using fine-tuning and reinforcement learning with human feedback, involving preference-data collection and the training of reward models, including a new technique called Ghost Attention (GAtt). It is also trained on GPT-4 outputs. Meta performed a human study to evaluate the helpfulness of Llama-2 using 4,000 prompts. The "win rate" metric was used to compare the models, similar to the Vicuna benchmark. The study compares Llama 2-Chat models to both open-source and closed-source models like ChatGPT and PaLM using single and multi-turn prompts.

The 70B Llama-2 model performs roughly on par with GPT-3.5-0301 and outperforms Falcon, MPT, and Vicuna. Llama 2-Chat models outperform open-source models in terms of helpfulness for both single and multi-turn prompts. It has a win rate of 36% and a tie rate of 31.5% compared to ChatGPT. It also outperforms the MPT-7B-chat model on 60% of the prompts. The Llama 2-Chat 34B model has an overall win rate of over 75% against the equivalently sized Vicuna-33B and Falcon 40B models. Additionally, the 70B model outperforms the PaLM-bison chat model by a significant margin.

However, Llama-2 is weak in coding.

On coding, it does not reach the level of GPT-3.5 (48.1) or GPT-4 (67). Although its MMLU (Massive Multitask Language Understanding) score is good, HumanEval shows its coding capability is quite a bit lower than StarCoder (33.6) and many other models specifically designed for coding. But, considering that Llama-2 has open weights, it is highly likely to improve significantly over time.

On the other hand, Claude-2 excels in coding, mathematics, and logical thinking, including the ability to comprehend PDFs, a task that GPT-4 still struggles with. It attained an impressive score of 71.2% on Codex HumanEval, an evaluation specifically designed to assess Python coding skills.

When it comes to writing, Llama-2 and GPT-4 are very different, too.

When asked to write a poem, each took a different approach. ChatGPT makes more intentional word choices, focused on the way words sound, like a sophisticated poet with a wider vocabulary, while Llama-2 opts for more obvious rhymes, like a high-school poem.

I asked both Llama-2 and GPT-4 to write a poem about their epic competition. Guess which one is which.
========= Poem 1 =========
In the grand tapestry of technology's weave,
Where information turns and ideas cleave,
Two figures stand, their stories interweave,
GPT and Llama-2,…

— Jim Fan (@DrJimFan) July 18, 2023

Even though Llama-2 is trained at a much smaller scale, its output is commendable, according to several users with beta access. Meta initially used publicly available data but, finding it insufficient, collected high-quality annotation data, achieving better results with fewer examples. The team also observed the impact of different annotation platforms and vendors on performance, and found the model's outputs comparable to human annotations.

Open Source Or Openness?

Building LLaMA likely cost Meta over USD 20 million. And although it is being touted as open-source, it comes with a condition. Meta is helping the open-source community by releasing the model with a commercially-friendly license.

If products built on Llama 2 exceed 700 million monthly active users, you need Meta's permission to continue exercising the rights granted. And after the initial Llama model, intended for research, leaked online without authorisation, a proper licence now simplifies legitimate use of Llama 2.

However, other LLM models like GPT-4, and Claude 2 are not open source but can be accessed through APIs.

Microsoft’s Second Child

Microsoft’s new partnership with Meta came as a surprise. After investing in a ten-year partnership with OpenAI, Satya Nadella seems to yearn for more. Meanwhile, Meta’s Threads managed to amass a staggering 10 million registrations within a mere seven hours of its debut. ChatGPT, by contrast, saw an unprecedented 9.7% decline in traffic in June, marking the first downturn since its introduction in December.

When OpenAI released the GPT-4 paper, the ChatGPT maker received immense flak because the paper lacked crucial details about the architecture, model size, hardware, training compute, dataset construction, and training method. Researchers believe that OpenAI’s approach undermines the principles of disclosure, perpetuates biases, and fails to establish the validity of GPT-4‘s performance on human exams.

On the other hand, Meta’s white paper is itself a masterpiece. It spelt out the entire recipe, including model details, training stages, hardware, data pipeline, and annotation process. For example, there’s a systematic analysis of the effect of RLHF with nice visualisations.

According to Percy Liang, director of Stanford’s Center for Research on Foundation Models, Llama-2 poses a considerable threat to OpenAI. Meta’s research paper admits there is still a large gap in performance between LLaMA 2 and GPT-4. So even though LLaMA 2 can’t compete with GPT-4 on all parameters, it has the potential to close that gap over time. “To have Llama-2 become the leading open-source alternative to OpenAI would be a huge win for Meta,” says Steve Weber, a professor at the University of California, Berkeley.

Thus, with the arrival of Meta’s Llama-2, Microsoft now has a new child to rely upon should its older child fail.

Read more: Claude-2 Vs GPT-4

The post Llama 2 vs GPT-4 vs Claude-2 appeared first on Analytics India Magazine.

Generative AI with Large Language Models: Hands-On Training

Introduction

Large language models (LLMs) like GPT-4 are rapidly transforming the world and the field of data science. In just the past few years, capabilities that once seemed like science fiction are now becoming a reality through LLMs.

The Generative AI with Large Language Models: Hands-On Training will introduce you to the deep learning breakthroughs powering this revolution, with a focus on transformer architectures. More importantly, you will directly experience the incredible breadth of capabilities the latest LLMs like GPT-4 can deliver.

You will learn how LLMs are fundamentally changing the game for developing machine learning models and commercially successful data products. You will see firsthand how they can accelerate the creative capacities of data scientists while propelling them toward becoming sophisticated data product managers.

Through hands-on code demonstrations leveraging Hugging Face and PyTorch Lightning, this training will cover the full lifecycle of working with LLMs. From efficient training techniques to optimized deployment in production, you will learn directly applicable skills for unlocking the power of LLMs.

By the end of this action-packed session, you will have both a foundational understanding of LLMs and practical experience leveraging GPT-4.

Training Outlines

The training has 4 short modules that introduce you to Large Language Models and teach you to train your own large language model and deploy it to the server. Apart from that, you will learn about the commercial value that comes with LLMs.

1. Introduction to Large Language Models (LLMs)

  • A Brief History of Natural Language Processing
  • Transformers
  • Subword Tokenization
  • Autoregressive vs. Autoencoding Models
  • ELMo, BERT, and T5
  • The GPT (Generative Pre-trained Transformer) Family
  • LLM Application Areas

2. The Breadth of LLM Capabilities

  • LLM Playgrounds
  • Staggering GPT-Family progress
  • Key Updates with GPT-4
  • Calling OpenAI APIs, including GPT-4

3. Training and Deploying LLMs

  • Hardware Acceleration (CPU, GPU, TPU, IPU, AWS chips)
  • The Hugging Face Transformers Library
  • Best Practices for Efficient LLM Training
  • Parameter-efficient fine-tuning (PEFT) with low-rank adaptation (LoRA)
  • Open-Source Pre-Trained LLMs
  • LLM Training with PyTorch Lightning
  • Multi-GPU Training
  • LLM Deployment Considerations
  • Monitoring LLMs in Production
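
One item in the module above, parameter-efficient fine-tuning with LoRA, can be sketched in a few lines: instead of updating a frozen weight matrix W, you train two small matrices A and B and apply W + (alpha / r) * (B @ A) at inference. The dimensions below are toy values, far smaller than any real LLM layer.

```python
# Pure-Python sketch of the LoRA idea: a frozen weight plus a scaled
# low-rank update built from two small trainable matrices.

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def lora_effective_weight(W, A, B, alpha, r):
    """Frozen W plus the scaled low-rank update B @ A."""
    delta = matmul(B, A)  # (d_out x r) @ (r x d_in) -> d_out x d_in
    scale = alpha / r
    return [[W[i][j] + scale * delta[i][j] for j in range(len(W[0]))]
            for i in range(len(W))]

# Toy 2x2 frozen weight with a rank-1 adapter: only the 4 numbers in A and B
# are trained, which at LLM scale is what makes fine-tuning cheap.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]          # r x d_in  = 1 x 2
B = [[0.5], [0.5]]        # d_out x r = 2 x 1
W_eff = lora_effective_weight(W, A, B, alpha=2.0, r=1)
print(W_eff)  # [[2.0, 1.0], [1.0, 2.0]]
```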

4. Getting Commercial Value from LLMs

  • Supporting ML with LLMs
  • Tasks that can be Automated
  • Tasks that can be Augmented
  • Best Practices for Successful A.I. Teams and Projects
  • What's Next for A.I.

Resources

The training includes links to external resources such as source code, presentation slides, and a Google Colab notebook. These resources make it interactive and useful for engineers and data scientists who are implementing Generative AI into their workspace.


Here is a list of essential resources needed to build and deploy your own LLM model using Hugging Face and PyTorch Lightning:

  • Presentation Slides
  • GitHub Code Source
  • Google Colab (T5-Finetune)
  • YouTube Video
  • Jon Krohn (Official Website)

Discover the secret to success in just 2 hours! Don't wait any longer!

Abid Ali Awan (@1abidaliawan) is a certified data scientist professional who loves building machine learning models. Currently, he is focusing on content creation and writing technical blogs on machine learning and data science technologies. Abid holds a Master's degree in Technology Management and a bachelor's degree in Telecommunication Engineering. His vision is to build an AI product using a graph neural network for students struggling with mental illness.

More On This Topic

  • What are Large Language Models and How Do They Work?
  • AI: Large Language & Visual Models
  • Learn About Large Language Models
  • Top Open Source Large Language Models
  • More Free Courses on Large Language Models
  • Top Free Courses on Large Language Models

McKinsey Partners with Cohere to Unleash Secure Generative AI Solutions for Enterprise Customers

Global consultancy firm McKinsey & Company recently announced that it has partnered with Cohere, a generative AI startup, to offer AI solutions to its enterprise clients. This marks McKinsey’s first partnership with a large language model provider and comes amid the growing global interest in AI ignited by Microsoft-backed OpenAI’s ChatGPT.

“We are seeing our clients consider cost, IP protection and consumer privacy, and how the model is trained. We found Cohere to be one of the great solutions out there,” said Ben Ellencweig, senior partner at McKinsey, in an interview with Reuters.

With this, the duo will build custom solutions to enhance customer engagement and automate workflows for McKinsey’s clients. Additionally, McKinsey is exploring the possibility of leveraging Cohere to improve internal efficiency and bolster its knowledge management system.

The collaboration between McKinsey and Cohere has already begun, providing services to companies across various industries, such as financial services and retail, although specific names were not disclosed.

Cohere, co-founded by Nick Frosst, a former AI researcher at Google, positions itself as a neutral provider for enterprises, not tied to a specific cloud provider the way Microsoft-backed OpenAI is. It competes with OpenAI by focusing on generative AI solutions for businesses.

Unleashing Secure Generative AI Solutions

Recently, Cohere raised $270 million in funding from investors like Nvidia, Oracle, and Salesforce Ventures, reaching a valuation of $2.2 billion. Furthermore, it formed a partnership with Oracle, enabling the integration of its generative AI technology into Oracle’s products. Their cloud partners are AWS and Google Cloud.

Martin Kon, President at Cohere, emphasised that partnering with strong and complementary entities like McKinsey was part of their strategy to have a significant impact on the world through enterprise-focused AI solutions.

Other consulting firms have also been making AI investments and forming partnerships in the AI space. For instance, Accenture announced a $3 billion investment in AI, PwC committed to investing $1 billion over the next three years, Bain and Company teamed up with OpenAI, and Deloitte partnered with chipmaker Nvidia.

The collaboration between McKinsey and Cohere signifies the growing interest and investment in AI within the consulting industry, aiming to leverage AI technologies for various business applications and improvements.

The post McKinsey Partners with Cohere to Unleash Secure Generative AI Solutions for Enterprise Customers appeared first on Analytics India Magazine.

Unveiling the Power of Meta’s Llama 2: A Leap Forward in Generative AI?

Image created by Author with Midjourney

Introduction

Recent breakthroughs in artificial intelligence (AI), particularly in generative AI, have captured the public's imagination and demonstrated the potential of these technologies to drive a new era of economic and social opportunities. One such breakthrough is Meta's Llama 2, the next generation of their open-source large language model.

Meta's Llama 2 is trained on a mix of publicly available data and designed to power applications along the lines of OpenAI’s ChatGPT, Bing Chat, and other modern chatbots. Meta claims that Llama 2’s performance is significantly improved over previous Llama models. The model is available in pretrained form for fine-tuning on AWS, Azure, and Hugging Face’s AI model hosting platform, making it more accessible and easier to run. You can also download the model here.

But what sets Llama 2 apart from its predecessor and other large language models? Let's delve into its technical details and implications.

Technical Details and Performance

There are two flavors of Llama 2: Llama 2 and Llama-2-Chat, the latter of which has been fine-tuned for two-way conversations. Both versions come in models of varying sophistication: 7 billion, 13 billion, and 70 billion parameters. The models were trained on two trillion tokens, 40% more than the first Llama model, and fine-tuned with over 1 million human annotations.

Llama 2 has a context length of 4,096 tokens, and Llama-2-Chat's training employs reinforcement learning from human feedback specifically for safety and helpfulness. According to Meta, Llama 2 outperforms other LLMs, including Falcon and MPT, on reasoning, coding, proficiency, and knowledge tests.

Llama 2 technical overview
(Image source: Meta)

Furthermore, Llama 2 is optimized to run locally on Windows and on smartphones and PCs packing Qualcomm’s Snapdragon on-device technology, which means we can expect AI-powered apps that work without relying on cloud services starting from 2024.

"These new on-device AI experiences, powered by Snapdragon, can work in areas with no connectivity or even in airplane mode."

—Qualcomm (source: CNET)

Open-Source and Safety

One of the key aspects of Llama 2 is its open-source nature. Meta believes that making AI models openly available can benefit everyone. This gives both the business and research worlds access to tools that would be prohibitively expensive to build and scale themselves, opening up myriad opportunities for research, experimentation, and development.

Meta also emphasizes safety and transparency. Llama 2 has been "red-teamed": tested for safety, both internally and externally, by generating adversarial prompts that are then used to fine-tune the model. Meta discloses how the models are evaluated and tweaked, promoting transparency in the development process.

Reinforcement learning from human feedback is used for safety and helpfulness during Llama-2-Chat model training
(Image source: Meta)

Conclusion

Llama 2 continues Meta's push into generative AI. Its improved performance, open-source nature, and commitment to safety and transparency make it a promising model for a wide range of applications. As more developers and researchers gain access, we can expect to see a surge in innovative AI-powered solutions.

As we move forward, it will remain crucial to continue addressing the challenges and biases inherent in AI models. However, Meta's commitment to safety and transparency sets a positive precedent for the industry. With the release of Llama 2, we now have another tool available in our generative AI arsenal, and one that makes open access an ongoing commitment.

Matthew Mayo (@mattmayo13) is a Data Scientist and the Editor-in-Chief of KDnuggets, the seminal online Data Science and Machine Learning resource. His interests lie in natural language processing, algorithm design and optimization, unsupervised learning, neural networks, and automated approaches to machine learning. Matthew holds a Master's degree in computer science and a graduate diploma in data mining. He can be reached at editor1 at kdnuggets[dot]com.

More On This Topic

  • Unveiling Midjourney 5.2: A Leap Forward in AI Image Generation
  • Unveiling the Potential of CTGAN: Harnessing Generative AI for Synthetic…
  • Synthetic Data Platforms: Unlocking the Power of Generative AI for…
  • Meta-Learning for Keyphrase Extraction
  • How to land an ML job: Advice from engineers at Meta, Google Brain, and SAP
  • DINOv2: Self-Supervised Computer Vision Models by Meta AI

Park+ & Google Cloud Collaborate to Enhance Smart Parking Solutions

Google Cloud and Gurgaon-based car tech company Park+ have partnered to provide Park+ with integrations for its open-source software needs and other Google Cloud offerings, including Cloud SQL, Google Kubernetes Engine, Anthos, and Global Load Balancer.

With the public cloud software constantly evolving, Park+ is leveraging the power of open-source services on Google Cloud to build custom integrations and dashboards or orchestration pipelines without any hassle of maintenance. Park+ is developing solutions with next-generation ML and AI capabilities for a unique digital and conversational commerce experience for its customers.

Park+ has been using Google solutions since the company's inception, including Google Analytics, Firebase, Google AdMob, and now Google Cloud. As hardcore open-source enthusiasts, the team was thrilled to see out-of-the-box integration of the open-source software they use, which made Google Cloud an immediate choice. The latency of their applications decreased by 12%, from 100ms to 88ms, and they anticipate saving more than 900 hours a year previously invested in maintaining and managing open-source infrastructure.

Park+ is a superapp for car owners, offering a comprehensive range of services in the transportation sector. It serves as a one-stop solution, assisting users in discovering and reserving parking spaces, rapidly recharging FASTags, accessing daily car cleaning services, reviewing e-challans, monitoring car health, purchasing and renewing insurance, and much more.

Co-founded in 2019 by Amit Lakhotia, a former Vice President of Business at Paytm, and Hitesh Gupta, a former head of engineering for payments, Park+ prioritizes cutting-edge technology to streamline car ownership and maintenance. With a presence in over 20 cities, Park+ dominates the Indian market as the largest distributor of FASTags and access control systems, boasting an extensive inventory of parking slots throughout the country.

Inside the AI & Analytics Team of Park+

Park+ is an AI-driven platform for car owners. It employs data-tracking to offer a range of services, such as analyzing driving habits, detecting damages through 360° video, suggesting nearby hospitals during emergencies, personalized content recommendations, predicting car prices, managing FASTag transactions, identifying popular parking spots, recommending EV charging stations, suggesting maintenance schedules, and tracking car movement within specific areas. They also provide a unique service where users can send car videos for remote assessment. Park+ employs data science to track stolen cars using camera footage and RFID tags. The data science team, comprising 10 members, processes extensive data for valuable insights.

Read more: Data Science Hiring Process at Park+

The post Park+ & Google Cloud Collaborate to Enhance Smart Parking Solutions appeared first on Analytics India Magazine.

GPT-4 Details Have Been Leaked!

Image by Editor

A lot of people have been wondering what makes GPT-4 so much better than GPT-3. It has taken the world by storm, and as the most anticipated AI model around, people want to know more about it. OpenAI has not released details about GPT-4, such as its size, training data, internal structure, or how it was trained and built. We’ve all been wondering why they have been concealing this information.

Well, you’re about to find out because the details on GPT-4 have been leaked!

So what details have we found out about GPT-4? Let’s dive in…

Model Size

Large language models (LLMs) have been growing over the years, and model size reflects this: GPT-3 has 175 billion parameters, and GPT-4 is said to be roughly 10x the size of its predecessor, with roughly 1.8 trillion parameters across 120 layers. At 120 layers, GPT-4 is a deep architecture able to handle a variety of complex tasks, making it one of the most advanced models out there!

Mixture of Experts

OpenAI is using a mixture of experts (MoE). Unlike GPT-3, which is one static model, GPT-4 is reportedly a mixture of eight 220-billion-parameter models trained on different data and task distributions. Other accounts of the leak describe 16 experts of roughly 111 billion parameters each for the multi-layer perceptrons, with each expert having a specific role, for example coding or formatting.

Mixture-of-experts is not new and has been around for a while. Google, for example, uses a mixture of experts with expert-choice routing: depending on the input, tokens are routed to different experts that handle them.

GPT-4 reportedly uses roughly 55 billion shared parameters solely for attention, the mechanism that lets the model weigh which parts of the context matter, for example to stay on the topic at hand.
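The routing idea described above can be illustrated with a toy top-1 gate in plain Python. This is a minimal sketch, not OpenAI's or Google's implementation; the gate weights and feature vectors below are made up for the example:

```python
import math

def softmax(scores):
    """Turn raw gate scores into a probability distribution over experts."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_token(features, gate_weights):
    """Score each expert with a linear gate and send the token to the
    highest-probability expert (top-1 routing)."""
    scores = [sum(w * f for w, f in zip(ws, features)) for ws in gate_weights]
    probs = softmax(scores)
    return max(range(len(probs)), key=probs.__getitem__)

# Two made-up experts with a 2-dimensional gate: expert 0 fires on the
# first feature, expert 1 on the second.
gates = [[1.0, 0.0], [0.0, 1.0]]
print(route_token([0.9, 0.1], gates))  # 0
print(route_token([0.2, 0.8], gates))  # 1
```

In a real MoE layer the chosen expert's feed-forward block then processes the token, so per-token compute stays far below the total parameter count.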

Inference

Inference is how LLMs make predictions, and GPT-4 is doing pretty well in comparison to other models. It has been said that each forward pass, generating one token, utilizes only roughly 280 billion parameters and roughly 560 teraflops of compute.
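As a back-of-the-envelope check, a widely used rule of thumb puts dense transformer inference at about 2 floating point operations per active parameter per generated token. This little calculation is my own illustration of that rule applied to the reported active-parameter count, not a figure from the leak:

```python
# Rule of thumb: ~2 FLOPs per active parameter per generated token.
ACTIVE_PARAMS = 280e9  # active parameters per forward pass, as reported

flops_per_token = 2 * ACTIVE_PARAMS
print(f"{flops_per_token:.2e} FLOPs per generated token")  # 5.60e+11
```

This counts only per-token compute; the hardware throughput needed on top of it depends on batch size and target generation speed.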

Datasets

You can imagine how much data GPT-4 uses given its state-of-the-art performance. It is stated that GPT-4 was trained on roughly 13 trillion tokens, which is roughly 10 trillion words, using 2 epochs for text-based data and 4 epochs for code-based data.

The actual size of the dataset is unknown, as some of these tokens were re-used across epochs, so we can only roughly estimate that it includes several trillion unique tokens. There are also millions of rows of instruction fine-tuning data, sourced from ScaleAI and internally.

Context Length

GPT-4 was pre-trained with a context length of 8,000 tokens. The longer 32,000-token sequence length was obtained after pre-training by fine-tuning the 8K base model.

Batch Size

The batch size is the number of samples processed before the model is updated. The batch size was gradually ramped up during training, reaching 60 million tokens, which is roughly 7.5 million tokens per expert. To find the real batch size in sequences, divide this number by the sequence length.
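Taking the reported figures at face value, that conversion is simple arithmetic. A small sketch, assuming the pre-training sequence length is exactly 8,192 tokens (the leak says "8 thousand"):

```python
TOKENS_PER_BATCH = 60_000_000  # reported global batch size, in tokens
SEQ_LEN = 8_192                # assumed pre-training sequence length

# "Real" batch size = number of sequences processed per optimizer step
sequences_per_batch = TOKENS_PER_BATCH // SEQ_LEN
print(sequences_per_batch)  # 7324
```

So a 60-million-token batch corresponds to a little over 7,300 sequences per step under that assumption.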

Training Costs

This is an area that a lot of you will be interested in — training costs. You can imagine how expensive GPT-4 was to build and train.

It took OpenAI roughly 2.1e25 FLOPs (floating point operations) of compute, running on around 25,000 A100 GPUs over the space of 3 months, to train GPT-4. It is stated that GPT-4 is around 3x more computationally expensive to run than GPT-3.5, and that it costs 3x more than GPT-3 with regard to prompts.

For example, if OpenAI's cloud cost was around $1 per A100 hour, the training run alone would have cost roughly $63 million.
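The arithmetic behind that figure is easy to check. Working backwards from the $63 million total at the quoted rate, the implied runtime per GPU comes out slightly over three months:

```python
GPU_COUNT = 25_000        # reported number of A100s
RATE_PER_GPU_HOUR = 1.00  # dollars, the quoted illustrative cloud rate
TOTAL_COST = 63_000_000   # dollars, the reported training cost

# Hours and days per GPU implied by the total cost
implied_hours = TOTAL_COST / (GPU_COUNT * RATE_PER_GPU_HOUR)
implied_days = implied_hours / 24
print(implied_hours, implied_days)  # 2520.0 105.0
```

That is about 105 days per GPU, consistent with a training run of roughly three months plus ramp-up time.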

Speculative Decoding

It has also been said that OpenAI might be using speculative decoding, the key word being 'might'. This means using a smaller, faster model to draft tokens, which are then fed into the larger model for verification as a single batch.

If the smaller model's predictions are correct, the larger model agrees and accepts them. However, as soon as the larger model rejects a drafted token, the rest of the batch is discarded.
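In code, the draft-then-verify loop looks roughly like the following greedy sketch. This is a simplified toy of the idea, not OpenAI's implementation; real speculative decoding verifies drafted tokens against the large model's probability distribution in one batched forward pass, and the two lambdas at the end are stand-ins, not real models:

```python
def speculative_step(draft_model, target_model, prefix, k=4):
    """Draft k tokens with the small model, then keep the longest prefix
    the large model agrees with (greedy, simplified)."""
    # 1. The small, fast model proposes k tokens autoregressively.
    proposed, ctx = [], list(prefix)
    for _ in range(k):
        tok = draft_model(ctx)
        proposed.append(tok)
        ctx.append(tok)

    # 2. The large model checks each drafted token; on the first
    #    disagreement it substitutes its own token and discards the rest.
    accepted, ctx = [], list(prefix)
    for tok in proposed:
        if target_model(ctx) == tok:
            accepted.append(tok)
            ctx.append(tok)
        else:
            accepted.append(target_model(ctx))
            break
    return accepted

# Toy stand-ins: each "model" just maps a context to a next-token id.
draft = lambda ctx: len(ctx) % 3
target = lambda ctx: len(ctx) % 2
print(speculative_step(draft, target, [0]))  # [1, 0]
```

When the two models agree often, most drafted tokens are accepted and the expensive model is invoked far less per generated token.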

Wrapping it up

This leak reflects a high-level architecture leak rather than a model leak, which a lot of people were expecting. Although it is not the same, this kind of information is still useful as we continue to watch the growth of LLMs and see how much it takes to create an AI model such as GPT-4.

Nisha Arya is a Data Scientist, freelance technical writer, and Community Manager at KDnuggets. She is particularly interested in providing Data Science career advice, tutorials, and theory-based knowledge around Data Science. She also wishes to explore the different ways Artificial Intelligence can benefit the longevity of human life. A keen learner, she seeks to broaden her tech knowledge and writing skills while helping guide others.