AI — Страница 1451

Llama 2: A Deep Dive into the Open-Source Challenger to ChatGPT

Large Language Models (LLMs) capable of complex reasoning tasks have shown promise in specialized domains like programming and creative writing. However, the world of LLMs isn't simply a plug-and-play paradise; there are challenges in usability, safety, and computational demands. In this article, we will dive deep into the capabilities of Llama 2, while providing a detailed walkthrough for setting up this high-performing LLM via Hugging Face and T4 GPUs on Google Colab.

Developed by Meta with its partnership with Microsoft, this open-source large language model aims to redefine the realms of generative AI and natural language understanding. Llama 2 isn't just another statistical model trained on terabytes of data; it's an embodiment of a philosophy. One that stresses an open-source approach as the backbone of AI development, particularly in the generative AI space.

Llama 2 and its dialogue-optimized substitute, Llama 2-Chat, come equipped with up to 70 billion parameters. They undergo a fine-tuning process designed to align them closely with human preferences, making them both safer and more effective than many other publicly available models. This level of granularity in fine-tuning is often reserved for closed “product” LLMs, such as ChatGPT and BARD, which are not generally available for public scrutiny or customization.

Technical Deep Dive of Llama 2

For training the Llama 2 model; like its predecessors, it uses an auto-regressive transformer architecture, pre-trained on an extensive corpus of self-supervised data. However, it adds an additional layer of sophistication by using Reinforcement Learning with Human Feedback (RLHF) to better align with human behavior and preferences. This is computationally expensive but vital for improving the model's safety and effectiveness.

Meta Llama 2 training architecture

Pretraining & Data Efficiency

Llama 2's foundational innovation lies in its pretraining regime. The model takes cues from its predecessor, Llama 1, but introduces several crucial enhancements to elevate its performance. Notably, a 40% increase in the total number of tokens trained and a twofold expansion in context length stand out. Moreover, the model leverages grouped-query attention (GQA) to amplify inference scalability.

Supervised Fine-Tuning (SFT) & Reinforcement Learning with Human Feedback (RLHF)

Llama-2-chat has been rigorously fine-tuned using both SFT and Reinforcement Learning with Human Feedback (RLHF). In this context, SFT serves as an integral component of the RLHF framework, refining the model's responses to align closely with human preferences and expectations.

OpenAI has provided an insightful illustration that explains the SFT and RLHF methodologies employed in InstructGPT. Much like LLaMa 2, InstructGPT also leverages these advanced training techniques to optimize its model's performance.

Step 1 in the below image focuses on Supervised Fine-Tuning (SFT), while the subsequent steps complete the Reinforcement Learning from Human Feedback (RLHF) process.

Instruction-GPT

Supervised Fine-Tuning (SFT) is a specialized process aimed at optimizing a pre-trained Large Language Model (LLM) for a specific downstream task. Unlike unsupervised methods, which don't require data validation, SFT employs a dataset that has been pre-validated and labeled.

Generally crafting these datasets is costly and time-consuming. Llama 2 approach was quality over quantity. With just 27,540 annotations, Meta's team achieved performance levels competitive with human annotators. This aligns well with recent studies showing that even limited but clean datasets can drive high-quality outcomes.

In the SFT process, the pre-trained LLM is exposed to a labeled dataset, where the supervised learning algorithms come into play. The model's internal weights are recalibrated based on gradients calculated from a task-specific loss function. This loss function quantifies the discrepancies between the model’s predicted outputs and the actual ground-truth labels.

This optimization allows the LLM to grasp the intricate patterns and nuances embedded within the labeled dataset. Consequently, the model is not just a generalized tool but evolves into a specialized asset, adept at performing the target task with a high degree of accuracy.

Reinforcement learning is the next step, aimed at aligning model behavior with human preferences more closely.

The tuning phase leveraged Reinforcement Learning from Human Feedback (RLHF), employing techniques like Importance Sampling and Proximal Policy Optimization to introduce algorithmic noise, thereby evading local optima. This iterative fine-tuning not only improved the model but also aligned its output with human expectations.

The Llama 2-Chat used a binary comparison protocol to collect human preference data, marking a notable trend towards more qualitative approaches. This mechanism informed the Reward Models, which are then used to fine-tune the conversational AI model.

Ghost Attention: Multi-Turn Dialogues

Meta introduced a new feature, Ghost Attention (GAtt) which is designed to enhance Llama 2's performance in multi-turn dialogues. This effectively resolves the persistent issue of context loss in ongoing conversations. GAtt acts like an anchor, linking the initial instructions to all subsequent user messages. Coupled with reinforcement learning techniques, it aids in producing consistent, relevant, and user-aligned responses over longer dialogues.

From Meta Git Repository Using download.sh

Visit the Meta Website: Navigate to Meta's official Llama 2 site and click ‘Download The Model'
Fill in the Details: Read through and accept the terms and conditions to proceed.
Email Confirmation: Once the form is submitted, you'll receive an email from Meta with a link to download the model from their git repository.
Execute download.sh: Clone the Git repository and execute the download.sh script. This script will prompt you to authenticate using a URL from Meta that expires in 24 hours. You’ll also choose the size of the model—7B, 13B, or 70B.

From Hugging Face

Receive Acceptance Email: After gaining access from Meta, head over to Hugging Face.
Request Access: Choose your desired model and submit a request to grant access.
Confirmation: Expect a ‘granted access' email within 1-2 days.
Generate Access Tokens: Navigate to ‘Settings' in your Hugging Face account to create access tokens.

Transformers 4.31 release is fully compatible with LLaMa 2 and opens up many tools and functionalities within the Hugging Face ecosystem. From training and inference scripts to 4-bit quantization with bitsandbytes and Parameter Efficient Fine-tuning (PEFT), the toolkit is extensive. To get started, make sure you're on the latest Transformers release and logged into your Hugging Face account.

Here's a streamlined guide to running LLaMa 2 model inference in a Google Colab environment, leveraging a GPU runtime:

Google Colab Model – T4 GPU

Package Installation

 !pip install transformers !huggingface-cli login

Import the necessary Python libraries.

 from transformers import AutoTokenizer import transformers import torch

Initialize the Model and Tokenizer

In this step, specify which Llama 2 model you'll be using. For this guide, we use meta-llama/Llama-2-7b-chat-hf.

 model = "meta-llama/Llama-2-7b-chat-hf" tokenizer = AutoTokenizer.from_pretrained(model)

Set up the Pipeline

Utilize the Hugging Face pipeline for text generation with specific settings:

 pipeline = transformers.pipeline(     "text-generation",     model=model,     torch_dtype=torch.float16,     device_map="auto")

Generate Text Sequences

Finally, run the pipeline and generate a text sequence based on your input:

 sequences = pipeline(     'Who are the key contributors to the field of artificial intelligence?n',     do_sample=True,     top_k=10,     num_return_sequences=1,     eos_token_id=tokenizer.eos_token_id,     max_length=200) for seq in sequences:     print(f"Result: {seq['generated_text']}")

A16Z's UI for LLaMa 2

Andreessen Horowitz (A16Z) has recently launched a cutting-edge Streamlit-based chatbot interface tailored for Llama 2. Hosted on GitHub, this UI preserves session chat history and also provides the flexibility to select from multiple Llama 2 API endpoints hosted on Replicate. This user-centric design aims to simplify interactions with Llama 2, making it an ideal tool for both developers and end-users. For those interested in experiencing this, a live demo is available at Llama2.ai.

LLaMa2.ai

Llama 2: What makes it different from GPT Models and its predecessor Llama 1?

Variety in Scale

Unlike many language models that offer limited scalability, Llama 2 gives you a bunch of different options for models with varied parameters. The model scales from 7 billion to 70 billion parameters, thereby providing a range of configurations to suit diverse computational needs.

Enhanced Context Length

The model has an increased context length of 4K tokens than Llama 1. This allows it to retain more information, thus enhancing its ability to understand and generate more complex and extensive content.

Grouped Query Attention (GQA)

The architecture uses the concept of GQA, designed to fasten the attention computation process by caching previous token pairs. This effectively improves the model's inference scalability to enhance accessibility.

Performance Benchmarks

Performance Analysis of Llama 2-Chat Models with ChatGPT and Other Competitors

LLama 2 has set a new standard in performance metrics. It not only outperforms its predecessor, LLama 1 but also offers significant competition to other models like Falcon and GPT-3.5.

Llama 2-Chat's largest model, the 70B, also outperforms ChatGPT in 36% of instances and matches performance in another 31.5% of cases. Source: Paper

Open Source: The Power of Community

Meta and Microsoft intend for Llama 2 to be more than just a product; they envision it as a community-driven tool. Llama 2 is free to access for both research and non-commercial purposes. The are aiming to democratize AI capabilities, making it accessible to startups, researchers, and businesses. An open-source paradigm allows for the ‘crowdsourced troubleshooting' of the model. Developers and AI ethicists can stress test, identify vulnerabilities, and offer solutions at an accelerated pace.

While the licensing terms for LLaMa 2 are generally permissive, exceptions do exist. Large enterprises boasting over 700 million monthly users, such as Google, require explicit authorization from Meta for its utilization. Additionally, the license prohibits the use of LLaMa 2 for the improvement of other language models.

Current Challenges with Llama 2

Data Generalization: Both Llama 2 and GPT-4 sometimes falter in uniformly high performance across divergent tasks. Data quality and diversity are just as pivotal as volume in these scenarios.
Model Transparency: Given prior setbacks with AI producing misleading outputs, exploring the decision-making rationale behind these complex models is paramount.

Code Llama – Meta's Latest Launch

Meta recently announced Code Llama which is a large language model specialized in programming with parameter sizes ranging from 7B to 34B. Similar to ChatGPT Code Interpreter; Code Llama can streamline developer workflows and make programming more accessible. It accommodates various programming languages and comes in specialized variations, such as Code Llama–Python for Python-specific tasks. The model also offers different performance levels to meet diverse latency requirements. Openly licensed, Code Llama invites community input for ongoing improvement.

Introducing Code Llama, an AI Tool for Coding

Conclusion

This article has walked you through setting up a Llama 2 model for text generation on Google Colab with Hugging Face support. Llama 2's performance is fueled by an array of advanced techniques from auto-regressive transformer architectures to Reinforcement Learning with Human Feedback (RLHF). With up to 70 billion parameters and features like Ghost Attention, this model outperforms current industry standards in certain areas, and with its open nature, it paves the way for a new era in natural language understanding and generative AI.

Technology Leaders Can Turbocharge Their Company’s Growth In Five Ways

As technology weaves its way into every business function, the roles of both the technology organization and the technology leader have changed. In the case of the technology leader, this role has shifted from focusing primarily on technology operations to becoming a crucial business enabler with a seat at the C-level table, where pivotal decisions are made and solid revenue growth is generated.

It’s clear that customers and growth are the “raison d’être” of companies, and that growth is an even more important focus during uncertain economic times. What may not be as clear is that technology is an essential component — the power, if you will — of every company’s growth engine. The key to achieving technology-powered growth is alignment — honest and open collaboration within the technology organization and between technology leaders and their marketing, sales, product and customer experience counterparts.

Jump to:

How can technology leaders power their organization’s growth agenda?
How can technology leaders themselves achieve recognition as a pillar of growth?

How can technology leaders power their organization’s growth agenda?

Ultimately, when it comes to powering growth within their organizations, technology leaders can achieve this goal in five key ways. They can:

Enable growth by operating secure, rock-solid operations and customer-facing systems.

A business’s marketing, sales, product, digital and service platforms span the customer journey. These customer solutions — those that power customer self-service and those that empower employees — are vital everyday tools that should work so well that they disappear into the background for customers and employees. Job one for CIOs and other technology leaders is operating those platforms flawlessly while reducing costs. In doing so, they will earn a recognized place at the growth table.

Create growth by collaborating to produce and scale new products and market channels.

Creating growth is everybody’s responsibility. Because CIOs and the IT organization have the broadest span of control and visibility across the business, processes and data systems, they are in the best position to see the big picture and help in this regard.

Some growth will be powered by new technologies; CIOs and other technology leaders can demonstrate how emerging technologies create specific growth opportunities. Instead of pitching random acts of metaverse or blockchain, which require radical changes in life or trade to matter, technology leaders can iterate on new technologies and infuse ideas from these into their own products. For example, they can show how computer vision accelerates checkout, how generative AI carries human expression into digital products, or how edge computing creates better machinery replenishment services.

Amplify growth by optimizing everything through insights, automation and artificial intelligence.

Outcomes of all kinds can always be improved — AI is just the newest tool in the improvement toolkit, joining analytics, automation and software. Personalization at scale is a good example of amplifying growth. Technology leaders should collaborate with marketing colleagues and mine databases to find better purchase signals that improve offers and outreach. They can also automate processes to streamline onboarding and improve revenue recognition. Putting customer and product insights into employees’ hands and making them experts will lead to customer engagement success.

Activate partnerships to scale their ability to transform and grow.

No technology leader and no company will do this alone. They will work with technology and service providers to build and operate the new capabilities, including those powered by generative AI. The technology organization can expand its talent and capacity, scale its company’s ability to grow and empower business colleagues with co-innovation partnerships that focus providers’ attention on successful business outcomes.

Harness AI to rewrite the calculus of growth.

AI is a game changer because it builds on the technologies that came before — software, cloud, IoT and more — to create new growth opportunities. Although it’s tempting to regard AI as just a better tool in the analytics toolbox, the ability to learn, reason, make inferences, draw conclusions, make recommendations and generate content is what elevates AI as the next major driver of every vector of technology-powered growth.

How can technology leaders themselves achieve recognition as a pillar of growth?

CIOs and other technology executives have the primary responsibility to insert technology squarely into their organization’s growth conversation. No other executive has the full visibility and opportunity to advance technology-driven growth and gauge what peers and competitors are doing with technology. To elevate their position and become the go-to technology executive, they should build on their organization’s current position, identify their priorities and communicate the business value of technology to their firm.

Leaders can also take three steps immediately to establish IT and themselves as pillars of growth. These include raising their organization’s growth awareness and tying it more deeply into business performance; establishing technology’s role in growth for the next 12 months; and communicating what they are doing to drive growth both today and in the future.

Ted Schadler. Image: Forrester

This article was written by Ted Schadler, vice president and principal analyst at Forrester. As a member of Forrester’s Technology Executive research team, Ted’s research and guidance expertise focus on technology strategy, the role of service providers as co-innovation partners, the CIO’s role in the growth agenda, and digital experience and commerce services. Co-author of the books “The Mobile Mind Shift: Engineer Your Business to Win in the Mobile Moment” (Groundswell Press) and “Empowered: Unleash Your Employees, Energize Your Customers, and Transform Your Business” (Harvard Business Review Press), Ted has a master’s degree in management from the MIT Sloan School of Management, an MS in computer science from the University of Maryland and a BA with honors in physics from Swarthmore College.

To hear more from Ted Schadler on the technology leader’s role in powering growth, check out his keynote at Forrester’s upcoming Technology & Innovation North America Forum, September 10–12, 2023, in Austin, Texas.

Subscribe to the Executive Briefing Newsletter

Discover the secrets to IT leadership success with these tips on project management, budgets, and dealing with day-to-day challenges.

Delivered Tuesdays and Thursdays Sign up today

NVIDIA CEO Jensen Huang Discusses AI Potential with PM Modi

Indian Prime Minister Narendra Modi recently met with Jensen Huang, the CEO of the leading AI company Nvidia. Their lengthy conversation was centred on the “rich potential” India presents in the field of AI.

In a tweet posted on X (formerly Twitter), Prime Minister Modi shared the positive outcomes of their discussion saying, “Had an excellent meeting with Mr. Jensen Huang, the CEO of @nvidia. We talked at length about the rich potential India offers in the world of AI. “Mr. Jensen Huang was appreciative of the strides India has made in this sector and was equally upbeat about the talented youth of India,” the prime minister tweeted.

Had an excellent meeting with Mr. Jensen Huang, the CEO of @nvidia. We talked at length about the rich potential India offers in the world of AI. Mr. Jensen Huang was appreciative of the strides India has made in this sector and was equally upbeat about the talented youth of… pic.twitter.com/zT6Cyrmk5z

— Narendra Modi (@narendramodi) September 4, 2023

Nvidia has established a robust technological relationship with India, as highlighted by recent developments. Less than a week prior to this meeting, Industry.AI, a member of Nvidia’s Metropolis vision AI partner ecosystem, implemented its vision AI platform at a major airport in India’s Silicon Valley.

Annually around 32 million people travel through Bengaluru airport making it an important destination for deploying emerging technologies. The hub’s recently built terminal, T2 saw a debut at scale for intelligent video analytics at an Indian airport with this deployment.

Industry.AI uses a combination of NVIDIA’s TAO Toolkit and A100 Tensor Core GPUs to train its AI models. The company’s AI inference operations are facilitated by NVIDIA’s Triton Inference Server with A30 Tensor Core GPUs. Furthermore, Industry.AI has integrated NVIDIA’s DeepStream sdk for AI-enhanced video analytics.

Interestingly, last year during the company’s annual GTC Summit Huang underscored the company’s substantial interests in India. He mentioned, “Our three largest geographies are California, China and India, with India being slightly larger than China.” He also expressed keen interest in India’s talent pool, urging those in the AI community to work with the company.

He went on to praise the country, saying, “I believe that AI is going to absolutely revolutionise the tech industry in India. The number of startups in India is skyrocketing, and every single one of them relies on AI.” Even if it is less likely that India could create original algorithms, there was a hunger to learn, he said.

The post NVIDIA CEO Jensen Huang Discusses AI Potential with PM Modi appeared first on Analytics India Magazine.

Want to Become a Data Scientist? Part 1: 10 Hard Skills You Need

Image by Author

You may come across a lot of comprehensive articles on how to become a data scientist. They provide a lot of good information, however, they can be very overwhelming. Especially as a beginner, you just want to know what you need to know and get cracking.

This is exactly what this blog will be about. I will go through the 10 hard skills you need to become a data scientist.

Let's go…

Programming Language

If you do not know how to code in any programming language, your first step will be to learn how to code. My recommendation will be Python, as it is arguably the most popular programming language for data science.

Other languages you can learn for data science are R, SQL, Julia, and more.

Mathematics

A topic that some people say you don’t need in the world of coding. But I believe that is truly wrong. I did a BootCamp that did not touch on the mathematical side — and I definitely realized it played a big weakness in my proficiency in the field.

Areas of math that you will need for data science are linear algebra, linear regression, probability and statistics. Learning the math behind data science will be highly beneficial for your data science career and noticed by your employer.

Learning math can be nerve-wracking, so I completely understand your hesitance. Have a read of How To Overcome The Fear of Math and Learn Math For Data Science to ease your mind.

Integrated Development Environments (IDE)

An Integrated Development Environment (IDE) is a software application that has a comprehensive environment that has a combination of tools and features specifically for software development. IDEs will help you execute data analysis, visualization, and machine learning tasks. Choosing the right IDE for you is more down to your preference, for example, there are:

Jupyter Notebook
Google Colab
Visual Studio Code
PyCharm
RStudio

Your IDE is where you will learn how to become proficient in your programming language, learn math, and all the below. Jupyter Notebook and Visual Studio Code are my favorites! These will also be highly beneficial when you get a job as employers expect you to know popular IDEs.

Libraries

Coding has been made so much easier over the years, and this is down to the variety of libraries available. These libraries are tools that you can use to streamline the data analysis and machine learning processes.

If you have decided to learn Python, these are the libraries I would suggest you learn:

NumPy
Pandas
Matplotlib
Seaborn
Scikit-Learn
TensorFlow
PyTorch
NLTK (Natural Language Toolkit)
Beautiful Soup
Scrapy

The reason I am providing you with a list of libraries at the start is that as you go through your data science learning journey, you will start to see these libraries a lot. Learn what each of them provides and you will see where you can apply it. For example, Matplotlib can be used for data visualization.

Data Transformation

Exactly what it says — transforming your data. Data transformation is an important phase for a data scientist as you will spend a lot of time taking raw data and modifying, adjusting and converting it into a format that can be used for analysis and other tasks.

You will need to learn about normalization, standardization, scaling, feature engineering, and more.

An article you can read: Data Transformation: Standardization vs Normalization

Data Visualisation

Data visualization is an important aspect of data science, as you will need to be able to convey your findings in more than one way other than coding. Not everybody on your team will be technically inclined, therefore presenting your findings in visuals will help with this and also the decision-making process.

Have a read of: Data Visualization Best Practices & Resources for Effective Communication

Machine Learning

The next thing you want to learn is machine learning. There are a variety of aspects within machine learning, and you won't be able to be an expert in everything — but it's still good to be a jack of all trades within this area. Brace yourself, because there’s a lot to learn.

You will want to start with the fundamental concepts such as supervised learning, unsupervised learning, classification and regression tasks. Once you have a good understanding of these and can differentiate them, you will then want to learn more about the different machine learning algorithms, such as support vector machines and neural networks.

Once you understand machine learning models, you will need to learn:

Building a Machine Learning Model
Model Evaluation
Deployment
Model Interpretability
Overfitting and Underfitting
Hyperparameter Tuning
Validation and Cross-Validation
Ensemble Methods
Dimensionality Reduction
Regularization Techniques
Gradient Descent
Neural Networks and Deep Learning
Reinforcement Learning

As I said, there’s a lot to learn in this area, so I would advise you to take your time and practice!

Here’s an article that can help you: Top 15 YouTube Channels to Level Up Your Machine Learning Skills

Big Data Tools

Having all this knowledge is great, but some tools can take your data science career to the next level. Understanding different technologies, where they can be used and the pros and cons will make your data science journey more efficient.

There are a variety of tools and technologies out there that can be of great benefit to anybody working with data. However, I will list a few popular ones, such as Apache Spark, TensorFlow, PyTorch, Hadoop, Tableau, Git, and more.

Cloud Computing

Cloud computing is a very important element of data science because all the projects and tasks that you will be working on will turn into products. Cloud computing services enable scalable storage, and computing power and provide easy access to tools and services.

You will need to learn about cloud platforms such as Amazon Web Service, Microsoft Azure, and Google Cloud Platform.

Other cloud computing aspects you will need to be knowledgeable about are data storage, databases, data warehousing, big data processing, containerisation, and data pipelines.

Have a read of:

Beginner’s Guide to Cloud Computing
How to Efficiently Scale Data Science Projects with Cloud Computing

Projects

I am going to add projects as the last hard skill you need as it showcases all of the above. Don’t go and do a bunch of projects just because you want to put it on your resume and land yourself a job. Yes, that is the end goal, but ensure that you fully understand your projects.

In an interview, you will be asked about your projects, the ins and outs and you need to be prepared to answer with as much knowledge as possible. Use your projects to showcase your skills, and how you identified your weaknesses and worked on them.

Have a read of:

5 Data Analysis Projects For Beginners
5 Advance Projects for Data Science Portfolio

Wrapping it up

I tried to keep this article as condensed as possible so you don’t feel overwhelmed. I hope I have succeeded and provided you with enough detail and resources to go and kickstart your data science journey!

Have a look out for Part 2 for the soft skills you need as a data scientist.
Nisha Arya is a Data Scientist, Freelance Technical Writer and Community Manager at KDnuggets. She is particularly interested in providing Data Science career advice or tutorials and theory based knowledge around Data Science. She also wishes to explore the different ways Artificial Intelligence is/can benefit the longevity of human life. A keen learner, seeking to broaden her tech knowledge and writing skills, whilst helping guide others.

Musk’s X Slapped with 2,200+ Unpaid Salary & Bonus Cases, Worth Millions

After Musk acquired Twitter (now X) for $44 billion, he made a series of dramatic changes including ditching the ‘blue-tick’ verification; abruptly rebranding the platform’s name to ‘X’, making the blue bird extinct and purging nearly two third of his employees. However, these bold moves have since thrust the company into a legal area, as it now faces over 2,200 claims — estimated $3.5 million as reported by CNBC.

The legal battle has come to light through court documents filed in the case of Chris Woodfield v. Twitter, X Corp., and Elon Musk (No. 1:23-cv-780-CFC). The fee for each filing in this case stands at a substantial $2,000, in accordance with the JAMS arbitration system, of which $400 will be borne by the former employees themselves. By January 2023, a mere three months after Musk’s acquisition, the court had already received 200 arbitration demands, a number that has since ballooned to over 2,200 by August.

Adding fuel to the fire, in June, Shannon Liss-Riordan, an attorney involved in a proposed class-action lawsuit against the company, alleged that X had failed to disburse “tens of millions of dollars” in bonuses owed to its employees.

This complaint is only one of the lawsuits from disgruntled former Twitter employees, all claiming that Musk did not uphold the commitments he made during the takeover. Experts expect that the number of legal cases may continue to rise, while X’s legal expenses mount as it defends its position.

In response to the increasing legal pressure, the legal team has argued that X did not mandate its employees to address disputes through arbitration, thereby disclaiming responsibility for the former employee’s share of the filing fees.

X’s legal woes go beyond internal disputes. In July, the company filed a lawsuit against the Center for Countering Digital Hate, a hate speech watchdog organisation. Simultaneously, X initiated proceedings against Wachtell law firm previously employed by Twitter’s prior administration to ensure Musk upheld his end of the acquisition deal. The lawsuit saga does not end here as the company has faced multiple lawsuits ranging from unpaid bills to copyright issues.

The post Musk’s X Slapped with 2,200+ Unpaid Salary & Bonus Cases, Worth Millions appeared first on Analytics India Magazine.

ReAct, Reasoning and Acting augments LLMs with Tools!

Introduction

Short for Reasoning and Acting, this paper introduces a new concept that improves the performance of LLMs and also provides us with more explainability and interpretability.

The goal of AGI could be one of the most important goals for human civilization to achieve. Imagine creating artificial intelligence that could generalize to many problems. There are many interpretations of what an AGI is, and when do we say that we have achieved it?

The most promising method for AGI in the last decades was the reinforcement learning path, more specifically what DeepMind was able to achieve hard tasks, AlphaGo, AlphaStar and so many breakthroughs…

However, ReAct outperforms imitation and reinforcement learning methods by an absolute success rate of 34% and 10% respectively, while being prompted with only one or two in-context examples.

With this kind of result (of course, provided there is no data leakage and we can trust the evaluation methods provided in the paper), we can no longer ignore LLMs’ potential to reason and divide complex tasks into logical steps.

The Motivation Behind The Paper

This paper starts with the idea that LLMs so far are impressive in language understanding, they have been used to generate CoT (Chain of thought) to solve some problems, and they were also used for acting and plan generation.

Although these two have been studied separately, the paper aims to combine both reasoning and acting in an interleaved manner to enhance LLM's performance.

The reason behind this idea is that if you think about how you, as a human, behave in order to execute some task.

The first step is that you’ll use “inner Speech” or you’ll write down or communicate with yourself somehow, saying “How do I execute task X? to do task X I need to first do step 1 and then do step2 and so on”

More concretely, if you were to cook up a dish in the kitchen, you could ReAct something like this:

“Now that everything is cut, I should heat up the pot of water”), to handle exceptions or adjust the plan according to the situation (“I don’t have salt, so let me use soy sauce and pepper instead”), and to realize when external information is needed (“how do I prepare dough? Let me search on the Internet”).

You can also act (open a cookbook to read the recipe, open the fridge, check ingredients) to support the reasoning and answer questions (“What dish can I make right now?”).

This combination of both reasoning and acting is what makes humans learn and achieve tasks even under previously unseen circumstances or when faced with information uncertainties.

Reasoning Only Approach

Previous works demonstrated the capabilities of LLMs to reason, for example, Chain of Thought Prompting demonstrated that the model could come up with plans to answer questions in arithmetic, common sense, and symbolic reasoning.

However, the model here is still a “static black box” because it uses its internal language representation to answer these questions, and this representation may not always be accurate or up-to-date which leads to fact hallucination (coming with facts from its own imagination) or error propagation (one error in the chain of thoughts propagates to a wrong answer).

Without the ability to take some sort of action and update its knowledge, the model is limited.

Acting Only Approach

There have also been studies that employed LLMs to do actions based on language, these studies usually take in multimodal inputs (audio, text, and images), convert them to text, use the model to generate in-domain actions, and then use a controller to do these actions.

Without the ability to plan some steps and reason about what to do, the model will simply output the wrong actions.

Combining both into ReAct

The proposal of this paper is to combine both methods mentioned above. ReAct prompts LLMs to generate both verbal reasoning traces and actions pertaining to a task in an interleaved manner, which allows the model to perform dynamic reasoning to create, maintain, and adjust high-level plans for acting (reason to act), while also interacting with external environments (e.g., Wikipedia) to incorporate additional information into reasoning (act to reason).

This is shown in the figure below:

Difference between Reason, Act and ReAct (Photo taken from the paper)
The Action Space

So in order to make the reasoning prompting better, they design an action space, which means three actions that the model is allowed to use when answering questions.

This is done through a Wikipedia API that provides the following:

search[entity]: returns the first 5 sentences from the corresponding entity wiki page if it exists, or else suggests top-5 similar entities from the Wikipedia search engine
lookup[string], which would return the next sentence in the page containing the string, simulating Ctrl+F functionality on the browser
finish[answer], which would finish the current task with the answer

Something that is not usual here is that there are way more powerful information retrieval tools than the ones mentioned above.

The goal behind this is to simulate human behavior and how a human would interact with Wikipedia and reason to find an answer.

Prompting

In addition to the provided tools, we need to properly prompt the LLM, to provide reasoning and properly chain actions.

To this end, they use a combination of thoughts that decompose a question like (“I need to search x, find y, then find z”), extract information from Wikipedia observations (“x was started in 1844”, “The paragraph does not tell x”), perform common sense (“x is not y, so z must instead be…”) or arithmetic reasoning (“1844 < 1989”), guide search reformulation (“maybe I can search/lookup x instead”), and synthesize the final answer (“…so the answer is x”)

Finally, the results look something like this:

How ReAct works and leads to better results (Photo taken from the paper)
Results

The datasets chosen for the evaluation are the following:

HotPotQA: is a question-answering dataset that requires reasoning over one or two Wikipedia pages.

FEVER: a fact verification benchmark where each claim is annotated SUPPORTS, REFUTES, or NOT ENOUGH INFO, based on whether there exists a Wikipedia passage to verify the claim.

ALFWorld: Text Based game that includes 6 types of tasks that the agent needs to perform to achieve a high-level goal.

An example would be “examine paper under desk lamp” by navigating and interacting with a simulated household via text actions (e.g. go to coffee table 1, take paper 2, use desk lamp 1)

WebShop: an online shopping website environment with 1.18M real-world products and 12k human instructions with much more variety and complexity.

It requires an agent to purchase a product based on user instructions. For example “I am looking for a nightstand with drawers. It should have a nickel finish, and be priced lower than $140”, the agent needs to achieve this through web interactions.

So the results show that ReAct always outperforms Act, which goes to show that the reasoning part is extremely important to enhance the actions.

On the other hand, ReAct outperforms CoT on Fever (60.9 vs. 56.3) and slightly lags behind CoT on HotpotQA (27.4 vs. 29.4). So for the FEVER dataset, acting to get updated knowledge is showing to give the needed boost to make the right SUPPORT or REFUTE decision.

When comparing CoT vs ReAct on HotpotQA and why the performance is comparable, these are the key observations found:

Hallucination is a serious problem for CoT,so with no way to update its knowledge, CoT has to imagine and hallucinate things, which is a big hurdle.
While interleaving reasoning, action, and observation steps improve ReAct’s groundedness and trustworthiness, such a structural constraint also reduces its flexibility in formulating reasoning steps. ReAct may force the LLM to do actions when just doing CoT is sometimes enough.
For ReAct, successfully retrieving informative knowledge via search is critical. If search retrieves wrong information than automatically any reasoning based that false information is wrong, so getting the right information is crutial.

ReAct and CoT results on different datasets (Photo taken from the paper)

I hope this article helped you to understand this paper. You can check it out here https://arxiv.org/pdf/2210.03629.pdf

Implementations of ReAct exist already here and here.

Mohamed Aziz Belaweid is a Machine Learning / Data Engineer at SoundCloud. He is interested in both Research and Engineering. He like reading papers and actually bringing their innovation to life. He have worked on Language model training from scratch to specific domains. Extracting information from text using Named Entity Recognition, Multi Modal search systems, Image classification and detection. Also worked in operations side such as model deployment, reproducibility, scaling and inference.

Original. Reposted with permission.

Myntra’s New Generative AI Tool Will Surprise You

“I am going to Goa for a vacation, show me what I can wear,” we asked, and within seconds, this new tool called–MyFashionGPT was able to fetch results on shorts, t-shirts, sunglasses, hats and sunscreen.

The brainchild of Myntra. This new tool enables users to search using natural language, alongside giving relevant suggestions based on customer queries. “This is a first-of-its-kind solution in e-commerce in India and possibly globally,” avered Myntra’s chief technology and product officer, Raghu Krishnananda, in an exclusive interaction with AIM.

He said that they used ChatGPT for query understanding and then leveraged its own search infrastructure to fetch relevant and related products from its catalogue and show them as collections. It is working on more features that use generative AI, and will get launched in the near future on the platform.

Tech Stack

Myntra has been using both proprietary and open-source algorithms based on user cases. Krishnanda believes that open-source algorithms provide a quicker path to market. “When proprietary data is involved, using a hosted model would be the right approach where we train the open source models on Myntra-specific knowledge such as the product taxonomy.” The company has also developed its own AI models and combined multiple models to solve for specific use cases, especially in image science applications.

Myntra is currently leveraging AzureAI services that give access to OpenAI models such as ChatGPT3.5, Dall-E, etc. “We are looking at privately hosted models as well as managed service models based on the use cases, and we will continue to have partnerships that serve this need.”

Myntra’s latest tool MyFashionGPT, is integrated with ChatGPT3.5. “For text-related generative AI, we use ChatGPT3.5 and for image-related generative AI, we use Stable Diffusion-based models in conjunction with other internally developed models.” A number of other AI-based solutions in Myntra (non-generative AI) such as MyStylist have been developed in-house.

Myntra’s Generative AI Prowess

Synonymous with fashion and lifestyle, Myntra have been aggressively pushing through to bring generative AI onto their platform with the big picture of enhancing customer experience. “We have been using AI for more than five years now and see huge benefits. In that sense Myntra is an AI-first company,” said Krishnananda.

Myntra’s adoption of AI-based solutions has not only helped customers but also sellers. “AI-based solutions such as trend identification, demand prediction, and others are helping sellers bring the right merchandise and assortment and stay ahead of the trend,” said Krishnananda. Furthermore, its inventory and route optimisation algorithms have helped improve logistics.

While Myntra may have carved an AI niche in the fashion segment, other e-commerce players have also dived into the generative AI wave with a number of use cases (see below).

Source: Paxcom Report

Tech giant Amazon, who have already been implementing generative AI solutions on AWS and other services, are also working on bringing the same to its e-commerce vertical. The company is testing AI-generated customer review highlights that will present concise summaries of written reviews to aid a shopper in making quick purchasing decisions.

To cater to small-scale sellers, last week Amazon launched its virtual assistant ‘सहAI’ (sahai). The AI tool will help sellers list their products online, analyse sales trends and thereby assist with improving sales.

Challenges Galore

Training and inference for very large proprietary generative AI models is a challenge for any company, and it is easier said than done. “We are working to take ‘smaller’ open-source models and fine tune them on our own data,” said Krishnananda, emphasising the safety and cost benefits of training the model, without revealing the names of the smaller models (namely, Llama, Vicuna, etc.) being used.

Confident Myntra believes that it faces fewer challenges when it comes to adopting generative AI in their workflow. Krishnananda also spoke about how they are bringing adoption not just on the platform front, but also within teams. “The tech team is taking measures to democratise the use of Generative AI by providing internal APIs that the broader tech team can play with, as well as organise tech talks and knowledge sharing sessions,” he concluded, saying that they are building in-house frameworks for low cost fine-tuning and inference using GPUs.

The post Myntra’s New Generative AI Tool Will Surprise You appeared first on Analytics India Magazine.

WavJourney: A Journey into the World of Audio Storyline Generation

Introduction

The recent advent of Large Language Models has taken the world by storm. Now, imagination is the limit. Today, WavJourney can automate the art of storytelling. Given a single prompt, WavJourney leverages the power of LLMs to generate grasping audio scripts, complete with an accurate storyline, lifelike human voices, and engaging background music.

To properly view the powers of audio generation, consider the following scenario. We only need to provide a simple instruction, describing a scenario and scene setting, and the model generates a gripping audio script highlighting the supreme context relevance to the original instruction.

INSTRUCTION: Generate audio in Science Fiction theme: Mars News reporting that Humans send a light-speed probe to Alpha Centauri. Start with a news anchor, followed by a reporter interviewing a chief engineer from an organization that built this probe, founded by United Earth and Mars Government, and end with the news anchor again.

GENERATED AUDIO: https://audio-agi.github.io/WavJourney_demopage/sci-fi/sci-fi%20news.mp4

To truly understand the internal workings of this marvel, let us dive deep into the methodology and implementation details of the generation process.

Generation Process

The image below summarizes the complete process in a simple flowchart.

Image from Paper

The end-to-end audio generation process is composed of multiple submodules, that are executed sequentially for a complete Text-to-Audio model.

Audio Script Generation

WavJourney utilizes GPT-4 model with a predefined prompt template to generate the script. The prompt templates restrict the output to be in a simple JSON format, that can easily be parsed later by a computer program. Each script has 3 different audio types as shown in the image above: Speech, sound effects, and music. Each audio type can then be run as foreground audio, or overlaid as a background sound effect over other audio. Other attributes such as content description, length, and character are sufficient attributes to formally define an audio setting for script generation.

Script Parsing

The output script is then passed through a computer program, that parses the relevant information from the predefined JSON script format. It associates each description and character to a preset speech audio. This process helps in breaking down the audio generation process into separate steps, that include text-to-speech, music, and sound addition.

Audio Generation

The parsed script is executed as a Python program. Foreground speech is first generated that is overlaid by background music and sound effects. For speech generation, the model uses the pre-trained Bark model and a VoiceFixer restoration model to improve audio quality. AudioLDM and MusicGen models are utilized for sound effects and music overlays. The outputs of all three models are combined for the final audio output.

Human-Machine Co-Creation

The process maintains context of the generated scripts, and can be prompted similar to GPT models. You can easily modify the generated script using human feedback and chat capabilities of GPT models.

Adding specific details and sound effects could not have been easier than this.The flowchart below shows how simple it is to add or modify specific details of the generated script.

Image from Paper Conclusion

The audio generation model can be a game-changer for the entertainment industry. The process has the ability to generate engaging narratives and stories, that can be utilized for educational and entertainment purposes, automating tedious voice-over and video generation processes.

For a detailed understanding, overview the paper here. The code will soon be available on GitHub.
Muhammad Arham is a Deep Learning Engineer working in Computer Vision and Natural Language Processing. He has worked on the deployment and optimizations of several generative AI applications that reached the global top charts at Vyro.AI. He is interested in building and optimizing machine learning models for intelligent systems and believes in continual improvement.

Now You Can Use Canva on ChatGPT

Canva has been constantly updating its features and integrating AI on its platform. Now, it’s ChatGPT’s turn to take it up another notch. GPT-4 plugin store now has a Canva plugin, which users can add to create any design, table, graphic, pictures quickly. The plugin features not just images, but short videos for reels as well.

Currently available exclusively on ChatGPT Plus, which is the paid version of the chatbot, the user can select from various templates available like posters and flyers and ChatGPT will generate a custom template with the given text, so users don’t have to do it manually.

2. Create a visual
→ Describe what you want to create in ChatGPT
→ Prompt (e.g.): "I'm launching my new AI-related app. Created a Reel Instagram template related to AI and new technologies." pic.twitter.com/ICEFsHNivQ

— Paul Couvert (@itsPaulAi) September 3, 2023

After the image is imported from Canva onto the platform, users can customise it easily with the right prompts and also port and redirect to the Canva app for further editing the image. Unlike Midjourney or other text-to-image generators, this plugin doesn’t create an image from scratch but instead uses the templates from Cavnva to quickly whip up a design required by the user.

Canva upping the AI game

In March this year, Canva announced a range of AI updates to speed up the design process. These tools include features like Text to Image, Magic Eraser (which removes unwanted objects from images using AI), Magic Edit and Translate which transforms designs into different languages. Additionally, Beat Sync ensures seamless music synchronisation with video content.

Beyond these AI tools, Canva provides further AI-driven features for user convenience. These AI capabilities, both in design and user experience, streamline the creative process and contribute to the creation of professional-quality designs.

The Canva plugin on ChatGPT simplifies the design process, making the whole process faster. While Canva changed the game of graphic design, bringing tools to everyone, this takes it a step further by allowing swift creation of professional designs. Interestingly, users had figured out hacks to integrate Canva within ChatGPT earlier, but with this update, users can directly link the two for their content needs, officially.

When it comes to ChatGPT, this plugin hints at the next step for OpenAI to finally make GPT-4 multimodal, something that the company has been promising for a very long time.

The post Now You Can Use Canva on ChatGPT appeared first on Analytics India Magazine.

[Exclusive] First Indian IT Company To Bring ‘Made in India’ ChatGPT

A few weeks ago, Tech Mahindra announced the launch of Project Indus – an Indic-based foundational model for Indian languages, which could potentially prove to be its most important project ever. Large language models (LLMs) like the GPT models by OpenAI, despite their multilingual capabilities, have been predominantly trained on English datasets, which limits their proficiency in comprehending and generating content in Indic languages. Hence, an Indic LLM could prove to be monumental for India.

According to Tech Mahindra’s chief CP Gurnani, the model will be the biggest Indic LLM and could possibly cater to 25% of the world’s population. While Tech Mahindra has not revealed the cost associated with the project or when the model is expected to be launched, the aim is to build a 7 billion parameter LLM to begin with, Nikhil Malhotra, Global Head-Makers Lab, Tech Mahindra, told AIM.

The model is expected to initially support 40 different Hindi dialects and more languages and dialects will be added subsequently. “We understand that much work has been done on Indic Suite, whether Bhashini, AI4 Bharat, etc., but a foundation model still needs to be developed. As we continue to develop the model, we are constantly learning and improving the process. Our interface could have voice and textual information; however, we haven’t considered incorporating a chat interface like ChatGPT yet,” Malhotra said.

The primary goal for Tech Mahindra is to first create an LLM for continuation of text and then provide a dialogue. “Once we are clear that the model performs well and generates dialects well, we would launch it in the open source.”

Benefits of building India’s biggest Indic LLM

ChatGPT, driven by OpenAI’s GPT models, has undoubtedly been groundbreaking. Hence, developing a LLM, primarily designed for Indic languages could be highly beneficial for India for a wide array of reasons. Understanding the nuances of local cultures and contexts is essential for effective communication. An Indic LLM can be designed to prioritise cultural sensitivity, ensuring that the generated content respects local customs and norms. An Indic LLM could also democratise AI and cater to the wider section of non-English speakers in the country.

“One of the benefits of a foundation model is its versatility. For instance, a language model like LLM is capable of performing multiple tasks such as Q&A, fill-in-the-blanks, etc. using the same model. This approach is beneficial for specialised healthcare, retail, and tourism industries,” Malhotra said.

Moreover, the cost of tokens is significantly higher for the Indic languages in the GPT models when compared to English. Hence, an Indic LLM offers a more cost-effective solution for generating content in Indic languages without token pricing constraints. “It represents unrepresented languages and hence helps preserve them. Being the forerunner in this space, Tech Mahindra stands to benefit from the model. In fact, techniques from the model can be leveraged to benefit our customers,” he added.

Building Indic datasets

The effectiveness of an AI model hinges on the quality of its datasets. While ample English datasets are readily accessible, there is a scarcity of datasets for Indic languages and dialects. Recognizing this challenge, various stakeholders, including the Indian government, are actively engaged in the creation of such datasets.

Last year, Indian Prime Minister Narendra Modi launched the Bhasini project, which aims to develop language translation technologies that can effectively translate content from one Indian language to another. The initiative also aims to crowd-sourcing voice datasets in multiple Indian languages to enhance the availability and accessibility of digital services in local languages.

Moreover, educational institutions such as the Indian Institute of Science (IISc) and IIT Madras (Ai4Bharat) and even Microsoft is involved in building datasets for Indic languages. Despite various efforts, in India, datasets for languages other than Hindi are scarce and incomplete. Additionally, even Hindi data is fragmented,” Malhotra said. Additionally, he confirmed that Tech Mahindra is actively in talks with different leading universities and other stakeholders for Project Indus.

Tech Mahindra is sourcing information from various platforms, including Common Crawl, newspapers, Wikipedia and YouTube descriptions. “The information on dialects is primarily available through YouTube videos or spoken language samples. We are also sourcing information commonly available on the internet from books written in specific dialects.” Besides, Malhotra has also acknowledged that the requirement for computation and its accessibility is a key challenge for the tech giant.

Banking on Bhasha Daan

For Tech Mahindra and for the success of Project Indus, the biggest challenge is gathering data for different dialects. For this, the IT giant is seeking contributions from the speakers of these dialects to help build the datasets. “For this reason, we have opened a portal to get a bhasha dan from India and Indians who speak that dialect.

By clicking “Make a Contribution” on our website, you will find a user-friendly interface with all the dialects in which we collect data. Once you select a dialect, you can listen to a sample voice recording of how Hindi is spoken in that particular dialect. Users can then scroll down and anonymously record a sentence by clicking the record button.”

Moreover, Gurnani also took to Twitter to request contributions from the general public to assist in the creation of datasets for Indic dialects. “We humbly request a bit of bhasha daan from you. Please lend us your expressions, your vocabulary, your conversations and help us train India’s biggest indigenous LLM,” he tweeted.

Mitigating biases in datasets

Oftentimes, the biases that manifest in AI models originate from biases present within the datasets themselves. Since LLMs learn from a vast amount of text data available on the internet, when not appropriately addressed, these biases can impact the outputs generated by the models.

While building datasets from scratch, Tech Mahindra must put guardrails in place to ensure this does not happen. “When we collect the data at the first phase, it is essential to realise that this data would have to go through cleaning to ensure there is no bias. To address this challenge, we would be using both human annotation and automatic techniques to ensure there is no racial bias, ethnic bias, gender bias, etc.,” Malhotra said.

While it’s a commendable move, the success of the project hinges upon various factors such as robust data collection, efficient model training, and addressing linguistic nuances.

The post [Exclusive] First Indian IT Company To Bring ‘Made in India’ ChatGPT appeared first on Analytics India Magazine.

Technical Deep Dive of Llama 2

Pretraining & Data Efficiency

Supervised Fine-Tuning (SFT) & Reinforcement Learning with Human Feedback (RLHF)

Ghost Attention: Multi-Turn Dialogues

From Meta Git Repository Using download.sh

From Hugging Face

Package Installation

Import the necessary Python libraries.

Initialize the Model and Tokenizer

Set up the Pipeline

Generate Text Sequences

A16Z's UI for LLaMa 2

Llama 2: What makes it different from GPT Models and its predecessor Llama 1?

Variety in Scale

Enhanced Context Length

Grouped Query Attention (GQA)

Performance Benchmarks

Open Source: The Power of Community

Current Challenges with Llama 2

Code Llama – Meta's Latest Launch

Conclusion

How can technology leaders power their organization’s growth agenda?

Enable growth by operating secure, rock-solid operations and customer-facing systems.

Create growth by collaborating to produce and scale new products and market channels.

Amplify growth by optimizing everything through insights, automation and artificial intelligence.

Activate partnerships to scale their ability to transform and grow.

Harness AI to rewrite the calculus of growth.

How can technology leaders themselves achieve recognition as a pillar of growth?

Subscribe to the Executive Briefing Newsletter

More On This Topic

More On This Topic

Tech Stack

Myntra’s Generative AI Prowess

Challenges Galore

Audio Script Generation

Script Parsing

Audio Generation

More On This Topic

Canva upping the AI game