‘For Us, it’s Never About Celebrating Tech for Tech’s Sake,’ says Microsoft chief Satya Nadella 

Microsoft Build and Google IO

At Microsoft Build, the company’s annual developer conference, the tech giant was quite clear about what it wanted to achieve while Google was busy playing catch-up.

“For us, it’s never about celebrating tech for tech’s sake. It’s about celebrating what we can do with technology to create magical experiences that make a real difference in our countries, in our companies, in our communities,” said Microsoft chief Satya Nadella, in his keynote speech.

Responding to an old statement by Nadella, in which he said that he wants “people to know that we made them [Google] dance”, Google CEO Sundar Pichai gave a cheeky response in a recent interview with Bloomberg’s Emily Chang.

“I think one of the ways you can do the wrong thing is by listening to the noise out there and playing someone else’s dance music,” laughed Pichai, extremely sure that they are indeed listening to their own music.

Ironically, Google’s I/O event looked like it was playing to OpenAI’s tune. Products such as Veo (text-to-video), Project Astra, and a few others seem to be a direct response to OpenAI’s products.

Build 2024

While the 2023 edition introduced a few Copilot plugins and a Copilot assistant for Windows, this year saw the company focus heavily on Copilot developments.

Source: Microsoft Build 2024

The new range of Copilot+ PCs is not only poised to beat Apple’s MacBook Air M3, but it also comes with a unique function called ‘Recall’ that lets users find anything they have searched for or done on their computer. How concerning that is might be a topic for another time, but Microsoft has truly gone all in to cater to enterprises and consumers alike.

But, so has Google!

AI for Everyone

At Google I/O, the company’s annual developer conference that took place last week (exactly a day after OpenAI’s Spring Update event), Pichai mentioned that despite investing in AI for over a decade at every layer of the stack, the company is still in the “early days of the AI platform shift”.

However, he then went on to release a string of AI products, which would cater to everyone, “not just him or her”.

If Copilot was Microsoft’s shield, Gemini was Google’s.

The company’s integration of Gemini into its existing suite of products continued at this event. Google even launched Gemini 1.5 Flash, a lighter version of Gemini 1.5 Pro, and Gemma 2, the next version of its open models.

Small models seem to be the fad this tech season, with Microsoft also releasing Phi-3-vision, the new multimodal small language model from the Phi-3 family. The model is billed as cost-effective and optimised for personal devices.

The small language model is in line with Microsoft’s AI tech prediction for the year, in which SLMs were touted to gain traction. Abu Dhabi’s Technology Innovation Institute also recently launched its own small model, Falcon-2 11B.

AI Agents are the Way

Source: X

“Soon you’ll be able to mix and match inputs and outputs. This is what we mean when we say it’s an I/O for a new generation, and I can see you all out there thinking about the possibilities. But what if we could go even further? That’s one of the opportunities we see with AI agents,” said Pichai.

Project Astra, defined as a universal AI agent helpful in everyday life, can process multimodal information and respond naturally in conversation. Interestingly, at the OpenAI Spring Update, the company released GPT-4o, which does pretty much the same.

Microsoft was not far behind on its AI agent agenda either. The company announced its partnership with Cognition AI, the makers of the autonomous software AI agent Devin that caused quite a storm when it was released a few months ago. Devin will be powered by Azure.

However, Cognition AI was not the only major partnership announced.

Together, We Stand Strong

Microsoft announced a number of strategic alliances with partners from various industries. On the education front, Microsoft announced its partnership with Khan Academy, where the tech giant will provide free access to Khan Academy’s AI-powered teaching assistant for all K-12 educators. This will be supported by Azure OpenAI service.

Unsurprisingly, Khan Academy also demonstrated its ChatGPT and AI-powered tool usage via demo videos during OpenAI’s event.

Like every big-tech event, Build 2024 made a special mention of NVIDIA too. The company is planning to roll out a range of RTX-powered Copilot+ PCs. “We’re bringing the latest H200s to Azure later this year, and will be among the first cloud providers to offer NVIDIA’s Blackwell GPUs in B100 as well as GB200 configurations,” said Nadella.

Source: Microsoft Build 2024

While NVIDIA is for everyone, Microsoft’s biggest trump card is obviously OpenAI, and Sam Altman made a brief appearance here, despite having skipped his own company’s Spring Update event. Though he did not reveal anything new, he indirectly hinted at the next version of GPT and stated the obvious: that “models are getting smarter”.

While both Microsoft Build and Google I/O saw the release of new products, Google’s approach seemed aimed at taking on OpenAI alone. Microsoft’s event, on the other hand, showcased a change in the company’s outlook, emphasising a futuristic vision built on strategic partnerships.

“Microsoft is doing one thing that other cloud providers are either too scared or too complacent to try. They’re willing to cannibalise old products for the sake of a true AI-first strategy,” said AI advisor and entrepreneur, Allie K Miller.

Compared to Microsoft’s array of products, Google’s future strategy seems obscure.


Quantization and LLMs: Condensing Models to Manageable Sizes


The Scale and Complexity of LLMs

The incredible abilities of LLMs are powered by their vast neural networks which are made up of billions of parameters. These parameters are the result of training on extensive text corpora and are fine-tuned to make the models as accurate and versatile as possible. This level of complexity requires significant computational power for processing and storage.


The accompanying bar graph delineates the number of parameters across different scales of language models. As we move from smaller to larger models, we see a significant increase in parameter counts, with 'small' language models in the modest millions of parameters and 'large' models reaching tens of billions.

However, it is the GPT-3 model, with 175 billion parameters, that dwarfs the other models' parameter counts. Not only does it use the most parameters on the graph, but it also powers the most recognizable generative AI application, ChatGPT. This towering presence is representative of other LLMs of its class, illustrating the scale, and the processing power, required to support such advanced AI systems.

The Cost of Running LLMs and Quantization

Deploying and operating complex models can get costly due to their need for cloud computing or specialized hardware, such as high-end GPUs and AI accelerators, along with continuous energy consumption. Choosing an on-premises solution instead can save a great deal of money and adds flexibility in hardware choices and the freedom to use the system wherever needed, with a trade-off in maintenance and the need to employ skilled professionals. Either way, high costs can make it challenging for small businesses to train and power an advanced AI. Here is where quantization comes in handy.

What is Quantization?

Quantization is a technique that reduces the numerical precision of each parameter in a model, thereby decreasing its memory footprint. This is akin to compressing a high-resolution image to a lower resolution while retaining the essence and most important aspects but at a reduced data size. This approach enables the deployment of LLMs with less hardware without substantial performance loss.

ChatGPT was trained and is deployed on thousands of NVIDIA DGX systems, with millions of dollars spent on hardware and tens of thousands more on infrastructure. Quantization can enable good proof-of-concept deployments, or even fully fledged ones, on less spectacular (but still high-performance) hardware.

In the sections to follow, we will dissect the concept of quantization, its methodologies, and its significance in bridging the gap between the highly resource-intensive nature of LLMs and the practicalities of everyday technology use. The transformative power of LLMs can become a staple in smaller-scale applications, offering vast benefits to a broader audience.

Basics of Quantization

Quantizing a large language model refers to the process of reducing the precision of the numerical values used in the model. In neural networks and deep learning models, including large language models, numerical values are typically represented as floating-point numbers with high precision (e.g., 32-bit or 16-bit floating-point format).

Quantization addresses this by converting these high-precision floating-point numbers into lower-precision representations, such as 16-bit or 8-bit integers, to make the model more memory-efficient and faster during both training and inference, at the cost of some precision. As a result, the model requires less storage, consumes less memory, and can execute more quickly on hardware that supports lower-precision computations.
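
To make the arithmetic concrete, here is a minimal, self-contained sketch of 8-bit affine quantization in NumPy. It is a toy illustration of the mapping just described, not any particular framework's implementation:

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Map a float32 array onto the 256 levels of int8 (affine quantization)."""
    scale = (x.max() - x.min()) / 255.0            # float width of one int8 step
    zero_point = np.round(-x.min() / scale) - 128  # int8 code representing 0.0
    q = np.clip(np.round(x / scale + zero_point), -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize_int8(q, scale, zero_point):
    """Recover approximate float32 values from the int8 codes."""
    return (q.astype(np.float32) - zero_point) * scale

weights = np.random.randn(1000).astype(np.float32)
q, scale, zp = quantize_int8(weights)
recovered = dequantize_int8(q, scale, zp)

print(f"memory: {weights.nbytes} B -> {q.nbytes} B")   # 4000 B -> 1000 B
print(f"max round-trip error: {np.abs(weights - recovered).max():.5f}")
```

The 4x memory saving is exact; the small round-trip error is the "sacrificed precision" the rest of this article is about managing.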

Types of Quantization

To add depth and complexity to the topic, it is critical to understand that quantization can be applied at various stages in the lifecycle of a model's development and deployment. Each method has its distinct advantages and trade-offs and is selected based on the specific requirements and constraints of the use case.

1. Static Quantization

Static quantization is a technique in which the weights and activations are quantized to a lower bit precision ahead of time and applied uniformly to all layers. The quantization parameters are computed in advance, typically from a calibration pass over representative data, and remain fixed thereafter. Static quantization is a good fit when the memory requirements of the system the model will be deployed to are known in advance.

  • Pros of Static Quantization
    • Simplifies deployment planning as the quantization parameters are fixed.
    • Reduces model size, making it more suitable for edge devices and real-time applications.
  • Cons of Static Quantization
    • Performance drops are predictable, but a broad, one-size-fits-all static approach means certain quantized parts of the model may suffer more than others.
    • Limited adaptability to varying input patterns, since the quantization parameters cannot adjust at runtime.

2. Dynamic Quantization

Dynamic quantization involves quantizing the weights statically, while activations are quantized on the fly during model inference. The weights are quantized ahead of time, and the activations are quantized dynamically as data passes through the network. This means that different parts of the model effectively execute at different precisions, as opposed to defaulting to one fixed quantization (a short PyTorch sketch follows the list below).

  • Pros of Dynamic Quantization
    • Balances model compression and runtime efficiency without significant drop in accuracy.
    • Useful for models where activation precision is more critical than weight precision.
  • Cons of Dynamic Quantization
    • Performance improvements aren’t predictable compared to static methods (but this isn’t necessarily a bad thing).
    • Dynamic calculation means more computational overhead and longer training and inference times than the other methods, while still being lighter weight than no quantization at all.
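
PyTorch, for example, ships a one-line helper for this exact pattern. A minimal sketch (the untrained model here is just a stand-in for a real network):

```python
import torch
import torch.nn as nn

# Stand-in model; in practice this would be a trained network.
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10))

# Weights of the listed module types are quantized to int8 ahead of time;
# activations are quantized on the fly as data flows through at inference.
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

x = torch.randn(1, 512)
print(quantized(x).shape)  # torch.Size([1, 10])
```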

3. Post-Training Quantization (PTQ)

In this technique, quantization is applied after the model has been fully trained. It involves analyzing the distribution of weights and activations and then mapping these values to a lower bit depth. PTQ is commonly used to deploy models on resource-constrained devices like edge devices and mobile phones, and it can be either static or dynamic.

  • Pros of PTQ
    • Can be applied directly to a pre-trained model without the need for retraining.
    • Reduces the model size and decreases memory requirements.
    • Improved inference speeds enabling faster computations during and after deployment.
  • Cons of PTQ
    • Potential loss in model accuracy due to the approximation of weights.
    • Requires careful calibration and fine tuning to mitigate quantization errors.
    • May not be optimal for all types of models, particularly those sensitive to weight precision.

4. Quantization Aware Training (QAT)

During training, the model is made aware of the quantization operations that will be applied during inference, and the parameters are adjusted accordingly. This allows the model to learn to handle quantization-induced errors (a minimal sketch of the core trick follows the list below).

  • Pros of QAT
    • Tends to preserve model accuracy compared to PTQ since the model training accounts for quantization errors during training.
    • More robust for models sensitive to precision and is better at inferencing even on lower precisions.
  • Cons of QAT
    • Requires retraining the model resulting in longer training times.
    • More computationally intensive since it incorporates quantization error checking.

5. Binary and Ternary Quantization

These methods quantize the weights to either two values (binary) or three values (ternary), representing the most extreme form of quantization. Weights are constrained to +1 and -1 for binary, or +1, 0 and -1 for ternary quantization, during or after training. This drastically reduces the number of possible weight values, as the toy sketch after this list illustrates.

  • Pros of Binary and Ternary Quantization
    • Maximizes model compression and inferencing speed and has minimal memory requirements.
    • Fast inferencing and quantization calculations enables usefulness on underpowered hardware.
  • Cons of Binary and Ternary Quantization
    • High compression and reduced precision results in a significant drop in accuracy.
    • Not suitable for all types of tasks or datasets and struggles with complex tasks.
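
A toy sketch of both extremes, assuming NumPy; the per-tensor scale `alpha` is what lets the +/-1 codes approximate the original magnitudes:

```python
import numpy as np

def binarize(w: np.ndarray):
    """Binary quantization: keep only sign(w) plus one float scale per tensor."""
    alpha = np.abs(w).mean()                 # scale recovering average magnitude
    return np.sign(w).astype(np.int8), alpha

def ternarize(w: np.ndarray, threshold: float = 0.05):
    """Ternary quantization: small weights snap to 0, the rest to +/-1."""
    t = np.where(np.abs(w) > threshold, np.sign(w), 0).astype(np.int8)
    alpha = np.abs(w[t != 0]).mean() if np.any(t != 0) else 0.0
    return t, alpha

w = np.random.randn(4, 4).astype(np.float32)
b, ab = binarize(w)
t, at = ternarize(w)
print(ab * b)   # crude reconstruction of w from 1-bit codes
```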

The Benefits & Challenges of Quantization

Before and after quantization

The quantization of Large Language Models brings forth multiple operational benefits. Primarily, it achieves a significant reduction in the memory requirements of these models: the goal is for the post-quantization memory footprint to be notably smaller. This efficiency permits deployment on platforms with more modest memory capabilities, and the reduced processing power needed to run a quantized model translates directly into higher inference speeds and quicker response times that enhance the user experience.

On the other hand, quantization can also introduce some loss in model accuracy, since it involves approximating real numbers. The challenge is to quantize the model without significantly affecting its performance. This can be gauged by testing the model's accuracy and completion time before and after quantization to measure effectiveness and efficiency.

By optimizing the balance between performance and resource consumption, quantization not only broadens the accessibility of LLMs but also contributes to more sustainable computing practices.

Kevin Vu manages Exxact Corp blog and works with many of its talented authors who write about different aspects of Deep Learning.


AWS and GenAI Help Fractal Analytics Reduce Call Handling Time by up to 15%

Fractal Analytics, a leading AI solutions provider for Fortune 500 companies, has effectively reduced call handling time by up to 15% using its latest innovation, dubbed Knowledge Assist, on AWS.

Traditionally, data retrieval from multiple internal sources is time-consuming and often involves unstructured data, increasing the complexity of queries. With Knowledge Assist, Fractal aims to make knowledge retrieval more efficient within large enterprises.

It chose to build Knowledge Assist on AWS, leveraging Amazon Bedrock for its generative AI capabilities. In addition, Fractal utilised Amazon Elastic Container Service (ECS) to build connectors for Knowledge Assist and Amazon OpenSearch Service for vector and semantic search. The SaaS application layer runs on Amazon Elastic Kubernetes Service (EKS) and AWS Lambda serverless compute.

“The generative AI space is evolving rapidly. Being able to choose from various LLMs on Amazon Bedrock, which we can swiftly implement or experiment with, along with the ability to use the platform as an API without hosting concerns, helps us experiment and scale faster,” said Fractal Analytics Client Partner for Products and Accelerators Ritesh Radhakrishnan.

Knowledge Assist adheres to stringent security and privacy standards, protecting data within each client’s network through private endpoints and end-to-end encryption. Personally identifiable information is masked before storage in the analytics layer.
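
As a rough illustration of what such masking can look like (a generic sketch, not Fractal's implementation; the patterns and placeholder labels are assumptions), identifiers are scrubbed from a record before it reaches the analytics layer:

```python
import re

# Illustrative PII patterns; a production masker would cover many more.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "phone": re.compile(r"\+?\d[\d\s()-]{7,}\d"),
}

def mask_pii(text: str) -> str:
    """Replace each detected identifier with a typed placeholder."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"<{label}>", text)
    return text

print(mask_pii("Reach Jane at jane.doe@example.com or +1 (555) 010-9999."))
# -> "Reach Jane at <email> or <phone>."
```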

During a six-month pilot program, nearly 500 knowledge workers in contact centers adopted Knowledge Assist, handling hundreds of thousands of queries monthly and managing complex data from over 10,000 documents across PDF, DOC, and PPT formats. The pilot showed a 10-15% reduction in average data retrieval time and a 30% call deflection rate due to self-service capabilities.

Clients reported improved customer and employee satisfaction, less supervisor involvement, and enhanced upsell opportunities due to more available time on each call. Radhakrishnan explained that customers received faster and better answers, leading to improved customer satisfaction scores (CSAT). Agents experienced less frustration as they no longer needed to search multiple systems for answers.

Knowledge Assist also enhances compliance by providing the latest information, reducing instances of customers receiving incorrect or outdated information. This leads to a higher level of first-time issue resolution.

Moving forward, Fractal plans to implement more automated LLM evaluations and generate fresh insights into calls to help clients proactively address recurring issues and reduce call volumes. This continuous innovation in AI-driven solutions underscores Fractal’s commitment to improving business outcomes through advanced technology.

Fractal has been riding the GenAI wave for a while. The company entered the generative AI space last June by introducing Flyfish, an all-round generative AI platform for digital sales. It then unveiled Kalaido.ai, India’s first text-to-image diffusion model based on Indian languages. Currently, it’s also leveraging GenAI for insurance and even transforming the fashion value chain with vision intelligence.


AI Seoul Summit: 4 Key Takeaways on AI Safety Standards and Regulations

The AI Seoul Summit, co-hosted by the Republic of Korea and the U.K., saw international bodies come together to discuss the global advancement of artificial intelligence.

Participants included representatives from the governments of 20 countries, the European Commission and the United Nations as well as notable academic institutes and civil groups. It was also attended by a number of AI giants, like OpenAI, Amazon, Microsoft, Meta and Google DeepMind.

The conference, which took place on May 21 and 22, followed on from the AI Safety Summit, held in Bletchley Park, Buckinghamshire, U.K. last November.

One of the key aims was to make progress towards the formation of a global set of AI safety standards and regulations. To that end, a number of key steps were taken:

  1. Tech giants committed to publishing safety frameworks for their frontier AI models.
  2. Nations agreed to form an international network of AI Safety Institutes.
  3. Nations agreed to collaborate on risk thresholds for frontier AI models that could assist in building biological and chemical weapons.
  4. The U.K. government offered up to £8.5 million in grants for research into protecting society from AI risks.

U.K. Technology Secretary Michelle Donelan said in a closing statement, “The agreements we have reached in Seoul mark the beginning of Phase Two of our AI Safety agenda, in which the world takes concrete steps to become more resilient to the risks of AI and begins a deepening of our understanding of the science that will underpin a shared approach to AI safety in the future.”

1. Tech giants committed to publishing safety frameworks for their frontier AI models

New voluntary commitments to implement best practices related to frontier AI safety have been agreed to by 16 global AI companies. Frontier AI is defined as highly capable general-purpose AI models or systems that can perform a wide variety of tasks and match or exceed the capabilities present in the most advanced models.

The undersigned companies are:

  • Amazon (USA).
  • Anthropic (USA).
  • Cohere (Canada).
  • Google (USA).
  • G42 (United Arab Emirates).
  • IBM (USA).
  • Inflection AI (USA).
  • Meta (USA).
  • Microsoft (USA).
  • Mistral AI (France).
  • Naver (South Korea).
  • OpenAI (USA).
  • Samsung Electronics (South Korea).
  • Technology Innovation Institute (United Arab Emirates).
  • xAI (USA).
  • Zhipu.ai (China).

The so-called Frontier AI Safety Commitments promise that:

  • Organisations effectively identify, assess and manage risks when developing and deploying their frontier AI models and systems.
  • Organisations are accountable for safely developing and deploying their frontier AI models and systems.
  • Organisations’ approaches to frontier AI safety are appropriately transparent to external actors, including governments.

The commitments also require these tech companies to publish safety frameworks on how they will measure the risk of the frontier models they develop. These frameworks will examine the AI’s potential for misuse, taking into account its capabilities, safeguards and deployment contexts. The companies must outline when severe risks would be “deemed intolerable” and highlight what they will do to ensure thresholds are not surpassed.

SEE: Generative AI Defined: How It Works, Benefits and Dangers

If mitigations do not keep risks within the thresholds, the undersigned companies have agreed to “not develop or deploy (the) model or system at all.” Their thresholds will be released ahead of the AI Action Summit in France, touted for February 2025.

However, critics argue that these voluntary regulations may not be hardline enough to substantially impact the business decisions of these AI giants.

“The real test will be in how well these companies follow through on their commitments and how transparent they are in their safety practices,” said Joseph Thacker, the principal AI engineer at security company AppOmni. “I didn’t see any mention of consequences, and aligning incentives is extremely important.”

Fran Bennett, the interim director of the Ada Lovelace Institute, told The Guardian, “Companies determining what is safe and what is dangerous, and voluntarily choosing what to do about that, that’s problematic.

“It’s great to be thinking about safety and establishing norms, but now you need some teeth to it: you need regulation, and you need some institutions which are able to draw the line from the perspective of the people affected, not of the companies building the things.”

2. Nations agreed to form an international network of AI Safety Institutes

World leaders of 10 nations and the E.U. have agreed to collaborate on research into AI safety by forming a network of AI Safety Institutes. They each signed the Seoul Statement of Intent toward International Cooperation on AI Safety Science, which states they will foster “international cooperation and dialogue on artificial intelligence (AI) in the face of its unprecedented advancements and the impact on our economies and societies.”

The nations that signed the statement are:

  • Australia.
  • Canada.
  • European Union.
  • France.
  • Germany.
  • Italy.
  • Japan.
  • Republic of Korea.
  • Republic of Singapore.
  • United Kingdom.
  • United States of America.

Institutions that will form the network will be similar to the U.K.’s AI Safety Institute, which was launched at November’s AI Safety Summit. It has the three primary goals of evaluating existing AI systems, performing foundational AI safety research and sharing information with other national and international actors.

SEE: U.K.’s AI Safety Institute Launches Open-Source Testing Platform

The U.S. has its own AI Safety Institute, which was formally established by NIST in February 2024. It was created to work on the priority actions outlined in the AI Executive Order issued in October 2023; these actions include developing standards for the safety and security of AI systems. South Korea, France and Singapore have also formed similar research facilities in recent months.

Donelan credited the “Bletchley effect” — the formation of the U.K.’s AI Safety Institute at the AI Safety Summit — for the formation of the international network.

In April 2024, the U.K. government formally agreed to work with the U.S. in developing tests for advanced AI models, largely through sharing developments made by their respective AI Safety Institutes. The new Seoul agreement sees similar institutes being created in other nations that join the collaboration.

To promote the safe development of AI globally, the research network will:

  • Ensure interoperability between technical work and AI safety by using a risk-based approach in the design, development, deployment and use of AI.
  • Share information about models, including their limitations, capabilities, risk and any safety incidents they are involved in.
  • Share best practices on AI safety.
  • Promote socio-cultural, linguistic and gender diversity and environmental sustainability in AI development.
  • Collaborate on AI governance.

The AI Safety Institutes will have to demonstrate their progress in AI safety testing and evaluation by next year’s AI Action Summit in France, so they can move forward with discussions around regulation.

3. The EU and 27 nations agreed to collaborate on risk thresholds for frontier AI models that could assist in building biological and chemical weapons

A number of nations have agreed to collaborate on the development of risk thresholds for frontier AI systems that could pose severe threats if misused. They will also agree on when model capabilities could pose “severe risks” without appropriate mitigations.

Such high-risk systems include those that could help bad actors access biological or chemical weapons, and those with the ability to evade human oversight. An AI could potentially achieve the latter through safeguard circumvention, manipulation or autonomous replication.

The signatories will develop their proposals for risk thresholds with AI companies, civil society and academia and will discuss them at the AI Action Summit in Paris.

SEE: NIST Establishes AI Safety Consortium

The Seoul Ministerial Statement, signed by 27 nations and the E.U., ties the countries to similar commitments made by 16 AI companies that agreed to the Frontier AI Safety Commitments. China, notably, did not sign the statement despite being involved in the summit.

The nations that signed the Seoul Ministerial Statement are Australia, Canada, Chile, France, Germany, India, Indonesia, Israel, Italy, Japan, Kenya, Mexico, the Netherlands, Nigeria, New Zealand, the Philippines, Republic of Korea, Rwanda, Kingdom of Saudi Arabia, Singapore, Spain, Switzerland, Türkiye, Ukraine, United Arab Emirates, United Kingdom, United States of America and European Union.

4. The U.K. government offers up to £8.5 million in grants for research into protecting society from AI risks

Donelan announced that the government will award up to £8.5 million in research grants towards the study of mitigating AI risks like deepfakes and cyber attacks. Grantees will work in the realm of so-called ‘systemic AI safety’, which looks at understanding and intervening at the level of the society in which AI systems operate, rather than at the systems themselves.

SEE: 5 Deepfake Scams That Threaten Enterprises

Examples of proposals eligible for a Systemic AI Safety Fast Grant might look into:

  • Curbing the proliferation of fake images and misinformation by intervening on the digital platforms that spread them.
  • Preventing AI-enabled cyber attacks on critical infrastructure, like those providing energy or healthcare.
  • Monitoring or mitigating potentially harmful secondary effects of AI systems that take autonomous actions on digital platforms, like social media bots.

Eligible projects might also cover ways that could help society to harness the benefits of AI systems and adapt to the transformations it has brought about, such as through increased productivity. Applicants must be U.K.-based but will be encouraged to collaborate with other researchers from around the world, potentially associated with international AI Safety Institutes.

The Fast Grant programme, which expects to offer around 20 grants, is being led by the U.K. AI Safety Institute, in partnership with the U.K. Research and Innovation and The Alan Turing Institute. They are specifically looking for initiatives that “offer concrete, actionable approaches to significant systemic risks from AI.” The most promising proposals will be developed into longer-term projects and may receive further funding.

U.K. Prime Minister Rishi Sunak also announced the 10 finalists of the Manchester Prize, with each team receiving £100,000 to develop their AI innovations in energy, environment or infrastructure.

RAG With Microsoft Copilot

At Microsoft Build 2024, the company announced several new tools and advancements, most notably the incorporation of retrieval-augmented generation (RAG) into the Copilot Library, making it easier to use on-device data in your applications. It provides the tools to build a vector store within the platform and enables semantic search, similar to Recall.

The RAG architecture offers an enterprise solution by allowing one to constrain generative AI to the organisation’s content. This content can come from vectorised documents, images, and other data formats, provided you have embedding models for them.
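
In code, the architecture reduces to three steps: embed the organisation's documents into vectors, retrieve the closest ones for a query, and prepend them to the prompt. A hedged sketch of that general pattern (the `embed` function is a placeholder for a real embedding model, not Microsoft's API):

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder: in practice, call an embedding model here."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(384)
    return v / np.linalg.norm(v)

documents = [
    "Refunds are processed within 5 business days.",
    "Support is available 24/7 via chat.",
]
index = np.stack([embed(d) for d in documents])       # the "vector store"

def retrieve(query: str, k: int = 1) -> list[str]:
    scores = index @ embed(query)                     # cosine similarity
    return [documents[i] for i in np.argsort(scores)[::-1][:k]]

question = "How long do refunds take?"
context = "\n".join(retrieve(question))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
# `prompt` then goes to the LLM, constraining it to the organisation's content.
```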

“There’s no question that RAG is core to any AI-powered application, especially in the enterprise today. Azure AI Search makes it possible to run RAG at any scale, delivering highly accurate responses using state-of-the-art retrieval systems,” said Nadella, adding that ChatGPT and data assistants are all powered by Azure AI Search today.

Further, for efficient app development, one can combine the smart, human-like responses of Azure OpenAI with MySQL’s powerful database management and Azure AI Search’s advanced capabilities.

This integration enhances CMS, e-commerce, or gaming sites on Azure Database for MySQL by incorporating generative AI search and chat using LLMs from Azure OpenAI, along with vector storage and indexing from Azure AI Search, supported by RAG.

Elevating RAG and Search: The Synergy of Azure AI Document Intelligence and Azure OpenAI https://t.co/za5fmdLqmH

— Everything Microsoft (@EverythingMS) December 14, 2023

Azure AI Search capabilities were introduced one month ago. As a proven solution for information retrieval in a Retrieval-Augmented Generation (RAG) architecture, Azure AI Search offers robust indexing and query functionalities. It leverages the infrastructure and security of the Azure cloud, ensuring reliable and secure performance.

The Relevance of RAG

RAG was introduced to address LLM hallucinations by extending the model’s reach to external sources, vastly widening the information it can access. LLMs rely on statistical patterns without true comprehension, excelling at text generation but struggling with logical reasoning. They hallucinate because they are confined to their training data, no matter how big the model or how long the context length.

Additionally, RAG allows customers to add new datasets, providing fresh information for the LLM to generate accurate answers, enabling enterprises to derive insights from their own data.

Every LLM builder tries to address this by expanding the size of the model or, in the case of products like Perplexity, by getting answers directly from the internet, which also lets them generate current, reliable information rather than relying on outdated facts. Meanwhile, RAG ensures the LLM always has the latest and most trustworthy information at its disposal.

To ensure domain-specific knowledge, Cohere and Anthropic let enterprises provide their own personal data through Oracle Cloud to expand on the internal data. These LLMs, with RAG, provide insights that are more personalised with the company’s data.

Despite people claiming RAG is obsolete, it is actually evolving and increasingly being adopted by enterprises. RAG’s versatility spans various domains, such as customer service, educational tools, and content creation. Also, the developer community is actively exploring new ways to enhance RAG, such as creating applications with Llama-3 running locally.

Additionally, RAG is no longer limited to vector database matching. Many advanced RAG techniques are being introduced that significantly improve retrieval. For instance, integrating Knowledge Graphs (KGs) into RAG leverages structured, interlinked data, enhancing the system’s reasoning capabilities.
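
As a toy illustration of the idea (one of many possible KG-RAG designs, not a specific product's implementation), retrieval can walk typed edges out from entities mentioned in a query and feed the linked facts to the model alongside vector-matched passages:

```python
# A tiny knowledge graph as an adjacency map of typed edges.
graph = {
    "GPT-4": [("developed_by", "OpenAI"), ("type", "LLM")],
    "OpenAI": [("partnered_with", "Microsoft")],
}

def expand(entity: str, depth: int = 2) -> list[str]:
    """Collect facts reachable within `depth` hops of an entity."""
    facts, frontier = [], [entity]
    for _ in range(depth):
        nxt = []
        for node in frontier:
            for relation, target in graph.get(node, []):
                facts.append(f"{node} {relation} {target}")
                nxt.append(target)
        frontier = nxt
    return facts

print(expand("GPT-4"))
# ['GPT-4 developed_by OpenAI', 'GPT-4 type LLM', 'OpenAI partnered_with Microsoft']
```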

Going Beyond RAG

RAG reduces the rate of hallucinations by ensuring that all generated responses are supported by evidence, preventing the model from speculating blindly.

Meanwhile, there are other techniques to reduce hallucinations in LLMs, including Chain-of-Verification (CoVe) by Meta AI, which reduces hallucinations in LLMs by breaking fact-checking into manageable steps. It generates an initial response, plans verification questions, answers these independently, and produces a final verified response. Likewise, several other methods to reduce hallucinations have been pioneered, enabling the creation of more robust LLM systems.
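
The CoVe pipeline is straightforward to express as a sequence of model calls. A sketch assuming a generic `llm(prompt) -> str` helper (a stand-in for any chat-model API, not Meta AI's actual code):

```python
def llm(prompt: str) -> str:
    """Stand-in for a call to any chat/completions model."""
    raise NotImplementedError

def chain_of_verification(question: str) -> str:
    # 1. Draft an initial answer.
    draft = llm(f"Answer the question: {question}")
    # 2. Plan verification questions that would fact-check the draft.
    plan = llm("List short fact-checking questions for this draft.\n"
               f"Question: {question}\nDraft: {draft}")
    questions = [q.strip() for q in plan.splitlines() if q.strip()]
    # 3. Answer each check independently, so the model cannot simply
    #    repeat its own mistakes from the draft.
    evidence = "\n".join(f"Q: {q}\nA: {llm(q)}" for q in questions)
    # 4. Produce a final answer consistent with the verified facts.
    return llm(f"Question: {question}\nDraft: {draft}\n"
               f"Verified facts:\n{evidence}\n"
               "Rewrite the answer so it agrees with the verified facts.")
```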

On the other hand, with the launch of GPT-4 Turbo and the Retrieval API, OpenAI has also tried its hand at fixing the hallucination problem. With a long context length and the option for enterprises to integrate new data, OpenAI has come close to solving one of the most important problems of LLMs, but at the cost of user data privacy.

For example, with slightly fancier prompt engineering, a user on X was able to download the original knowledge files from someone else’s GPT, an app built with the recently released GPT Builder using RAG. This poses a major security issue for the model.

With Microsoft Copilot incorporating RAG into its library, it may also experiment with other ways to reduce hallucinations and better serve enterprises and their customers. But just like OpenAI, this has to be taken with a pinch of salt, especially in terms of privacy risks.


LLM Handbook: Strategies and Techniques for Practitioners

Image by Author

Large Language Models (LLMs) have revolutionized the way machines interact with humans. They are a sub-category of Generative AI, with a focus on text-based applications, while Generative AI is much broader, including text, audio, video, images, and even code!

AWS summarizes it well – “Generative artificial intelligence (generative AI) is a type of AI that can create new content and ideas, including conversations, stories, images, videos, and music. It reuses training data to solve new problems.”

Generative AI has opened up new frontiers in the AI landscape!

LLMs come with their ability to generate human-like responses, but how should AI practitioners use them? Is there a guide or an approach to help the industry build confidence with this cutting-edge technology?

That’s precisely what we will discuss in this article. So, let’s get started.

An assistant to get started!!!

LLMs are essentially generators, so it is advisable to use them for purposes such as generating summaries, providing explanations, and answering a wide range of questions. Typically, AI is used to assist human experts; similarly, LLMs can augment your understanding of complex topics.

Industry experts consider LLMs good sounding boards – yes, they are good for asking validation questions, brainstorming ideas, creating drafts, or even checking whether there is a better way to articulate existing content. Such recommendations give developers and AI enthusiasts a playground to test this powerful technology.

Not just text, LLMs help generate and debug code, as well as explain complex algorithms in an easy-to-understand manner, highlighting their role in demystifying the jargon to provide a tailor-made conceptual understanding for different personas.

Benefits!!

Now, let’s discuss some of the cases underscoring the role of LLMs in bringing efficiencies. The examples below focus on generating reports and insights, and simplifying business processes.

Collaboration Tools: Creating summary reports of data shared across applications such as Slack is a very effective way to stay informed about a project's progress. Such a report can include details like the topic, its current status, the development thus far, the participants, action items, due dates, bottlenecks, next steps, etc.

Role of LLMs in bringing efficiencies

Image by Author

Supply Chain: Supply chain planners are often in a fire-fighting situation trying to meet demand orders. While supply chain planning helps a lot, last-mile delivery requires experts to come together in a war room to keep the plan intact. A lot of information, often in the form of text, gets exchanged, including insights that are helpful for future purposes too. A summary of such conversations keeps all stakeholders informed of the real-time status.

Adopting LLMs

With rapidly evolving advancements in technology, it is crucial not to give in to the fear of missing out, but instead to approach LLMs with a business-first mindset.

In addition to the suggestions proposed above, users must keep themselves updated and regularly check for new techniques and best practices to ensure the effective use of these models.

Separate Facts from Fiction

Having discussed the benefits of LLMs, it is time to understand the other side. We all know there is no free lunch, so what does responsible use of LLMs require? There are many concerns, such as model bias and potential misuse like deepfakes, along with their repercussions, all requiring increased awareness of the ethical implications of LLMs.

Segregate human-generated responses from machine response.

Image by Author

The situation has worsened to the extent that it has become increasingly difficult to separate human-generated responses from machine-generated ones.

So, it is advised to not consider the information from such tools at face value, instead, consider these tips:

  • Refer to models as efficiency-enhancing tools and not as a single point of truth.
  • Crowdsource information from multiple sources and cross-check it before taking action – the ensemble works great by bringing together different viewpoints.
  • While you consider the importance and the trust factor of information coming from multiple sources, always check the source of information and the citations, preferably the ones with a higher reputation.
  • Do not assume the given information is true. Look for contrarian views, i.e. ask: what if this were wrong? Gather evidence that could refute the information, rather than trying to support its validity.
  • The model's response often has gaps in its reasoning: read it carefully, question its relevance, and nudge the model towards a more appropriate response.

Tips to Consider while Prototyping LLMs

Let’s get straight to the practical applications of LLMs to know their capabilities as well as limitations. To start with, be prepared for multiple experiments and iteration cycles. Always stay informed about the latest industry developments to get the maximum benefits of the models.

The golden rule is to start from business objectives and set clear goals and metrics. Quite often, the performance metrics include multiple goals in terms of not just accuracy, but also speed, computational resources, and cost-effectiveness. These are the non-negotiables that must be decided beforehand.

The next important step is to choose the right LLM tool or platform that suits the business needs, which also includes the consideration of the closed or open source model.

Helpful tips to make most of LLMs capability

Image by Author

The size of the LLM is another key deciding factor. Does your use case demand a large model, or would smaller approximator models, which are less hungry on compute, make a good trade-off for the accuracy they provide? Note that larger models provide improved performance at the cost of consuming more computational resources, and in turn a bigger budget.

Given the security and privacy risks that come with the large models, businesses need robust guardrails to ensure their end users' data is safe. It is equally important to understand the prompting techniques to convey the query and get the information from the model.

These prompting techniques are refined over time through repeated experiments, such as by specifying the length, tone, or style of the response, to ensure the response is accurate, relevant, and complete.
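
As an illustration, a reusable template that pins down length, tone, and structure up front keeps such experiments comparable across runs (an illustrative pattern, not a prescribed standard):

```python
def build_prompt(task: str, audience: str, max_words: int = 150) -> str:
    """Constrain length, tone, and format so responses stay comparable
    across repeated prompting experiments."""
    return (
        f"You are writing for {audience}.\n"
        f"Task: {task}\n"
        f"Constraints: at most {max_words} words, neutral tone, "
        "answer as a bulleted list, and cite any source you rely on."
    )

print(build_prompt("Summarise last quarter's supply chain issues",
                   audience="operations executives"))
```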

Summary

LLMs are, indeed, powerful tools for an array of tasks, from summarizing information to explaining complex concepts and data. However, successful implementation requires a business-first mindset to avoid getting caught up in AI hype and to find a genuinely valid end use. Furthermore, awareness of ethical implications, such as verifying information, questioning the validity of responses, and being cognizant of potential biases and risks associated with LLM-generated content, promotes responsible utilization of these models.

Vidhi Chugh is an AI strategist and a digital transformation leader working at the intersection of product, sciences, and engineering to build scalable machine learning systems. She is an award-winning innovation leader, an author, and an international speaker. She is on a mission to democratize machine learning and break the jargon for everyone to be a part of this transformation.


Google DeepMind Introduces Semantica, An Adaptable Image-Conditioned Diffusion Model

Researchers at Google DeepMind introduced Semantica, an image-conditioned diffusion model capable of generating images based on the semantics of a conditioning image.

The paper explores adapting image generative models to different datasets. Instead of finetuning each model, which is impractical for large-scale models, Semantica uses in-context learning.

It is trained on web-scale image pairs, where one random image from a webpage is used to condition the generation of another image from the same page, assuming these images share semantic traits.

Semantica leverages pre-trained image encoders and semantic-based data filtering to achieve high-quality image generation without the need for fine-tuning on specific datasets. Its architecture enables it to generate new images from any dataset by simply using images from that dataset as input, making it highly adaptable.

Source: Research Paper

This flexibility is essential for practical uses, as it allows the model to work with a wide range of dynamic image sources without the need for extensive retraining.

By using diffusion models, which iteratively refine an image from a noise vector, Semantica achieves a balance between computational efficiency and output quality. The approach allows for scalable and flexible image generation, which is valuable for various real-world uses such as content creation, image editing, and virtual reality environments.

Semantica can be useful in various domains. For instance, in creative industries, the model can be used to generate artwork or design elements based on a given theme or style. In education, it can create illustrative content tailored to specific topics, enhancing the learning experience. Additionally, in e-commerce, Semantica can generate product images that match the aesthetic preferences of different customer segments, potentially boosting engagement and sales.

The researchers conducted extensive experiments to evaluate Semantica’s performance across different datasets and found that the model effectively captures the semantic essence of the conditioning images, producing results that are visually coherent and contextually relevant.

Researchers at Google DeepMind have been doing some exciting work lately. Recently, they also introduced CAT3D, a new method for creating 3D scenes in as little as one minute. Instead of needing hundreds of photos, CAT3D uses a few images to generate new, consistent views of a scene. These views help create detailed 3D models that can be viewed from any angle in real-time.

Google DeepMind, in collaboration with its subsidiary Isomorphic Labs, also unveiled AlphaFold 3, a new AI model capable of predicting the structure and interactions of all biological molecules, including proteins, DNA, RNA, and ligands. AlphaFold 3 is the first AI system to surpass physics-based tools for biomolecular structure prediction.


Society Separates Those Who are Good at Maths and Those Who Are Not, But AI Doesn’t


During an episode of the Logan Bartlett Show, Sam Altman recalled how calculators were perceived in his maths classes. “We never got to use calculators,” he said, adding that, conversely, you had to be proficient with calculators in real life to excel later.

“If OpenAI researchers never got to use calculators, OpenAI wouldn’t have happened,” he said, explaining that we now need to teach people how to use AI because it’s going to be an important part of what we do in the future.

The importance of learning maths has been emphasised from our school days. It’s a crucial requirement in order to excel as an engineer. Further, with computer science becoming so mainstream, society has started to separate those who are good at maths from those who are not.

Now AI is bringing down this wall for the better. With the advent of tools such as ChatGPT and Copilot, everyone is increasingly becoming a developer without needing to learn maths, democratising access to fields that once required deep knowledge of the subject.

Some say mathematics and AI are two branches of the same tree. Others believe that mathematics forms the backbone of AI. But ever since we built software that can assist programmers without the need for even a tiny bit of mathematics, AI/ML has become a field widely accessible to everyone, as the mathematical barrier to entry is being done away with.

The Experts Don’t Necessarily Agree

“I’ve often said ‘don’t worry about it’ when it comes to maths, because maths shouldn’t hold anyone back from making progress in ML. And, understanding some key topics in linear algebra, calculus, and probability and statistics will help you get learning algorithms to work better,” said Andrew Ng, the AI guru. Interestingly, this was while he was introducing a course on maths for ML and data science.

NYU associate professor Julian Togelius said that you can indeed be successful in CS, including in machine learning, while knowing next to nothing about maths. “Just look at me, I barely passed those required theory courses, still made it here,” he said.

Meanwhile, Harvard University professor of computer science Boaz Barak said last year, “I teach computer science, and I apologise lesser now about having so much maths.” The same is the case with The Math(s) Fix author Conrad Wolfram, who said that in the era of ChatGPT, being bad at maths is as big a problem of the student as it is of the subject itself: it has become stagnant.

On the other hand, some ML engineers have never stumbled upon the usage of maths in their lives. “Pure maths research is not typically published at top ML conferences,” a Reddit user pointed out. “I have spent way more time installing CUDA drivers than proving theorems,” said another Reddit user in a thread talking about how much maths is involved on a daily basis in ML engineering.

Sure, the barrier to entry for a computer newbie to enter into the ML field has drastically decreased. A lot of ML fields are just about deploying models, and a lot of new models like ChatGPT or Codex even write the code for you. Knowing the maths behind all of this is something we don’t even think about anymore.

But funnily enough, AI is not yet good at maths itself, though its capabilities are improving. Recently, ChatGPT with the Wolfram plug-in scored 96% on the UK A-level maths paper, an essential qualification for getting into the AI field.

What this tells us is that if AI is able to crack an exam that is meant to get into AI, there needs to be a major change in the educational systems across the world to adjust to the shifting paradigm of mathematical teaching.

Maths is Like Law, and the Divide Will Continue

Mathematics, at its core, is the embodiment of logic. AI is deeply rooted in mathematical principles. From its inception, AI has relied on mathematical concepts to create models that mimic human cognition and decision-making processes. Understanding the relevance of mathematics in AI requires acknowledging that AI itself is an applied manifestation of mathematical logic.

Does society need mathematics? Absolutely. However, in today’s world, the accessibility of AI tools means that even those who are not mathematically inclined can perform complex calculations and data analysis with ease.

Microsoft has taken this a step further when it comes to reducing the barrier of entry for experts not well-versed with maths to enter into the field. At the Microsoft Build conference this year, CEO Satya Nadella announced that now everyone can code in their native language with Copilot Workspace.

Adding to that is the fact that it is increasingly becoming the case that you do not need a degree to get an AI job.

Experienced tech leader and consultant Oskar Ojala elaborated on the practical application of maths in solving real life problems while giving an example of the success of Facebook. Disagreeing with Ojala, nbn Australia research engineer Alex Eisenmann said that CS without maths could give you Facebook, but CS with maths has the potential to provide frameworks like AI, ML, quantum computing, and blockchain.


5 Free AI Playgrounds For You to Try in 2024

Image by Author

Do you want to try out the latest large language models (LLMs) that have just been released? Or do you want to be the first to explore cutting-edge open-source models and discuss them with your peers? It is a thrilling time for AI enthusiasts, as several platforms offer free access to state-of-the-art models for everyone to try out and compare. So, get ready to dive into the world of AI playgrounds and explore the potential of these newly released AI models that are changing the world.

In this blog, I will share a list of 5 user-friendly, fast, interactive AI playgrounds that provide custom models and are free to use. Some of the platforms even offer free access to proprietary models.

1. HuggingChat

HuggingChat is my favorite, even though I have ChatGPT Pro. I use HuggingChat daily due to its user interface, fast response generation, and ability to switch between the models.

HuggingChat UI
Image from HuggingChat

Hugging Face offers its users the most advanced open-source models, and they discontinue the older, less efficient models. Therefore, you can be confident that you will receive the best AI experience for code debugging, generating content, learning new concepts, and solving problems.

2. Poe

Poe is my second favorite platform, as it has a more extensive repository of large language models. It is fast, and the user interface is interactive and easy to navigate. The key feature of the Poe AI playground is that it lets you try all of the top-of-the-line open-source and closed-source models. In short, you just need to bookmark Poe to get an all-in-one AI experience.

Poe user interface
Image from Poe

Poe also offers the option to create your own customizable AI chatbot, or you can explore the public library's thousands of chatbots. These chatbots are customized using the system prompt, model type, and knowledge source.

3. Chat LMSys

Chat LMSys is known for its chatbot arena leaderboard, but it can also be used as a chatbot and AI playground. It provides access to 40 state-of-the-art AI models, both open-source and proprietary, and you can compare their results.

Chat LMSys UI
Image from Chat LMSys

Three drawbacks keep me from using it daily:

  • The bad user interface.
  • Website loading time.
  • Lack of availability of models in the direct chat.

You have to participate in the Arena Battle to access the proprietary models.

4. AI SDK

AI SDK is another simple and fast AI playground. It also provides access to top open-source and closed-source models. To access state-of-the-art models like GPT-4-turbo, you might have to subscribe to Vercel Pro, but some of the free models available here are not freely available on other platforms. So it works best in combination with Poe or Chat LMSys.

AI SDK by Vercel AI Chat UI
Image from AI SDK by Vercel AI

AI SDK requires no sign-in to use, and you can compare multiple models at the same time. It is fast and provides additional options to modify and improve the model response. Also, you can sync the prompt or use each model for a different prompt.

5. Workers AI

Workers AI became known to me recently. It is fast and simple and provides access to open-source AI models. What is special about this platform is that you can add multiple inputs (user and assistant turns) to create a history or context for the LLM to understand and respond to appropriately. Apart from that, it loads fast and does not require any signups.
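
The history you build this way is essentially the familiar chat message list: alternating user and assistant turns sent along with each new request (an illustrative shape; Workers AI's exact request format may differ):

```python
# Alternating turns give the model context for its next reply.
messages = [
    {"role": "system", "content": "You are a concise support assistant."},
    {"role": "user", "content": "My build fails with a missing header."},
    {"role": "assistant", "content": "Which header is reported missing?"},
    {"role": "user", "content": "It says 'zlib.h: No such file or directory'."},
]
# Each new turn is appended before the next request, so context accumulates.
```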

Workers AI Chat UI
Image from Workers AI

I kept it at the bottom because it is simple, lacks core features, does not have all of the top AI models, and, most of all, there is no way you can adjust model parameters to improve the response.

Conclusion

If you want to access all of the AI models and experience magic firsthand, I suggest you look at the Hugging Face Spaces page. Every day, there is something new and exciting to try to impress others on social media. You can find free and open image generation, speech generation, LLMs, and multimodal models.

In this blog, we learned about 5 AI playgrounds that you should use in 2024. They will help you access top-of-the-line LLMs for free; some do not even require signups.

Abid Ali Awan (@1abidaliawan) is a certified data scientist professional who loves building machine learning models. Currently, he is focusing on content creation and writing technical blogs on machine learning and data science technologies. Abid holds a Master's degree in technology management and a bachelor's degree in telecommunication engineering. His vision is to build an AI product using a graph neural network for students struggling with mental illness.


Cohere Releases Aya 23 Multilingual Models Including Hindi with 8 Bn and 35 Bn Parameters


Cohere For AI has announced the launch of Aya 23, a family of generative large language models (LLMs) featuring open weights for both 8-billion and 35-billion parameter versions.

Covering 23 languages, which include Arabic, Chinese (simplified & traditional), Czech, Dutch, English, French, German, Greek, Hebrew, Hindi, Indonesian, Italian, Japanese, Korean, Persian, Polish, Portuguese, Romanian, Russian, Spanish, Turkish, Ukrainian, and Vietnamese, Aya 23 aims to significantly advance multilingual AI research.


Building on the success of the Aya initiative, which brought together 3,000 global collaborators to create the largest multilingual instruction fine-tuning dataset, Aya 23 shifts focus from breadth to depth. While Aya 101 spanned 101 languages, Aya 23 pairs a highly performant pre-trained model with the Aya dataset collection to deliver robust performance across 23 languages, reaching nearly half of the global population.

In benchmarking, the 35B parameter Aya 23 outperformed other massively multilingual open-source models and widely used open-weight instruction-tuned models, achieving top results across all covered languages.

The 8B parameter version also demonstrated best-in-class multilingual performance while maintaining efficiency and accessibility for developers, emphasising Cohere For AI’s dedication to democratising access to advanced technology.

Aya 23 is now available for experimentation, exploration, and foundational research, including safety auditing, on Hugging Face.
