5 Tools to Help Build Your LLM Apps

Image generated with DALLE-3

In the era of advanced language model applications, developers and data scientists are continuously seeking efficient tools to build, deploy, and manage their projects. As large language models (LLMs) like GPT-4 gain popularity, more people are looking to leverage these powerful models in their own applications. However, working with LLMs can be complex without the right tools.

That's why I've put together this list of five essential tools that can significantly enhance the development and deployment of LLM-powered applications. Whether you're just beginning or are a seasoned ML engineer, these tools will help you be more productive and build higher-quality LLM projects.

1. Hugging Face

Hugging Face is more than just an AI platform; it's a comprehensive ecosystem for hosting models, datasets, and demos. It supports various frameworks allowing users to train, fine-tune, evaluate, and generate content in multiple forms like images, text, and audio. The combination of a vast model selection, community resources, and developer-friendly APIs in one platform is why Hugging Face has become a go-to destination for many AI practitioners and ML engineers.

Learn how to fine-tune the Mistral AI 7B LLM using Hugging Face AutoTrain and push the model to Hugging Face Hub.

2. LangChain

LangChain is a tool that uses a composability approach to build applications with LLMs. It is widely used to develop context-aware applications by integrating different sources of context with language models. Additionally, it can use a language model to reason about actions or responses based on the context provided. The LangChain AI team has recently introduced LangSmith, a new tool that provides a unified development platform to increase the speed and efficiency of LLM application production.

If you're new to AI development, check out LangChain's cheat sheet to understand its Python API and other functionality.

3. Qdrant

Qdrant is a Rust-based vector similarity search engine and database that provides a production-ready service with a simple API. It is tailored for extended filtering support, making it ideal for applications that use neural-network or semantic-based matching. Qdrant's speed and reliability under high load make it a top choice for turning embeddings or neural network encoders into comprehensive applications for matching, searching, recommending, and more. You can also try a fully managed Qdrant Cloud service, including a free tier, available for ease of use.
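
To make the idea of similarity search with filtering more concrete, here is a minimal pure-Python sketch. It deliberately does not use Qdrant's client API: the `Point` class, the `search` function, and the sample payloads are hypothetical stand-ins that only illustrate ranking stored vectors by cosine similarity after applying a metadata filter.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Point:
    """A stored vector plus arbitrary metadata (the 'payload')."""
    id: int
    vector: list
    payload: dict = field(default_factory=dict)


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(points, query, must=None, limit=3):
    """Return the top `limit` points whose payload matches every key/value in `must`."""
    must = must or {}
    candidates = [
        p for p in points
        if all(p.payload.get(k) == v for k, v in must.items())
    ]
    ranked = sorted(candidates, key=lambda p: cosine(p.vector, query), reverse=True)
    return ranked[:limit]


# Hypothetical two-dimensional embeddings with a language tag as payload.
points = [
    Point(1, [0.9, 0.1], {"lang": "en"}),
    Point(2, [0.8, 0.2], {"lang": "de"}),
    Point(3, [0.1, 0.9], {"lang": "en"}),
]
hits = search(points, query=[1.0, 0.0], must={"lang": "en"}, limit=2)
print([p.id for p in hits])
```

A production engine replaces the linear scan with an approximate nearest-neighbour index, but the contract is the same: a query vector and a filter go in, and ranked matches come out.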

Read the 5 Best Vector Databases You Must Try in 2024 to learn about other alternatives to Qdrant.

4. MLflow

MLflow now includes support for LLMs, offering experiment tracking, evaluation, and deployment solutions. It simplifies the integration of LLM capabilities into applications by introducing features like the MLflow Deployments Server for LLMs, LLM Evaluation, and Prompt Engineering UI. These tools help in navigating the complex landscape of LLMs, comparing foundational models, providers, and prompts to find the best fit for your project.

Check out the list of 5 Free Courses to Master MLOps.

5. vLLM

vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. Known for its state-of-the-art serving throughput and efficient attention key and value memory management, vLLM offers features like continuous batching, optimized CUDA kernels, and support for NVIDIA CUDA and AMD ROCm. Its flexibility and ease of use, including integration with popular Hugging Face models and various decoding algorithms, make it a valuable tool for LLM inference and serving.

Conclusion

Each of these five tools brings unique strengths to the table, whether it's in hosting, context awareness, search capabilities, deployment, or efficiency in inference. By leveraging these tools, developers and data scientists can significantly streamline their workflows and elevate the quality of their LLM applications.

Gain inspiration and build 5 Projects with Generative AI Models and Open Source Tools.

Abid Ali Awan (@1abidaliawan) is a certified data scientist professional who loves building machine learning models. Currently, he is focusing on content creation and writing technical blogs on machine learning and data science technologies. Abid holds a Master's degree in Technology Management and a bachelor's degree in Telecommunication Engineering. His vision is to build an AI product using a graph neural network for students struggling with mental illness.


Reliance Forms JV with Canada’s Brookfield Infrastructure for Indian Data Centres

Digital Connexion, formerly BAM Digital Realty, has completed a three-way joint venture between Brookfield Infrastructure, Reliance Industries, and Digital Realty to establish advanced data centres in India, bolstering the nation’s digital infrastructure.

Reliance Industries plans to invest up to $122.24 million alongside Canada’s Brookfield Infrastructure in constructing data centres in India.

Initially announced on July 24, 2023, this joint venture now sees each entity owning an equal one-third (33.33%) stake in the venture. Completing this transaction heralds the rebranding of the joint venture as Digital Connexion, aiming to build on the strong foundations laid by BAM Digital Realty.

This collaboration capitalises on the strengths of each partner: Brookfield Infrastructure’s understanding of global and Indian infrastructure, Jio’s digital ecosystem and enterprise relationships, and Digital Realty’s expertise in data centres. Their combined knowledge and experience position Digital Connexion strategically.

The joint venture progresses the development of data centres in Chennai and Mumbai. MAA10, the flagship greenfield data centre in Chennai, set to launch in January 2024, boasts 20 megawatts (MW) of IT load capacity. The recent land acquisition in Mumbai also signals plans for a 40 MW data centre. These centres aim to serve as connectivity hubs for Indian enterprises and global corporations entering the Indian market.

This move aligns with the significant growth anticipated in India’s data centre market, driven by increased digital services accessibility, 5G integration, and the adoption of IoT and Generative AI. The Indian data centre market is projected to expand by 40% annually and attract $5 billion in investments by 2025, according to a report by Avendus Capital. The joint venture aims to address these trends, catering to local and global enterprises, startups, and cloud computing needs.

Competition in the Indian data centre sector is also intensifying: Reliance’s entry coincides with Adani Enterprises’ joint venture, which recently secured $213 million to fund ongoing data centre construction.

CB Velayuthan, CEO of Digital Connexion, remarked, “This transaction fortifies India’s digital infrastructure, leveraging market insights, energy-efficient data centre design, and a robust digital ecosystem.”

The post Reliance Forms JV with Canada’s Brookfield Infrastructure for Indian Data Centres appeared first on Analytics India Magazine.

Evolution in ETL: How Skipping Transformation Enhances Data Management

Image by Editor

Few data concepts are more polarizing than ETL (extract-transform-load), the preparation technique that has dominated enterprise operations for several decades. Developed in the 1970s, ETL shined during an era of large-scale data warehouses and repositories. Enterprise data teams centralized data, layered reporting systems and data science models on top, and enabled self-service access to business intelligence (BI) tools. However, ETL has shown its age in an era of cloud services, data models, and digital processes.

Searches such as “Is ETL still relevant/in-demand/obsolete/dead?” populate results on Google. The reason is that enterprise data teams are groaning under the weight of preparing data for widespread use across employee roles and business functions. ETL doesn’t scale easily to handle vast volumes of historical data stored in the cloud. Nor does it deliver the real-time data required for rapid executive decision-making. In addition, building custom APIs to provide applications with data creates significant management complexity. It’s not uncommon for modern enterprises to have 500 to 1,000 pipelines in place as they seek to transform data and equip users with self-service access to BI tools. These pipelines and APIs are in a constant state of evolution, as they must be reprogrammed whenever the data they pull changes. It’s clear this process is too brittle for many modern data requirements, such as edge use cases.

In addition, application capabilities have evolved. Source systems provide business logic and tools to enforce data quality while consuming applications enable data transformation and provide a robust semantic layer. So, teams are less incentivized to build point-to-point interfaces to move data at scale, transform it, and load it into the data warehouse.

Two innovative techniques point the way to enabling data democratization while minimizing transformation burdens. Zero ETL makes data available without moving it, whereas reverse ETL pushes rather than pulls data to the applications that need it as soon as it is available.

Zero ETL Reduces Data Movement and Transformation Requirements

Zero ETL optimizes the movement of smaller data sets. With data replication, data is moved to the cloud in its current state for use with data queries or experiments.

But what if teams don’t want to move data at all?

Data virtualization abstracts servers from end users. When users query data from a single source, that output is pushed back to them. And with query federation, users can query multiple data sources. The tool combines results and presents the user with integrated data results.

These techniques are called zero ETL because there is no need to build a pipeline or transform data. Users handle data quality and aggregation needs on the fly.
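
As a rough illustration of query federation, the sketch below queries two independent sources in place and combines the results in memory, with no pipeline and no copy into a warehouse. The two in-memory SQLite databases, the table names, and the `federated_totals` helper are assumptions made for the example; a real federation tool would do this planning and joining for you.

```python
import sqlite3

# Two independent "source systems", modelled as separate in-memory databases.
sales = sqlite3.connect(":memory:")
sales.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
sales.executemany(
    "INSERT INTO orders VALUES (?, ?)", [(1, 120.0), (2, 80.0), (1, 30.0)]
)

crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
crm.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Asha"), (2, "Ben")])


def federated_totals(sales_conn, crm_conn):
    """Query each source where it lives and combine the results in memory."""
    totals = dict(sales_conn.execute(
        "SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id"
    ))
    names = dict(crm_conn.execute("SELECT id, name FROM customers"))
    # Join on customer id; only the small query results are held locally.
    return {names[cid]: total for cid, total in totals.items() if cid in names}


result = federated_totals(sales, crm)
print(result)
```

Note that aggregation happens at the source and only the summarized rows travel, which is why this pattern suits small, recent data sets better than large historical scans.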

Zero ETL is ideally suited for ad-hoc analysis of near-term data, as executing large queries on historical data can harm operational performance and increase data storage costs. For example, many retail and consumer packaged goods executives use zero ETL to query daily transactional data to focus marketing and sales strategies during times of peak demand, such as the holidays.

Google Cortex provides accelerators, enabling zero ETL on SAP enterprise resource planning system data. Other companies, such as one of the world’s largest retailers and a global food and beverage company, have also adopted zero ETL processes.

Zero ETL gains include:

  • Providing speed to access: Using zero ETL processes to provision data for self-service queries saves 40-50% of the time it takes using traditional ETL processes since there’s no need to build pipelines.
  • Reducing data storage requirements: Data does not move with data virtualization or query federation. Users only store query results, decreasing storage requirements.
  • Delivering cost savings: Teams that use zero ETL processes save 30-40% on data preparation and storage costs compared to traditional ETL.
  • Improving data performance: Since users query only the data they want, results are delivered 25% faster.

To get started with zero ETL, teams should evaluate which use cases are best suited for this technique and identify the data elements they need to execute it. They also should configure their zero ETL tool to point to the desired data sources. Teams then extract data, create data assets, and expose them to downstream users.

Using Reverse ETL to Feed Applications with Data On-Demand

Reverse ETL techniques simplify data flows to downstream applications. Instead of using REST APIs or endpoints and writing scripts to pull data, teams leverage reverse ETL tools to push data into business processes on time and in full.

Using reverse ETL provides the following benefits:

  • Reducing time and effort: Using reverse ETL reduces the time and effort to access data for key use cases by 20-25%. A leading cruise line leverages reverse ETL for digital marketing initiatives.
  • Improving data availability: Teams have greater certainty they’ll have access to the data they need for key initiatives, as 90-95% of target data is delivered on time.
  • Decreasing costs: Reverse ETL processes reduce the need for APIs, which require specialized programming skills and increase management complexity. As a result, teams reduce data costs by 20-25%.

To get started with reverse ETL, data teams should evaluate use cases that require on-demand data. Next, they determine the frequency and volume of data to be delivered and choose the proper tooling to handle these data volumes. Then, they point data assets in the data warehouse to their destination consumption systems. Teams should prototype with one data load to measure efficiency and scale processes.
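
The steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not how any particular reverse ETL product works: the warehouse is an in-memory SQLite database, the `segments` table is invented for the example, and `push` stands in for whatever write interface the destination system exposes.

```python
import sqlite3


def reverse_etl(warehouse, query, push, batch_size=2):
    """Read rows from the warehouse and push them to a destination in batches.

    `push` is any callable that accepts a list of row dicts; in practice it
    would wrap the destination system's own write interface.
    """
    cursor = warehouse.execute(query)
    columns = [col[0] for col in cursor.description]
    delivered = 0
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        push([dict(zip(columns, row)) for row in rows])
        delivered += len(rows)
    return delivered


# Hypothetical warehouse table of customer segments.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE segments (email TEXT, segment TEXT)")
warehouse.executemany(
    "INSERT INTO segments VALUES (?, ?)",
    [("a@example.com", "vip"), ("b@example.com", "new"), ("c@example.com", "vip")],
)

# Stand-in for a marketing tool: it simply records what was pushed to it.
received = []
count = reverse_etl(
    warehouse,
    "SELECT email, segment FROM segments ORDER BY email",
    received.extend,
)
print(count, received[0])
```

Prototyping with a single small load like this makes it easy to measure delivery time and volume before choosing tooling for production-scale data.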

To Succeed with Data, Use a Variety of Preparation Techniques

Zero ETL and reverse ETL tools provide teams with fresh options for serving data to users and applications. They can analyze factors such as use case requirements, data volumes, delivery timeframes, and cost drivers to select the best option for delivering data, whether traditional ETL, zero ETL, or reverse ETL.

Partners support these efforts by providing insight into the best techniques and tools to meet functional and non-functional requirements, providing a weighted scorecard, conducting a proof of value (POV) with the winning tool, and then operationalizing the tool for more use cases.

With zero ETL and reverse ETL, data teams achieve their goals of empowering users and applications with the data they need where and when they need it, driving cost and performance gains while avoiding transformation headaches.

Arnab Sen is an experienced professional with a career spanning over 16 years in the technology and decision science industry. He presently serves as the VP-Data Engineering at Tredence, a prominent data analytics company, where he helps organizations design their AI-ML/Cloud/Big-data strategies. With his expertise in data monetization, Arnab uncovers the latent potential of data to drive business transformations across B2B & B2C clients from diverse industries. Arnab's passion for team building and ability to scale people, processes, and skill sets have helped him successfully manage multi-million-dollar portfolios across various verticals, including Telecom, Retail, and BFSI. He has previously held positions at Mu Sigma and IGate, where he played a crucial role in solving clients’ problems by developing innovative solutions. Arnab's exceptional leadership skills and profound domain knowledge have earned him a seat on the Forbes Tech Council.


Google Chrome will soon let users build custom AI-generated themes, with more options than on Pixel 8


The introduction of custom AI-generated wallpapers was a big development on the Google Pixel 8 and Pixel 8 Pro, and now Google is giving a little AI love to its Chrome browser.

It was just a few days ago that Google announced a "Help Me Write" AI feature was headed to Chrome. But now it appears Chrome users will soon have the ability to create a custom AI-generated theme for their browser. The feature was first spotted by X, formerly Twitter, user Leopeva64, who dove deep into the code of the latest unreleased Canary version of the browser.


Like on the Pixel 8, the feature starts off by asking the user to choose a theme. But the themes are quite a bit more robust than what's offered for Google's flagship phones. Under the subjects tab, the X post shows, there are categories like Buildings, Food, Everyday Objects, Nature, Space, US Cities and Parks, and more.

Those categories expand into further options to choose from. Buildings, for example, breaks down into Airport, Cafe, Castle, Lighthouse, Office, and so on. Everyday Objects shows dozens of household objects that a theme can be built around. Under Space, you can build a theme around Constellations, Satellites, Moon, Sun, Stars, Solar system, Spaceships, and more.

US cities is the category I'm most excited to see. A glance shows options for Arches National Park, Chicago, the Grand Canyon, Houston, Los Angeles, New York City, Philadelphia, Phoenix, San Diego, San Francisco, Seattle, and the Everglades among others.


Once that theme is selected, the user can even further fine-tune their theme with color and mood options — say, a steampunk sad Chicago in blue hues or an expressionist romantic airport with red tones.

Since the feature isn't actually live yet, we don't have an idea of what the wallpapers might look like. But I found myself creating dozens of wallpapers with the Pixel 8 Pro, and it appears Chrome's version will only be better.

Given how long things usually take from first appearing in Chrome's code to actually being deployed for use, it seems likely we'll see a full rollout of this feature within the coming months.


What is AMD’s AI Strategy for India?


Last month, to expand its research and engineering operations in India, AMD inaugurated its largest global design centre in Bengaluru. The AMD Technostar R&D campus is a key component of the company’s $400 million investment in India over the next five years. AMD recognises the potential India has for its global goals.

To understand AMD’s strategy for India, AIM caught up with Andrew Dieckmann, CVP & GM, Data Center GPU, AMD; Brad McCredie, CVP at AMD; and Vamsi Boppana, SVP of AI at AMD, at the AMD Advancing AI event in Santa Clara to ask more about the expansion plans in India. “We are definitely focusing on that,” said Dieckmann. “We have a lot of our software team, AI team, and chip design teams in India and we definitely will continue building on that.”

He added that he sees it as a very competitive market as it is a very large market. He expects his competitors to react. “But we have a very strong roadmap, and we intend to lead in the market,” he added, talking about the launch of Instinct MI300X, and how it is going to compete with NVIDIA’s upcoming GH200, while AMD’s GPU is compared with H100s.

Dieckmann and McCredie highlighted how GH200 is going to involve a multi-chip module (MCM) design, which is something that AMD has been doing all this while, using GPUs and CPUs at the same time.

“We’re coming to market with a known tried and tested technology,” McCredie said, adding that GPUs have always been used for generative AI. AMD is coming with a key differentiator in the market with world class CPUs.

AMD to be a household name

In India, giants like Tata, Reliance, and Infosys have been partnering with NVIDIA to acquire its GPUs, and build generative AI in India. AMD also has similar plans which are yet to be disclosed.

“In addition to making the software easy to use and making compiling models easily is giving access to the software and hardware,” said Boppana, talking about how it aims to take its momentum in India. “If I was a developer in India and I asked how to use ROCm in India, it is not that easy to get one as there were not ready cloud instances and the consumer cards were not supported for ROCm,” he explained. “Which has changed now.”

AMD has recently added support for ROCm for its consumer based GPUs such as Radeon and Ryzen. “This has increased access to our platforms and next year we are planning to add cloud instances for more access for programming on our Radeon GPUs which are easily available.” These developments will allow developers to come onto AMD’s platforms and build up in the country.

“With software development, and AI coming on PCs, I think AMD is right at the cusp of this innovation,” Boppana added, talking about developers in India.

The easiest alternatives

“I think one of the reasons why AMD is going to lead from now is because of its partnerships,” said McCredie, highlighting partnerships with Microsoft, Meta, Oracle, and customers such as Databricks, Lamini, and Essential AI.

“The other thing is that the market definitely needs choice, and I think we have established ourselves as the most logical and easy to adopt alternative choice in the market,” he added about competing with NVIDIA. “The last point I’d point out is that we are coming to the market with something that offers better performance than the current state of the art technology from the incumbent.”

AMD says that it is not comparing itself against last-generation accelerators, as the MI300X and its other products already perform better than those on the market.

“Competition is good. It brings innovation,” said Mark Papermaster, CTO of AMD in an exclusive interaction with AIM. “It brings pricing that ensures value for the customers and spurs the industry forward. We have not only brought competition, but are also bringing in a leadership product in inference applications,” he further added about MI300X.

Focusing on the recent acquisition of Nod.ai, a software stack company that is now helping AMD, Papermaster said that AMD is always looking at the startup community in India. “We have a strong design presence in India. The country will, of course, be central for our AI product development, hardware and software product development efforts.”

Similarly, Gilles Garcia, senior director business lead, data centre communication group at AMD, told AIM about investing in AI startups in India: “We are always monitoring what value we can bring and what value other companies can bring to us.”

“We have strong relationships with universities and we are also providing additional training to students. Then we bring them onto a very established internship programme,” he said, explaining how AMD has established an excellent pipeline of engineering graduates in India. AMD’s AI strategy in India is from the ground up and from the top down as well.

The post What is AMD’s AI Strategy for India? appeared first on Analytics India Magazine.

KissanAI Unveils Dhenu 1.0 LLM for India’s Agricultural Challenges

KissanAI today announced the launch of Dhenu 1.0, a groundbreaking Agriculture Large Language Model. Tailored specifically for Indian agricultural practices, this bilingual model comprehends English, Hindi, and Hinglish queries, a notable feature catering directly to farmers’ linguistic needs.

Founder Pratik Desai disclosed that Dhenu 1.0 was meticulously trained on extensive, high-quality datasets intricately focused on Indian agricultural practices. The model’s uniqueness lies in its bilingual nature, adeptly processing 300,000 instruction sets in both English and Hindi. This innovation enables Dhenu 1.0 to comprehensively support English, Hindi, and Hinglish queries, catering directly to farmers’ linguistic needs.

KissanAI also partnered with Sarvam AI and NimbleBox.ai for this project. The collaboration with the Sarvam AI team, renowned for its pioneering work in Indic language AI, brought bilingual capabilities to the fore. NimbleBox.ai’s exceptional AI API platform played a pivotal role in expediting the data curation process, a crucial component of the development of Dhenu 1.0.

Desai also expressed gratitude to Microsoft for Startups for its continuous support and to Teknium1’s OpenHermes 2.5, acknowledging its cost-efficient handling of substantial data volumes.

Initial human evaluations have shown promising results, but the team plans rigorous testing before deployment due to its high-impact nature. Desai emphasised that this unveiling marks only the beginning of KissanAI’s efforts to revolutionise agriculture, hinting at future collaborations and innovations.

KissanAI is the brainchild of Desai, the US-based son of an Indian farmer, who wanted to do something for Indian farmers back home. “We’ve been working in this space for some time, we have enough data in agriculture, and we are working with farmers very closely,” he said.

Desai has a PhD from Wright State University and has been building AI/ML applications for agriculture for quite some time. Previously, he developed an automated labelling mechanism using generative AI. “Using Stable Diffusion within the first month of its release, we created nearly 20,000 stock images,” Desai said.

Desai is constantly partnering with agricultural universities in the country to keep the data up-to-date. The platform has a voice interface that supports nine Indic languages, including Gujarati, Marathi, Tamil, Telugu, Kannada, Malayalam, Punjabi, Bangla, and Hindi. Pretty soon, two more languages – Assamese and Odia – will be added to the list.

The post KissanAI Unveils Dhenu 1.0 LLM for India’s Agricultural Challenges appeared first on Analytics India Magazine.

Strategies for Optimizing Performance and Costs When Using Large Language Models in the Cloud

Image by pch.vector on Freepik

Large language models (LLMs) have recently started to find their footing in business, and their use will expand even further. As companies begin to understand the benefits of implementing LLMs, data teams will need to adapt the models to business requirements.

The optimal path for most businesses is to use a cloud platform to scale with whatever LLM requirements arise. However, many hurdles can hinder LLM performance in the cloud and increase usage costs, which is exactly what we want to avoid.

That’s why this article outlines strategies you can use to optimize the performance of LLMs in the cloud while keeping costs under control. What are the strategies? Let’s get into it.

1. Having a Clear Budget Plan

We must understand our financial condition before implementing any strategy to optimize performance and costs. The budget we are willing to invest in the LLM becomes our limit. A higher budget can lead to better performance but might not be optimal if it doesn’t support the business.

The budget plan needs extensive discussion with the various stakeholders so that it does not go to waste. Identify the critical problem your business wants to solve and assess whether an LLM is worth the investment.

The strategy also applies to solo businesses and individuals. Setting a budget you are willing to spend on the LLM helps you avoid financial problems in the long run.

2. Decide the Right Model Size and Hardware

With the advancement of research, there are many kinds of LLMs to choose from. A model with fewer parameters is faster to optimize but might not have the best ability to solve your business problems. A bigger model has a broader knowledge base and more creativity, but it costs more to compute.

There are trade-offs between performance and cost as the LLM size changes, and we need to take them into account when deciding on a model. Do we need a model with more parameters that performs better but costs more, or vice versa? It’s a question we need to ask, so try to assess your needs.

Additionally, the cloud hardware affects performance as well. More GPU memory can allow for more complex models and reduce latency, giving faster response times. However, more memory means higher cost.

3. Choose the Suitable Inference Options

Depending on the cloud platform, there are many choices for inference. The right option depends on your application’s workload requirements. The choice also affects cost, as each option uses a different amount of resources.

If we take Amazon SageMaker’s inference options as an example, the choices are:

  1. Real-Time Inference. The inference processes the response instantly when input arrives. It’s typically used for real-time applications such as chatbots and translators. Because it always requires low latency, the application needs high computing resources even in low-demand periods. This means an LLM with real-time inference can incur higher costs without any benefit if the demand isn’t there.
  2. Serverless Inference. The cloud platform scales and allocates resources dynamically as required. Performance might suffer, as there is slight latency each time resources are initiated for a request. However, it’s the most cost-effective option, as we only pay for what we use.
  3. Batch Transform. Requests are processed in batches. This makes it suitable only for offline processes, as requests are not processed immediately. It is not suitable for any application that requires an instant response, as there is always a delay, but it doesn’t cost much.
  4. Asynchronous Inference. The inference task runs in the background and the results are retrieved later, which makes it suitable for background tasks. Performance-wise, it suits models that require a long processing time, as it can handle various tasks concurrently in the background. Cost-wise, it can be effective as well because of the better resource allocation.

Try to assess what your application needs, so you have the most effective inference option.
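
One way to make that assessment explicit is to write the trade-offs down as a small decision function. The sketch below is only a rule of thumb distilled from the four descriptions above; the function name and its three yes/no inputs are assumptions made for illustration, and it is not official guidance from any cloud provider.

```python
def choose_inference_option(needs_instant_response: bool,
                            steady_traffic: bool,
                            long_running: bool) -> str:
    """Map rough workload traits to one of the four inference options."""
    if needs_instant_response:
        # Always-on capacity only pays off when demand is consistently there;
        # otherwise accept a small start-up delay and pay per use.
        return "real-time" if steady_traffic else "serverless"
    # Offline work: long jobs run in the background, the rest go in batches.
    return "asynchronous" if long_running else "batch transform"


# A chatbot with constant traffic versus a nightly document-scoring job.
print(choose_inference_option(True, True, False))
print(choose_inference_option(False, False, False))
```

Real workloads rarely reduce to three booleans, so treat the result as a starting point for discussion and then validate it against measured latency and cost.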

4. Construct Effective Prompts

LLMs are a special case because the number of tokens affects the cost we pay. That’s why we need to build prompts that use the minimum number of tokens for both the input and the output while still maintaining output quality.

Try to build a prompt that specifies a certain number of output paragraphs, or use concluding instructions such as “summarize” or “be concise.” Also, construct the input prompt precisely to generate the output you need. Don’t let the LLM generate more than you need.
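
To see how much token counts matter, here is a small worked calculation. The per-token prices and the token counts are made-up placeholders, not any provider's actual rates; substitute the numbers from your own pricing page and usage logs.

```python
def request_cost(input_tokens, output_tokens, price_in_per_1k, price_out_per_1k):
    """Cost of one request when input and output tokens are billed per 1,000."""
    return (input_tokens / 1000) * price_in_per_1k \
        + (output_tokens / 1000) * price_out_per_1k


# Hypothetical prices: 0.01 per 1k input tokens, 0.03 per 1k output tokens.
PRICE_IN, PRICE_OUT = 0.01, 0.03

# A rambling prompt with an unconstrained answer versus a tight prompt
# that asks for a short summary.
verbose = request_cost(1200, 800, PRICE_IN, PRICE_OUT)
concise = request_cost(400, 200, PRICE_IN, PRICE_OUT)

savings = 1 - concise / verbose
print(f"verbose={verbose:.3f} concise={concise:.3f} savings={savings:.0%}")
```

Because output tokens are often priced higher than input tokens, capping the response length tends to save more than trimming the prompt alone.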

5. Caching Responses

Some information is asked for repeatedly and gets the same response every time. To reduce the number of queries, we can cache this typical information in a database and retrieve it when required.

Typically, the data is stored in a vector database such as Pinecone or Weaviate, but cloud platforms should have their own vector databases as well. The responses we want to cache are converted into vector form and stored for future queries.

There are a few challenges in caching responses effectively, as we need policies for cases where the cached response is inadequate to answer the input query. Also, some cached entries are similar to each other, which can result in a wrong response. Managing the responses well and having an adequate database can help reduce costs.
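
The caching idea, including the similarity threshold that guards against wrong matches, can be sketched as follows. This is a toy illustration: `toy_embed` is a bag-of-words stand-in for a real embedding model, the in-memory list stands in for a vector database, and the 0.8 threshold is an arbitrary assumption you would tune on real traffic.

```python
import math
from collections import Counter


def toy_embed(text):
    """Stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) \
        * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class SemanticCache:
    """Return a stored response when a new query is similar enough to an old one."""

    def __init__(self, embed, threshold=0.8):
        self.embed = embed
        self.threshold = threshold
        self.entries = []  # list of (vector, response) pairs

    def put(self, query, response):
        self.entries.append((self.embed(query), response))

    def get(self, query):
        vector = self.embed(query)
        best_score, best_response = 0.0, None
        for cached_vector, response in self.entries:
            score = cosine(vector, cached_vector)
            if score > best_score:
                best_score, best_response = score, response
        # Below the threshold we report a miss, so the caller queries the LLM.
        return best_response if best_score >= self.threshold else None


cache = SemanticCache(toy_embed)
cache.put("what are your opening hours", "We are open 9am to 5pm.")
hit = cache.get("what are your opening hours today")
miss = cache.get("how do I reset my password")
print(hit, miss)
```

Raising the threshold reduces the risk of serving a similar but wrong answer at the price of more cache misses, which is the policy trade-off described above.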

Conclusion

The LLMs that we deploy might end up costing us too much and performing poorly if we don’t treat them right. Here are some strategies you can employ to optimize the performance and cost of your LLM in the cloud:

  1. Have a clear budget plan,
  2. Decide the right model size and hardware,
  3. Choose the suitable inference options,
  4. Construct effective prompts,
  5. Cache responses.

Cornellius Yudha Wijaya is a data science assistant manager and data writer. While working full-time at Allianz Indonesia, he loves to share Python and Data tips via social media and writing media.

Australian SMEs at Risk of Being Left Behind on AI

2023 has been the year of generative AI, and it’s just getting started. It’s likely we haven’t even predicted the full impact that tools like ChatGPT and Google’s Bard will have. As a report by McKinsey made clear earlier this year: “The expected business disruption from gen AI is significant, and respondents predict meaningful changes to their workforces. They anticipate workforce cuts in certain areas and large reskilling efforts to address shifting talent needs.”

Yet despite this, Australian small and midsize businesses are struggling to embrace the opportunity of AI, and sticking to old ways of doing business will ultimately put them at risk. New research by MYOB shows that only 19% of Australian SMEs are currently utilising AI in their operations. Only another 21% of SMEs have plans to adopt AI in the near future, which means that 60% are either unsure how to get involved in AI or simply unaware of it as a business opportunity.

What SMEs that overlook AI miss out on

A report earlier this year by banking fintech Zeller found that 85% of small business owners are seeking cost-cutting measures and believe that there is “an urgent need for creative solutions to help support growth.” Meanwhile, GetApp research found that 53% of SMEs were increasing their social media budget, with 50% of that increase going into content creation (Figure A).

Figure A: Areas where Australian SMEs are increasing their social media budgets. Image: GetApp

These are all areas where AI can significantly assist SMEs, yet because so few are working with AI, many are missing out on the opportunity and consequently leaving themselves with more manual and labour-intensive work to do.

According to the research by MYOB, those SMEs that are leveraging AI are using it for the following tasks:

  • Social media and social marketing content (49%).
  • Copywriting for marketing materials and press releases (34%).
  • Copywriting for technical documents (25%).
  • Market, trend and risk analysis (25%).

In short, generative tools, which can create text and art assets quickly, are an opportunity for SMEs to cut costs and time spent in generating content — particularly creative assets.

Those that do embrace AI to reduce time and expenditure on these things, the MYOB research says, are able to redirect the time to growing the business (44%), innovating (24%) and developing new products and services (24%).

Essentially, those businesses will be better positioned to move with agility and tackle market dynamics.

Why some SMEs are struggling

As noted by an OECD report on the challenges SMEs are currently facing: “many small firms continue to lack the skills needed to fully leverage on the potential of digital technologies.” The same report found that SMEs have accelerated their uptake of digital tools and participation, including with social media and cloud services, but seem unaware of how they can jump into AI.

For those IT pros who are working in small business, or are managed services providers for an SME, this situation is an opportunity to support the business in making a critical leap that could mean the difference between an ultimately successful and growth-oriented SME and one that struggles through the disruption.

How SMEs can get involved with generative AI

Taking advantage of generative AI is neither technically complex nor resource-intensive. However, it does require two things that are in short supply for most SMEs: the right skills and training within the organisation to execute on an AI strategy, and an understanding of where AI will deliver the best results.

To assist with this, IT pros can drive a simple, three-step strategy towards AI.

Better understand where the potential for generative AI is

An internal IT pro or MSP can play a crucial role in helping SMEs understand the potential of generative AI and how it can be applied to their specific business context. This would involve a thorough assessment of the SME’s operations, identifying areas where AI can add the most value the quickest and developing a strategic roadmap for AI adoption.

Build the systems to support the AI

IT pros or MSPs can provide the technical expertise needed to implement and manage AI systems. This includes selecting the right AI technologies, setting up the necessary hardware and software infrastructure, ensuring data security and privacy and troubleshooting any technical issues that may arise.

They can also help SMEs navigate the complex landscape of AI vendors and products and choose solutions that best meet their needs and budget. Most SMEs will have limited or non-existent budgets for AI systems, so in many cases, the goal will be to identify where the free AI tools available will add the most value.

Ensure the systems and data points are kept updated

An IT pro or MSP can offer ongoing support and maintenance for AI systems. This is particularly important as AI can make mistakes and needs to be monitored to ensure that the results are delivering value to the SME. This means ensuring that the analytics systems are robust and that there are insights generated within them to help the SME owner make informed decisions about the ongoing use of AI.

AI is not an area SMEs can afford to fall behind on

SMEs are typically laggards with IT innovation, assuming it will either be too expensive or technically complex to implement until after it has filtered through the enterprise and mid-market. AI is different, however.

The typical SME might not be able to have a team of data scientists creating complex AI models, but they can start leveraging tools to help them run a more efficient and productive business. For any IT pro working in this space, 2024 will be dominated by looking for ways to articulate this value to the business owners and then getting them to bridge the gap and embrace the opportunity.

Mistral AI Challenges Dominance of OpenAI, Google & Meta

Last week, the focus was on Gemini. This week, however, everyone is talking about Mistral AI, a Paris-based AI startup that raised over $113 million in June, even without a tangible product. The buzz around Gemini did not last even a week before Mistral AI captured the spotlight with the release of its latest model, Mixtral 8x7B. It is a Sparse Mixture of Experts (SMoE) model with open weights, and it was shared through a magnet link on X.

well that was quick pic.twitter.com/IokTKJgGRa

— near (@nearcyan) December 12, 2023

Cannot Ignore Mistral AI

Mistral AI’s latest model, Mixtral 8x7B, is based on the MoE architecture and is comparable to other popular models such as GPT 3.5 and Llama 2 70B. Licensed under Apache 2.0, Mixtral surpasses Llama 2 70B on most benchmarks with 6x faster inference.

Mistral AI brands the model as the ‘Mixtral of Experts.’ That’s clever marketing, considering that OpenAI has been doing the same thing for training GPT-4 since last year. Yet somehow it is with Mistral AI’s latest model that the approach has suddenly gained popularity.

Mixture of Experts enables models to be pre-trained with far less compute, which means you can dramatically scale up the model or dataset size with the same compute budget as a dense model.

It is a decoder-only model where the feedforward block picks from a set of 8 distinct groups of parameters. At every layer, for every token, a router network chooses two of these groups (the “experts”) to process the token and combines their outputs additively.
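The routing step described above can be sketched in plain Python. This is a toy illustration, not Mistral AI's implementation: the experts are hypothetical functions that simply scale the input, the router logits are passed in directly instead of being computed by a learned network, and weighting the two outputs by a softmax over the selected logits is an assumption about how the additive combination is done.

```python
import math


def top2_moe_layer(token, router_logits, experts):
    """Route one token vector to the two highest-scoring experts and mix their outputs."""
    # Indices of the two experts with the highest router scores.
    ranked = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:2]

    # Softmax over only the two selected logits (assumed gating scheme).
    peak = max(router_logits[i] for i in chosen)
    exps = [math.exp(router_logits[i] - peak) for i in chosen]
    weights = [e / sum(exps) for e in exps]

    # Weighted sum of the chosen experts' outputs; the other experts never run.
    output = [0.0] * len(token)
    for weight, index in zip(weights, chosen):
        expert_output = experts[index](token)
        output = [o + weight * e for o, e in zip(output, expert_output)]
    return output, chosen


# Eight toy experts: expert i multiplies every element by (i + 1).
toy_experts = [lambda x, scale=i + 1: [scale * v for v in x] for i in range(8)]

toy_token = [1.0, 2.0]
toy_logits = [0.0, 0.0, 5.0, 0.0, 5.0, 0.0, 0.0, 0.0]  # experts 2 and 4 score highest

mixed, selected = top2_moe_layer(toy_token, toy_logits, toy_experts)
print(selected, mixed)
```

Only two of the eight expert functions are called for this token, which is where the compute savings come from.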

This method enhances the model’s parameter count while managing computational expenses and processing time. Specifically, Mixtral boasts a total of 46.7 billion parameters; however, it effectively utilises only 12.9 billion parameters for each token. As a result, it processes input and produces output with comparable speed and cost efficiency to that of a 12.9 billion-parameter model.

However, OpenAI scientist Andrej Karpathy said the “8x7B” name is a bit misleading “because it is not all 7B params that are being 8x’d, only the FeedForward blocks in the Transformer are 8x’d, everything else stays the same. Hence also why the total number of params is not 56B but only 46.7B.”
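Karpathy's point can be checked with some back-of-the-envelope arithmetic. The sketch below assumes the commonly published Mixtral 8x7B configuration (model width 4096, feedforward width 14336, 32 layers, 32 attention heads with 8 key/value heads, a 32,000-token vocabulary); none of these values appear in this article, so treat them as assumptions.

```python
# Assumed Mixtral 8x7B configuration (not stated in the article).
DIM = 4096            # model (embedding) width
HIDDEN_DIM = 14336    # feedforward width inside each expert
N_LAYERS = 32
N_HEADS = 32
N_KV_HEADS = 8        # grouped-query attention: fewer key/value heads
VOCAB = 32000
N_EXPERTS = 8
EXPERTS_PER_TOKEN = 2

head_dim = DIM // N_HEADS
kv_dim = head_dim * N_KV_HEADS

# Per-layer pieces that are shared by every token (not multiplied by 8).
attention = 2 * DIM * DIM + 2 * DIM * kv_dim   # query/output + key/value projections
router = DIM * N_EXPERTS                       # scores each expert for a token
norms = 2 * DIM                                # two normalisation layers

# One expert's feedforward block: three weight matrices.
expert_ffn = 3 * DIM * HIDDEN_DIM

# Input embeddings, output head and the final normalisation layer.
embeddings = 2 * VOCAB * DIM + DIM


def count_params(experts_counted):
    """Total parameters when `experts_counted` experts per layer are included."""
    per_layer = attention + router + norms + experts_counted * expert_ffn
    return N_LAYERS * per_layer + embeddings


total_params = count_params(N_EXPERTS)            # everything stored in the model
active_params = count_params(EXPERTS_PER_TOKEN)   # what a single token actually uses

print(f"total:  {total_params / 1e9:.1f}B")
print(f"active: {active_params / 1e9:.1f}B")
```

Under these assumptions the totals come out at roughly 46.7 billion stored parameters and 12.9 billion used per token, in line with the figures quoted above. The shortfall from a naive 56B comes from counting the attention layers and embeddings once instead of eight times.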

Mistral AI Masters Business

The Paris-based startup is on a roll and has also announced that it secured $415 million in funding at a valuation of $2 billion. Andreessen Horowitz (a16z) led the latest funding round, accompanied by a renewed investment from Lightspeed Venture Partners.

Open-source LLM firms often find it difficult to sustain their business. To overcome this, Mistral AI recently introduced ‘La Plateforme,’ where it will provide API endpoints for its available models.

I was worried how @MistralAI is going to money!
If the 7B model is what they are calling as Mistral tiny, Imagine how "Mistral Medium" would be 🤯🤯🤯 pic.twitter.com/vbUks6WRDN

— 1LittleCoder💻 (@1littlecoder) December 11, 2023

The company has created three categories for its models: Mistral Tiny, Mistral Small and Mistral Medium. Mistral 7B Instruct v0.2 and Mixtral 8x7B come under Mistral Tiny and Mistral Small, respectively. Interestingly, the Medium model is yet to be released.

Mistral AI has stated that it is currently developing Mistral Medium, positioned among the top-serviced models based on standard benchmarks. Proficient in English, French, Italian, German, Spanish, and code, it achieves a score of 8.6 on MT-Bench. On paper, it even beats GPT 3.5.

Interestingly, Mistral opted to launch a paid end-point and refrained from open-sourcing their medium model, which exhibits superior metrics. Introducing hosted API endpoints serves as the most effective method to swiftly gather customer feedback, iterate on real-world use cases, and, crucially, monetize open-source models.

Open-source LLM creators often face challenges in sustaining their businesses. For instance, Stability AI is currently struggling to generate sufficient revenue for survival. As a response, the company has introduced Stability AI Memberships, charging developers a fee to use its LLMs for commercial purposes.

Meta has always been a torchbearer for the open-source community, consistently publishing research papers and releasing models. However, one thing Meta doesn’t necessarily need to prioritise is generating revenue, as it already earns significantly from advertising through its family of social media apps.

Startups that are venturing into open-source models cannot keep creating them without monetising them. As Mistral AI has raised a substantial amount, its investors might be hoping for a return on their investment.

Mistral AI the next OpenAI?

Europe recently reached a preliminary agreement on important rules for using AI in the European Union. Surprisingly, Mistral AI was not in favour of endorsing the EU AI Act. The company might have felt that the Act would hinder its progress in the near future, potentially requiring the disclosure of trade secrets. As a result, it was exempted from the Act, along with other open-source companies.

EU regulations exempt open-source models but require proprietary models to report evaluations to govt agencies.
Way to go!! 👏👏👏
Great news for Mistral!
If US regulation becomes cumbersome, open-source AI start-ups can always move HQ to Europe! 😂😂

— Bindu Reddy (@bindureddy) December 9, 2023

It is speculated that Mistral AI may not continue to release its upcoming models as open source, considering that OpenAI, too, started out as an open-source company. Interestingly, a few months back, OpenAI lobbied the EU to weaken the much-talked-about EU AI Act to reduce the regulatory burden on the company.

Karpathy pointed out the same thing, saying, “Glad they refer to it as ‘open weights’ release instead of ‘open source’, which would imo (in my opinion), require the training code, dataset and docs.”

Currently, not many AI startups from Europe have seriously challenged OpenAI and Google. Mistral AI, which is making generative AI fun with top-notch marketing and good products, has announced that it is here to stay.

The post Mistral AI Challenges Dominance of OpenAI, Google & Meta appeared first on Analytics India Magazine.

Open-Source LLM360 Unveiled by Cerebras Systems, Petuum and MBZUAI

AI supercomputer company Cerebras Systems, AI company Petuum, and Mohamed bin Zayed University of Artificial Intelligence (MBZUAI) launched LLM360, a framework for creating open-source large language models (LLMs). Developed in partnership with MBZUAI’s Institute of Foundation Models, LLM360 empowers developers by providing detailed insights and methodologies, promising to simplify, expedite, and reduce costs in the development of LLMs.

Two open-source large language models have been released: Amber, a 7 billion parameter English-language model trained on 1.2 trillion tokens, and CrystalCoder, a 7 billion parameter model trained on 1.4 trillion tokens and designed for English language and coding tasks. Both models are released under the Apache 2.0 license. Another model, Diamond, with 65 billion parameters, is set to be released soon. These models are trained on the Condor Galaxy 1 supercomputer, built by G42 and Cerebras Systems.

Both models are built on Meta’s LLaMA architecture, and Amber is said to perform similarly to LLaMA-7B and OpenLLaMA-v2-7B and to outperform Pythia-6.7B.

Source: LLM360 Blog

CrystalCoder is trained on a deliberate blend of text and code data to enhance its effectiveness in both domains. Notably, code data is introduced early in the pretraining stage, which distinguishes it from Code Llama, which relies solely on code data during fine-tuning on Llama 2. Furthermore, CrystalCoder is specifically trained on Python and web programming languages to strengthen its capabilities as a programming assistant.

UAE Heading Towards AI Dominance

With the recent AI developments, the UAE is working towards becoming an AI superpower. Following TII’s Falcon and the demographic-specific Jais large language model, the UAE has also been rallying for open-source models to promote research initiatives. With AI71, an AI company launched a few weeks ago, the UAE even looks to take on AI giant OpenAI.

The post Open-Source LLM360 Unveiled by Cerebras Systems, Petuum and MBZUAI appeared first on Analytics India Magazine.