The Only Course You Need to Smash Your Data Analyst Career

Image by Author

More and more people are asking how to become a data analyst. When you’re starting a new career, deciding which path to go down can be daunting, and choosing the right course can be even harder. Spending hours a day searching for the best courses or watching hour-long YouTube review videos is time-consuming.

This article will help you choose the right course, without the hassle of looking around yourself.

Data Analyst Certification

Link: Data Analyst Certification

Let’s start off by mentioning that this course has been rated #1 data analytics certification by Forbes!

You may be coming from a background where your current job consists of you working with data and you’re probably wondering about taking it to the next level and earning some serious money for it. Or you may be completely out of the data and tech world and want to completely transition. This is what I did!

If either is the case and you’re seriously looking into becoming a data analyst, this certification offered by DataCamp can get you certified in 30 days.

What’s the Process?

DataCamp’s Data Analyst certifications are beginner- and intermediate-friendly, so don’t worry if you feel like you may not be ready. Multiple levels are offered to ensure that everybody can succeed and reach their career goals.

For those who are new to the data world, I would recommend starting with The Data Analyst Associate Certification which is your gateway into the world of data, building the foundational knowledge required to be successful as a Data Analyst. Find out more information here.

If you already have a foot in the world of data and are ready to become a Data Analyst, go ahead with the Data Analyst certification. You will have 30 days to complete it, including the timed and practical exams. There are exactly two timed exams, but don’t worry: the course will take you through everything you need to know to ensure you are ready.

The Data Analyst Certification can be taken in R or Python and tests more advanced techniques, preparing you with the skills you need for entry-level Data Analyst roles. You will showcase your knowledge and skills on how you approach business critical questions and provide information to stakeholders.

To be able to gain this certification, you should be able to:

Perform standard data extraction, joining and aggregation tasks.
Assess data quality and perform validation tasks.
Perform standard cleaning tasks to prepare data for analysis.
Calculate metrics to effectively report characteristics of data and relationships between features.

Your timed exam will cover data management, exploratory analysis, and statistical experimentation. The practical exam will cover data management, exploratory analysis and communication.

Once your 30 days are up, you will be looking at the job market to find a job. This part is the most daunting of all, but there is no need to stress. DataCamp offers full access to its exclusive certification community, where you can connect with other certified professionals and explore content curated just for you by its community team.

From group forums and exclusive content to events with industry experts, you will get the support you need to land your dream job.

Wrapping Up

We hope this article saved you from hours of researching or watching review videos on which data analyst course you should choose. The team at KDnuggets want to see everybody win and we hope we have helped your journey!

Nisha Arya is a data scientist, freelance technical writer, and an editor and community manager for KDnuggets. She is particularly interested in providing data science career advice or tutorials and theory-based knowledge around data science. Nisha covers a wide range of topics and wishes to explore the different ways artificial intelligence can benefit the longevity of human life. A keen learner, Nisha seeks to broaden her tech knowledge and writing skills, while helping guide others.

Anthropic claims its latest model is best-in-class

Claude 3.5 Sonnet can analyze both text and images as well as generate text, and it's Anthropic's best-performing model yet — at least on paper.

Anthropic launches Claude 3.5 Sonnet and debuts Artifacts for collaboration

Claude 3.5 Sonnet is twice as fast as the previously highest-performing model, Opus, but at a lower cost, Anthropic says.

Anthropic, the main commercial competitor to OpenAI and Google in closed-source generative artificial intelligence, on Thursday unveiled the latest generation of its large language model (LLM) family, Claude 3.5, starting with its Sonnet model. The startup claims Claude 3.5 Sonnet brings substantial performance improvements on a number of benchmark tests.

Also: Anthropic brings Tool Use for Claude out of beta, promising sophisticated assistants

The Claude family of models is divided into three LLM versions, starting with the simplest, Haiku, proceeding to Sonnet in the middle, and topping off with Opus, which is the most powerful. Impressively, Anthropic claims in its blog post that Claude 3.5 Sonnet surpasses Claude 3 Opus in performance on benchmarks, while costing less to deploy.

Claude 3.5 Sonnet "operates at twice the speed of Claude 3 Opus," Anthropic said. "This performance boost, combined with cost-effective pricing, makes Claude 3.5 Sonnet ideal for complex tasks such as context-sensitive customer support and orchestrating multistep workflows."

The new Sonnet outperforms competitors on a number of benchmark tests, Anthropic said.

The startup claims the AI model demonstrates particular capabilities in writing, editing, and executing program code "with sophisticated reasoning and troubleshooting capabilities."

(An "AI model" is the part of an AI program that contains numerous neural net parameters and activation functions that are the key elements for how an AI program functions.)

Claude 3.5 Sonnet is available for free on Claude's website and in the iOS and Android apps, and via the Pro and Team versions of the subscription products. Those plans have been given higher rate limits for using Claude 3.5 Sonnet.

The startup plans to release Claude 3.5 Haiku and Claude 3.5 Opus "later this year," Anthropic stated in its blog post.

Artifacts appear alongside the prompt thread of Claude, to be accessed by one or more users at different moments.

Alongside the Sonnet 3.5 announcement, Anthropic unveiled Artifacts, which are pieces of a response from the Claude AI model that can be kept alongside the chat thread in a separate window — or, really, a parallel area of the current window.

"These Artifacts appear in a dedicated window alongside their [the user's] conversation," Anthropic stated. "This creates a dynamic workspace where they can see, edit, and build upon Claude's creations in real-time, seamlessly integrating AI-generated content into their projects and workflows."

The startup claims that the introduction of Artifacts marks "Claude's evolution from a conversational AI to a collaborative work environment." Anthropic offered an example: designers using Artifacts to collaborate on samples of user interface designs.

Also: Anthropic launches a free Claude iOS app and Team, its first enterprise plan

"Design and UX teams can use Artifacts to collaboratively create, iterate, and refine user interface and user experience prototypes, leveraging Claude's understanding of design principles and ability to generate visual assets," the company explained.

Artifacts is currently available in preview through the web-based version of Claude.

Artificial Intelligence

India is One of Cisco’s Top 10 Markets Globally

Today, global networking giant Cisco has announced the launch of the Meraki India Region, a new cloud service hosted locally within India to support businesses transitioning to the cloud while addressing local data storage and privacy requirements.

At the launch, Daisy Chittilapilly, president of Cisco India and SAARC, said that India is one of the top 10 markets globally for the company and spoke of its ambition to elevate it to the top five. The launch of the Meraki India Region is a step towards making that happen.

In 2012, Cisco acquired San Francisco-based cloud infrastructure startup Meraki for $1.2 billion in cash.

Under the Hood

Cisco Meraki, used by over 810,000 customers globally, is a leading cloud-managed IT platform that offers simplified operations, secure networking, and an ecosystem that fosters growth and innovation. Meraki’s offerings include comprehensive networking solutions (wired, wireless, SD-WAN), secure networking, and IoT capabilities, providing customers with centralised visibility and control, unified management of networks, and reduced operational costs.

Chittilapilly was joined by Lawrence Huang, SVP/GM of Cisco Networking of Meraki and Wireless, who said, “Launching the Meraki India Region is crucial for ensuring customers can securely connect and scale their businesses with Meraki’s simplicity.” The region will help customers meet local data storage needs with advanced security features such as penetration testing and daily vulnerability scans, supporting the country’s digital transformation efforts.

The demand for data localisation and privacy in India is growing.

According to Cisco’s 2024 Data Privacy Benchmark Study, 97% of Indian organisations believe data is safer when stored locally, and 94% trust global providers over local ones for data protection. The Meraki India Region offers significant benefits for public sector entities, government, education, financial institutions, healthcare, and professional services that prioritise local data storage as part of their cloud transformation.

Earlier this month, the security solutions provider also introduced AI-powered features for digital resilience by integrating network capabilities with claims of stronger security, observability, and data management, simplifying adoption and providing comprehensive visibility.

It also launched a $1 billion global AI investment fund to foster industry innovation and support its AI-driven strategy for connecting and protecting organisations.

Furthermore, since May of this year, Lenovo and Cisco have been in a global partnership to deliver integrated infrastructure and networking solutions, accelerating digital transformation for businesses of all sizes with turnkey solutions from edge to cloud.

Chinese-Built ChatGLM Exceeds GPT-4 Across Several Benchmarks

A recent research paper states, “The latest ChatGLM language model from Tsinghua University and Zhipu AI matches or exceeds the capabilities of GPT-4 across a wide range of benchmarks and tasks.”

The GLM-4 model was pre-trained on 10 trillion tokens of multilingual data and further aligned using techniques like supervised fine-tuning and reinforcement learning from human feedback.

On standard English academic benchmarks spanning knowledge, math, reasoning, and coding, GLM-4 achieves performance comparable to GPT-4 and other state-of-the-art models like Gemini 1.5 Pro and Claude 3 Opus. It scores 83.3% on MMLU (vs 86.4% for GPT-4), 93.3% on GSM8K (vs 92.0%), and 84.7% on the challenging BIG-Bench suite (vs 83.1%).

For instruction following abilities in both English and Chinese, GLM-4 matches the level of GPT-4 Turbo, according to the IFEval benchmark. On Chinese language alignment across domains like math, logic, and professional knowledge, GLM-4 outperforms GPT-4 and other models on the AlignBench evaluation.

The GLM-4 All Tools version can autonomously employ external tools like web browsers, Python interpreters, and text-to-image models to complete complex multi-step tasks. It matches, and in some cases, even surpasses GPT-4 All Tools on capabilities like information gathering and math problem solving.

Tsinghua University has open-sourced several GLM models, with over 10 million downloads in 2023. The team plans to continue improving the model’s capabilities while promoting open access to cutting-edge language AI technologies.

China Introducing Models

China has a history of introducing advanced AI models that manage to surpass their Western counterparts. A few months ago, they released ChemLLM, developed jointly by researchers from Hong Kong Polytechnic University, the Chinese University of Hong Kong, Shanghai Artificial Intelligence Laboratory, Fudan University, Shanghai Jiao Tong University, and Wuhan University. This model is designed to tackle a wide array of chemical tasks through fluent dialogue interaction.

Similarly, by consistently releasing and experimenting with new models, China is advancing rapidly in the field of AI.

Deploying Machine Learning Models: A Step-by-Step Tutorial

Image by author

Model deployment is the process of integrating trained models into practical applications. This includes defining the necessary environment, specifying how input data reaches the model and how its output is returned, and making sure the model can analyze new data and provide relevant predictions or categorizations. Let us explore the process of deploying models in production.

Step 1: Data Preprocessing

Deal with missing values by imputing them (for example, with the column mean) or by deleting the affected rows or columns. Convert categorical variables from qualitative to quantitative form using One-Hot Encoding or Label Encoding. Normalize or standardize numerical features to bring them onto a common scale.

import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder, StandardScaler, MinMaxScaler

# Load your data
df = pd.read_csv('your_data.csv')

# Handle missing values
imputer_mean = SimpleImputer(strategy='mean')
df['numeric_column'] = imputer_mean.fit_transform(df[['numeric_column']])

# Encode categorical variables
one_hot_encoder = OneHotEncoder()
encoded_features = one_hot_encoder.fit_transform(df[['categorical_column']]).toarray()
encoded_df = pd.DataFrame(encoded_features, columns=one_hot_encoder.get_feature_names_out(['categorical_column']))

# Normalize and standardize numerical features
# Standardization (zero mean, unit variance)
scaler = StandardScaler()
df['standardized_column'] = scaler.fit_transform(df[['numeric_column']])

# Normalization (scaling to a range of [0, 1])
normalizer = MinMaxScaler()
df['normalized_column'] = normalizer.fit_transform(df[['numeric_column']])
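To make the difference between standardization and normalization concrete, here is a minimal standard-library sketch of the two formulas. The function names and the sample `ages` list are illustrative assumptions; the standardization uses the population standard deviation, which is also what scikit-learn's StandardScaler uses.

```python
from statistics import fmean, pstdev


def standardize(values):
    """Rescale values to zero mean and unit variance (population std)."""
    mean = fmean(values)
    std = pstdev(values)
    if std == 0:
        # A constant column carries no spread; map every value to 0.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def min_max_normalize(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical numeric column used only for illustration.
ages = [20.0, 30.0, 40.0, 50.0]
standardized = standardize(ages)
normalized = min_max_normalize(ages)
print(standardized)
print(normalized)
```

Standardization is usually the safer default when a feature has outliers, because a single extreme value compresses everything else under min-max scaling.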

Step 2: Model Training and Evaluation

Split the data into two groups: a training set and a testing set. Choose a model and fit it to the training data. Tune the hyperparameters to select the best-performing configuration, then use cross-validation to check that the model is stable across different subsets of the data.

import pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV, cross_val_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder, StandardScaler, MinMaxScaler

# Load your data
df = pd.read_csv('data.csv')

# Split data into training and testing sets
X = df.drop(columns=['target_column'])
y = df['target_column']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Hyperparameter tuning
param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [None, 10, 20, 30],
    'min_samples_split': [2, 5, 10]
}

grid_search = GridSearchCV(estimator=RandomForestClassifier(random_state=42),
                           param_grid=param_grid,
                           cv=5,
                           scoring='accuracy',
                           n_jobs=-1)

# Fit the grid search to the data
grid_search.fit(X_train, y_train)

# Get the best model from the grid search
best_model = grid_search.best_estimator_

# Cross-validation to assess model generalization and robustness
cv_scores = cross_val_score(best_model, X_train, y_train, cv=5, scoring='accuracy')

print(f"Cross-validation scores: {cv_scores}")
print(f"Mean cross-validation score: {cv_scores.mean()}")
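As an illustration of what `cv=5` means, the sketch below splits sample indices into k folds in plain Python. The function name `k_fold_indices` is an assumption made for this example, and the split is deliberately simple: contiguous and unstratified. For classifiers, scikit-learn uses a stratified variant by default when `cv` is an integer, so this shows the idea rather than the library's exact behavior.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread the remainder so the first n_samples % k folds get one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        validation = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, validation
        start = stop


# Ten samples, five folds: each sample is held out for validation exactly once.
folds = list(k_fold_indices(10, 5))
for train, validation in folds:
    print(f"train={train} validation={validation}")
```

Each fold trains on the remaining samples and scores on the held-out ones, and the mean of those scores is what the tutorial prints as the mean cross-validation score.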

Step 3: Model Packaging

Source: https://knowledge.dataiku.com/latest/mlops-o16n/architecture/concept-model-packaging.html

Serialize the model into a format that can be stored or distributed to other systems. Pickle is one conventional format; joblib and ONNX are alternatives, depending on your requirements. Once you have defined and optimized your model, store it in a file or database. Version-control tools such as Git help track changes to the code around it. Apply security measures such as encryption, both at rest and in transit, so that the model and its data are not easily accessible to anyone else.

import joblib

# Persist the tuned model from Step 2
joblib.dump(best_model, 'model.pkl')
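One simple safeguard for a stored model file is to record a checksum when saving and verify it before loading. The sketch below is a minimal illustration using only the standard library; `save_with_checksum`, `load_if_unmodified`, and the `model_params` dictionary are hypothetical stand-ins for a real model artifact. Note that this is an integrity check, not encryption, and that unpickling data from an untrusted source can execute arbitrary code.

```python
import hashlib
import pickle
import tempfile
from pathlib import Path


def save_with_checksum(obj, path):
    """Pickle obj to path and return the SHA-256 digest of the written bytes."""
    data = pickle.dumps(obj)
    path.write_bytes(data)
    return hashlib.sha256(data).hexdigest()


def load_if_unmodified(path, expected_digest):
    """Unpickle path only if its bytes still match the recorded digest."""
    data = path.read_bytes()
    if hashlib.sha256(data).hexdigest() != expected_digest:
        raise ValueError("checksum mismatch: refusing to load model file")
    return pickle.loads(data)


# Hypothetical stand-in for a trained model: a plain dictionary of parameters.
model_params = {"weights": [0.4, -1.2, 3.0], "bias": 0.5}

with tempfile.TemporaryDirectory() as tmp:
    artifact = Path(tmp) / "model.pkl"
    digest = save_with_checksum(model_params, artifact)
    restored = load_if_unmodified(artifact, digest)

print(restored == model_params)
```

In practice the digest would be stored separately from the artifact, for example alongside the release metadata, so that tampering with the file alone is detectable.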

Put your serialized model into a container such as Docker. This makes it portable and easier to transport machine learning models to different environments.

# Dockerfile
FROM python:3.8-slim
WORKDIR /app
# Copy and install dependencies first so this layer is cached between builds
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY model.pkl app.py ./
CMD ["python", "app.py"]
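The Dockerfile runs an app.py that the tutorial does not show. As a rough idea of what such a file might contain, here is a minimal sketch of a prediction endpoint built on Python's standard http.server module. Everything specific in it is an assumption: the `/predict` path, the JSON payload shape, port 5000, and the fixed `WEIGHTS` and `BIAS` that stand in for a model loaded from model.pkl.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical stand-in for the deserialized model: a fixed linear scorer.
WEIGHTS = [0.5, -0.25]
BIAS = 0.1


def predict(features):
    """Return a score for one sample; a real app would call model.predict."""
    if len(features) != len(WEIGHTS):
        raise ValueError(f"expected {len(WEIGHTS)} features, got {len(features)}")
    return BIAS + sum(w * x for w, x in zip(WEIGHTS, features))


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404, "unknown path")
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length))
            result = {"prediction": predict(payload["features"])}
        except (ValueError, KeyError, TypeError) as exc:
            # Malformed JSON, a missing key, or a wrong feature count.
            self.send_error(400, str(exc))
            return
        body = json.dumps(result).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(host="0.0.0.0", port=5000):
    """Build the HTTP server; the caller decides when to start serving."""
    return HTTPServer((host, port), PredictHandler)
```

Calling `make_server().serve_forever()` at the bottom of app.py would start the service inside the container. A production deployment would more likely use a dedicated web framework and application server, so treat this only as a way to see the moving parts.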

Step 4: Environment Setup for Deployment

To provision infrastructure and resources for model deployment, consider cloud services such as AWS, Azure, or Google Cloud. Configure the components needed to host the model, such as servers and databases, using the infrastructure services of the cloud platform you select.

AWS: Set up an EC2 instance using the AWS CLI

aws ec2 run-instances \
    --image-id ami-0abcdef1234567890 \
    --count 1 \
    --instance-type t2.micro \
    --key-name MyKeyPair \
    --security-group-ids sg-0abcdef1234567890 \
    --subnet-id subnet-0abcdef1234567890

Azure: Set up a virtual machine using the Azure CLI

az vm create \
    --resource-group myResourceGroup \
    --name myVM \
    --image Ubuntu2204 \
    --admin-username azureuser \
    --generate-ssh-keys

Google Cloud: Set up a Compute Engine instance using the Google Cloud CLI

gcloud compute instances create my-instance \
    --zone=us-central1-a \
    --machine-type=e2-medium \
    --subnet=default \
    --network-tier=PREMIUM \
    --maintenance-policy=MIGRATE \
    --image-family=debian-12 \
    --image-project=debian-cloud \
    --boot-disk-size=10GB \
    --boot-disk-type=pd-standard \
    --boot-disk-device-name=my-instance

Step 5: Building the Deployment Pipeline

Use tools such as Jenkins or GitLab CI/CD to automate the deployment of the model. Define the sequence of steps to be executed so that the deployment process is more efficient, using a Jenkinsfile for Jenkins or a YAML configuration for GitLab CI/CD or GitHub Actions.

// Using Jenkins for CI/CD pipeline
pipeline {
  agent any
  stages {
    stage('Build') {
      steps {
        sh 'python setup.py build'
      }
    }
    stage('Test') {
      steps {
        sh 'python -m unittest discover'
      }
    }
    stage('Deploy') {
      steps {
        sh 'docker build -t mymodel:latest .'
        sh 'docker run -d -p 5000:5000 mymodel:latest'
      }
    }
  }
}

Step 6: Model Testing

Test the model to confirm that it behaves as intended. Compare its predictions with the expected outcomes, and check its generalization capability to ascertain whether it will perform well on new, unseen data. Choose evaluation metrics that suit the task, such as accuracy, precision, and recall.

# Import necessary libraries
import joblib
import pandas as pd
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Load the serialized model saved in Step 3
best_model = joblib.load('model.pkl')

# Load your test data
test_df = pd.read_csv('your_test_data.csv')

X_test = test_df.drop(columns=['target_column'])
y_test = test_df['target_column']

# Predict outcomes on the test set
y_pred_test = best_model.predict(X_test)

# Evaluate performance metrics
test_accuracy = accuracy_score(y_test, y_pred_test)
test_precision = precision_score(y_test, y_pred_test, average='weighted')
test_recall = recall_score(y_test, y_pred_test, average='weighted')

# Print performance metrics
print(f"Test Set Accuracy: {test_accuracy}")
print(f"Test Set Precision: {test_precision}")
print(f"Test Set Recall: {test_recall}")
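For readers who want to see what these three metrics measure, the following sketch computes them by hand for a binary problem. The function name `binary_metrics` and the sample labels are illustrative assumptions. It reports precision and recall for the positive class only, which differs from the `average='weighted'` setting used above, where per-class scores are averaged by class frequency.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, and recall for one positive class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    return {
        "accuracy": correct / len(pairs),
        # Of everything predicted positive, how much was really positive?
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        # Of everything really positive, how much did the model find?
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical labels: 3 true positives, 2 false positives, 1 false negative.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)  # accuracy 0.625, precision 0.6, recall 0.75
```

The example shows why a single number is rarely enough: the same predictions score differently depending on whether false positives or false negatives matter more.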

Step 7: Monitoring and Maintenance

Monitor the deployed model for errors with tools such as AWS CloudWatch, Azure Monitor, or Google Cloud Monitoring. Use what the monitoring reveals to decide how the deployed model should be updated over time to keep improving it.

AWS CloudWatch

aws cloudwatch put-metric-alarm \
    --alarm-name CPUAlarm \
    --metric-name CPUUtilization \
    --namespace AWS/EC2 \
    --statistic Average \
    --period 300 \
    --threshold 70 \
    --comparison-operator GreaterThanThreshold \
    --dimensions "Name=InstanceId,Value=i-1234567890abcdef0" \
    --evaluation-periods 2 \
    --alarm-actions arn:aws:sns:us-east-1:123456789012:my-sns-topic

Source: https://blogs.vmware.com/management/2021/03/cloud-services-aws-cloudwatch-azure-monitor.html

Azure Monitor

az monitor metrics alert create \
    --name 'CPU Alert' \
    --resource-group myResourceGroup \
    --scopes /subscriptions/{subscription-id}/resourceGroups/{resource-group-name}/providers/Microsoft.Compute/virtualMachines/{vm-name} \
    --condition "avg Percentage CPU > 80" \
    --description 'Alert if CPU usage exceeds 80%'

Source: https://blogs.vmware.com/management/2021/03/cloud-services-aws-cloudwatch-azure-monitor.html
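The alerts above watch infrastructure metrics such as CPU usage. Model quality can be tracked in a similar spirit once true outcomes become available. The sketch below is one possible approach, not a feature of any of the cloud tools mentioned: the class name `RollingAccuracyMonitor`, the window size, and the accuracy threshold are all assumptions chosen for illustration.

```python
from collections import deque


class RollingAccuracyMonitor:
    """Track accuracy over the most recent predictions and flag degradation."""

    def __init__(self, window_size=100, min_accuracy=0.8):
        # deque with maxlen drops the oldest outcome automatically.
        self.outcomes = deque(maxlen=window_size)
        self.min_accuracy = min_accuracy

    def record(self, prediction, actual):
        """Store whether one prediction matched the observed outcome."""
        self.outcomes.append(prediction == actual)

    @property
    def accuracy(self):
        """Accuracy over the current window, or None before any data arrives."""
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def needs_attention(self):
        """True only when the window is full and accuracy is below the floor."""
        window_full = len(self.outcomes) == self.outcomes.maxlen
        return window_full and self.accuracy < self.min_accuracy


# Tiny window so the example is easy to follow by hand.
monitor = RollingAccuracyMonitor(window_size=4, min_accuracy=0.75)
for prediction, actual in [(1, 1), (0, 0), (1, 0), (0, 1)]:
    monitor.record(prediction, actual)

print(monitor.accuracy, monitor.needs_attention())
```

Waiting for a full window before raising a flag avoids alarms from the first handful of predictions, at the cost of reacting a little later.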

Wrapping Up

The steps outlined in this tutorial cover the key stages of deploying machine learning models. By following them, you can make trained models usable and easily deployable in practice. From building the model to configuring and validating the deployment, you now know how to take your machine learning endeavors from hypothetical to practical.

Jayita Gulati is a machine learning enthusiast and technical writer driven by her passion for building machine learning models. She holds a Master's degree in Computer Science from the University of Liverpool.

In the Next Decade, India will Lead AI Initiatives for World: Databricks’ Anil Bhasin

On the sidelines of the recently concluded Data + AI Summit, Databricks India and SAARC region vice president Anil Bhasin spoke about India leading AI innovation and building products for the world.

“I believe, in the next decade, our country will lead the AI initiatives for the world. I’m a hardcore Indian and believe in the power of India,” Bhasin told AIM in an exclusive interview. He added that Indian customers are innovating fast and Databricks is excited to work with them.

He explained that it’s difficult to find the scale, size, and complexity of India anywhere else, so solving India’s toughest challenges becomes a template for the world.

Databricks Deepens Commitment to the Indian Market

Bhasin mentioned that Databricks is ambitious about transforming India into the world’s first data- and AI-driven economy.

Databricks has many partnerships in India and Bhasin emphasised that the company doesn’t perceive itself as just a technology vendor but as a strategic partner, always striving to think ahead for the customers by sharing all the latest trends, best practices, knowledge, and innovation in the AI industry.

He said that the biggest value differentiator for Databricks is having an extensible platform with all built-in capabilities required from ingestion to serving. Additionally, the platform being open source and having an added layer of governance allows everybody to democratise AI in a big way.

Databricks recently announced a growth of over 80% in its India business over the past two fiscal years, fueled by the rising demand for data and AI capabilities among Indian enterprises from all industries, including FSI, retail, manufacturing, and digital natives.

Indian customers such as Air India, Aditya Birla Fashion and Retail Ltd, CommerceIQ, Freshworks, InMobi, Meesho, Myntra, Parle, UPL, and many others are leveraging the Databricks Data Intelligence Platform to boost business innovation, optimise operations, and enhance decision-making.

Recently, Krutrim, the AI company founded by Ola’s Bhavish Aggarwal, also announced its partnership with Databricks to pre-train and fine-tune its foundational model and develop GenAI models tailored for the Indian market.

Krutrim recently launched Krutrim-7B-chat, an LLM trained in 10 Indian languages, available on Databricks Marketplace. Infosys also partnered with Databricks to leverage the latter’s unified data analytics platform with Infosys’ AI-first offering, Infosys Topaz.

Shorthills AI also recently announced its strategic alliance with Databricks to improve business operations through the integration of AI and data analytics. Databricks has also invested in Perplexity AI, Aravind Srinivas’ AI chatbot-powered research and conversational search engine.

On a Mission to Help Companies Embrace AI

At the summit, Databricks also announced a host of new GenAI capabilities and a major push to its open-source strategy.

The new offerings, such as Mosaic AI Model Training, Mosaic AI for RAG, and Mosaic AI Gateway, in addition to open-sourcing their Unity Catalog, aim to help enterprises build high-quality, domain-specific AI applications.

A pioneer in Lakehouse, and now having acquired Tabular, Databricks seems to be trying to build this pendrive or USB port of sorts that can be plugged into AI systems in the future — achieving 100% interoperability.

The idea is to let users own their data, removing lock-in, reducing the cost, and also letting users get many more use cases by giving them the choice to use different engines for different purposes.

NVIDIA chief Jensen Huang also appreciated the open source initiatives by Databricks, talking about how the open source AI movement has made it possible for every company to be an AI company.

Databricks also launched a new Learning Festival, which will train practitioners and provide them with more hands-on training and certification.

Bhasin further said that Databricks is the only company on the planet that provides community training and learning enablement, and drives thought leadership at scale. “The fact that we want to build a CIO community around data in itself is a great value proposition,” he said.

With India being an open source AI champion, and everyone from young developers to companies leveraging open source models to build exciting products for the world, the capabilities and features offered by Databricks would be a good kick-start for the Indian developer community.

Even Jensen Huang agrees!

When asked about how customers and organisations can get started with AI today, the NVIDIA chief said, “I think the Databricks’ Data Intelligence Platform (DIP) is incredible. It has made it easy for people to manage their data and extract information. So, the best way to start is to come to Databricks.”

How Much Does It Cost To Build an AI Research Startup in India?

Everyone hopes to build an AI startup. After leaving OpenAI, Ilya Sutskever has started his own venture called Safe Superintelligence. The startup is bound to raise billions of dollars in pursuing its goal of building ASI, but what about Indian startups that are aiming to do fundamental AI research?

Speaking with AIM, Soket AI Labs founder & CEO Abhishek Upperwal revealed the numbers required to build a research-based AI startup. So far, the company has already built Pragna-1B, which is a foundational model specifically for Indic languages.

Upperwal said that, currently, funding is just enough to make do for AI research within a startup. “Yes, there are fewer funds that are available as compared to any foreign markets, but I also believe that we can maybe make-do with that particular fund and then ultimately grow in scale after the seed stage,” said Upperwal.

He explained that for the seed stage in India, funding of $5 million or $10 million is still a decent amount. This roughly translates to around INR 41-82 crore. This is still very little when compared to the hundreds of millions raised by companies in the West.

“If VCs can trust these companies in the generative AI space, we can definitely do wonderful stuff for sure,” added Upperwal.

Where Does the Money Go?

The gap in funding is because of the market. India is a smaller market, therefore the ticket sizes are still way smaller compared to the West. But Upperwal said that the ticket sizes being small pose a problem when compared to the work that startups have to do at the foundational layer.

Upperwal gives the example of building an estimated 7 billion parameter foundation model out of India. He said that the cost for the compute alone would be close to $2 million. For reference, one NVIDIA H100 costs around INR 30 lakhs, or $36k.

To build a 7 billion parameter model, considering a six month time frame, a startup would require at least a dozen NVIDIA GPUs for the training period, taking into account time for other factors like inefficiencies.

This is while considering that the model is built in one shot. “It takes a lot of experiments, and checkpoints, or the path that you are taking fails, so you need to rebuild from the previous checkpoint,” explained Upperwal.

Earlier, Upperwal had told AIM that it took the company six months to train the Pragna model, which involved many experiments with different models and a total of 150 billion tokens. It took close to 8000 GPU hours on NVIDIA A100s to train the model.

Accounting for all of these, an ideal amount for doing substantial foundational work in AI at the seed stage is around $7-15 million, which at the upper end is close to INR 125 crore.

All of this includes the cost of running the business, such as hiring talent and paying bills, but does not include the cost of making the models ready for production or inference. That would more than double the funding requirement.

In the same conversation, Speciale Invest founder and partner Arjun Rao said that Indian VCs are interested in investing in the development phase, more than the research phase of AI. It would take a lot of time to research, build a model, compare it with others, and then figure out how it can be commoditised, which is something VCs are still figuring out.

Assuming they’re working with a team of eight people and a funding of approximately $10 million (or INR 82 crore), an AI startup in India can survive for about 2.33 years at the current estimated monthly expenditure of INR 2.92 crore, which does not include inference costs.

Assuming that the $2 million as Upperwal mentioned above goes towards compute just for training a 7 billion model for around six months, it still would account for 80.2% of the expenditure for 2 years, with the rest going towards salaries and other expenditure.

This calculation assumes that the monthly costs remain constant and does not account for potential increases in expenses due to scaling, inflation, or other operational changes.
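The runway estimate above is a simple division, sketched here so the assumption is explicit. The function name `runway_months` is hypothetical; the inputs are the article's own figures of INR 82 crore in funding and INR 2.92 crore in monthly expenditure, held constant as the article assumes.

```python
def runway_months(funding_crore, monthly_burn_crore):
    """Months a startup can operate if monthly expenditure stays constant."""
    if monthly_burn_crore <= 0:
        raise ValueError("monthly burn must be positive")
    return funding_crore / monthly_burn_crore


# Figures quoted in the article: INR 82 crore raised, INR 2.92 crore spent per month.
months = runway_months(82.0, 2.92)
years = months / 12
print(f"{months:.1f} months, about {years:.2f} years")
```

This works out to roughly 28 months, in line with the article's figure of about 2.33 years. Any rise in monthly costs shortens the runway proportionally.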

How Much Are Indian Startups Getting?

While these numbers seem reasonable, they are comparatively lower when looking at the global standard set by OpenAI, Anthropic, or Mistral, who have raised billions of dollars.

For reference, Sarvam AI, which has announced its intention of building foundational AI models, has raised a total of $41 million. Pranav Mistry-led and Reliance-backed TWO.AI has raised $20 million for building its SUTRA line of models, and Krutrim raised $50 million, becoming India’s first generative AI unicorn.

This is just for initial research. SML CEO and founder Vishnu Vardhan emphasised the huge investment required to build and scale complex AI models. In an exclusive interview with AIM, Vardhan disclosed his plans of raising $200-300 million for the same. The company only recently launched Hanooman, its own foundational model.

“That’s the kind of money we need to launch this kind of a product. We’ve already spent tens of millions of dollars, but that won’t work,” he said, about building a GPT-5 level model in India.

Even if you add the total funds raised by companies in India, it’s still nothing when compared to OpenAI raising billions single-handedly. It definitely would cost a lot more to build an AI startup in India to compete with the West.

Creating AI-Driven Solutions: Understanding Large Language Models

Image by Editor | Midjourney & Canva

Large Language Models are advanced types of artificial intelligence designed to understand and generate human-like text. They are built using machine learning techniques, specifically deep learning. Essentially, LLMs are trained on vast amounts of text data from the Internet, books, articles, and other sources to learn the patterns and structures of human language.

The history of Large Language Models (LLMs) began with early neural network models, but a significant milestone was the introduction of the Transformer architecture by Vaswani et al. in 2017, detailed in the paper "Attention Is All You Need."

The Transformer — model architecture | Source: Attention Is All You Need

This architecture improved the efficiency and performance of language models. In 2018, OpenAI released GPT (Generative Pre-trained Transformer), which marked the beginning of highly capable LLMs. The subsequent release of GPT-2 in 2019, with 1.5 billion parameters, demonstrated unprecedented text generation abilities and raised ethical concerns due to its potential misuse. GPT-3, launched in June 2020, with 175 billion parameters, further showcased the power of LLMs, enabling a wide range of applications from creative writing to programming assistance. More recently, OpenAI's GPT-4, released in 2023, continued this trend, offering even greater capabilities, although specific details about its size and data remain proprietary.

Key components of LLMs

LLMs are complex systems with several critical components that enable them to understand and generate human language. The key elements are neural networks, deep learning, and transformers.

Neural Networks

LLMs are built on neural network architectures, computing systems inspired by the human brain. These networks consist of layers of interconnected nodes (neurons). Neural networks process and learn from data by adjusting the connections (weights) between neurons based on the input they receive. This adjustment process is called training.
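To make "adjusting the weights" concrete, here is a minimal sketch of training with gradient descent on a single artificial neuron. It is nowhere near the scale of an LLM, and the data, learning rate, and function name are assumptions chosen for illustration. The neuron learns the rule y = 2x + 1 from four examples.

```python
def train_neuron(samples, learning_rate=0.05, epochs=2000):
    """Fit y = w * x + b to (x, y) pairs with plain gradient descent."""
    w, b = 0.0, 0.0  # the connection weight and bias start untrained
    n = len(samples)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in samples:
            error = (w * x + b) - y        # prediction minus target
            grad_w += 2 * error * x / n    # gradient of mean squared error w.r.t. w
            grad_b += 2 * error / n        # gradient of mean squared error w.r.t. b
        # Move each parameter a small step against its gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy data generated from y = 2x + 1 (an illustrative assumption).
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
weight, bias = train_neuron(data)
print(f"learned weight={weight:.3f}, bias={bias:.3f}")
```

An LLM applies the same idea to billions of weights at once, with the error measured on next-token predictions rather than on a single number.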

Deep Learning

Deep learning is a subset of machine learning that uses neural networks with multiple layers, hence the term "deep." It allows LLMs to learn complex patterns and representations in large datasets, making them capable of understanding nuanced language contexts and generating coherent text.

Transformers

The Transformer architecture, introduced in the 2017 paper "Attention Is All You Need" by Vaswani et al., revolutionized natural language processing (NLP). Transformers use an attention mechanism that enables the model to focus on different parts of the input text, understanding context better than previous models. The original Transformer consists of encoder and decoder layers: the encoder processes the input text, and the decoder generates the output text.
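The attention mechanism can be sketched in plain Python as scaled dot-product attention, the core operation described in "Attention Is All You Need". This is a simplified illustration: a real Transformer first passes each token through learned query, key, and value projections and runs several attention heads in parallel, whereas this sketch reuses the same toy vectors for all three roles.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


# Three toy token vectors (illustrative values, not real embeddings).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = attention(tokens, tokens, tokens)
for row in weights:
    print([round(w, 3) for w in row])
```

Each printed row shows how strongly one token attends to every token in the sequence. The first token puts more weight on itself and the third token than on the second, because their vectors overlap with it.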

How Do LLMs Work?

LLMs operate by harnessing deep learning techniques and extensive textual datasets. These models typically employ transformer architectures, such as the Generative Pre-trained Transformer (GPT), which excels in handling sequential data like text inputs.

This image illustrates how LLMs are trained and how they generate responses.

During training, LLMs learn to predict the next word in a sentence from the context that precedes it. Text is split into tokens (words or smaller character sequences), each token is transformed into an embedding, a numerical representation of context, and the model assigns probability scores to the candidate next tokens. Training on massive text corpora through self-supervised learning lets LLMs grasp grammar, semantics, and conceptual relationships, and supports zero-shot use on tasks they were not explicitly trained for.

Once trained, LLMs autonomously generate text by predicting the next word based on received input and drawing from their acquired patterns and knowledge. This results in coherent and contextually relevant language generation that is useful for various Natural Language Understanding (NLU) and content generation tasks.
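Next-word prediction can be illustrated with a toy bigram model, sketched below. It is a deliberate simplification: it counts which word follows which in a tiny made-up corpus instead of training a neural network on subword tokens. The overall loop is the same, though: estimate probabilities for the next word, pick one, append it, and repeat.

```python
from collections import Counter, defaultdict


def train_bigram(text):
    """Count, for every word, which words follow it in the text."""
    tokens = text.lower().split()
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def next_word_probs(counts, word):
    """Turn follower counts into a probability for each candidate next word."""
    followers = counts.get(word, Counter())
    total = sum(followers.values())
    return {w: c / total for w, c in followers.items()} if total else {}


def generate(counts, start, max_words=5):
    """Greedy generation: always append the most probable next word."""
    words = [start]
    for _ in range(max_words):
        probs = next_word_probs(counts, words[-1])
        if not probs:
            break
        words.append(max(probs, key=probs.get))
    return " ".join(words)


# Tiny illustrative corpus; a real model trains on billions of tokens.
corpus = "the cat sat on the mat . the cat ate the fish ."
model = train_bigram(corpus)
probs = next_word_probs(model, "the")
sentence = generate(model, "the", max_words=3)
print(probs)
print(sentence)
```

Production LLMs usually sample from the probability distribution instead of always taking the top word, which is why the same prompt can produce different outputs.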

Moreover, enhancing model performance involves tactics like prompt engineering, fine-tuning, and reinforcement learning with human feedback (RLHF) to mitigate biases, hateful speech, and factually incorrect responses termed "hallucinations" that may arise from training on vast unstructured data. This aspect is crucial in ensuring the readiness of enterprise-grade LLMs for safe and effective use, safeguarding organizations from potential liabilities and reputational harm.

LLM use cases

LLMs have applications across many industries due to their ability to understand and generate human-like language. Here are some common use cases, along with a real-world example as a case study:

Text generation: LLMs can generate coherent and contextually relevant text, making them useful for tasks such as content creation, storytelling, and dialogue generation.
Translation: LLMs can accurately translate text from one language to another, enabling seamless communication across language barriers.
Sentiment analysis: LLMs can analyze text to determine the sentiment expressed, helping businesses understand customer feedback, social media reactions, and market trends.
Chatbots and virtual assistants: LLMs can power conversational agents that interact with users in natural language, providing customer support, information retrieval, and personalized recommendations.
Content summarization: LLMs can condense large amounts of text into concise summaries, making it easier to extract critical information from documents, articles, and reports.

Case Study: ChatGPT

OpenAI's GPT-3 (Generative Pre-trained Transformer 3) is one of the most significant LLMs developed. It has 175 billion parameters and can perform various natural language processing tasks. ChatGPT is a chatbot that launched on GPT-3.5, a model series derived from GPT-3. It can hold conversations on multiple topics, from casual chit-chat to more complex discussions.

ChatGPT can provide information on various subjects, offer advice, tell jokes, and even engage in role-playing scenarios. It uses the earlier turns of a conversation as context for its replies; the underlying model is not retrained on each interaction.

ChatGPT has been integrated into messaging platforms, customer support systems, and productivity tools. It can assist users with tasks, answer frequently asked questions, and provide personalized recommendations.

Using ChatGPT, companies can automate customer support, streamline communication, and enhance user experiences. It provides a scalable solution for handling large volumes of inquiries while maintaining high customer satisfaction.

Developing AI-Driven Solutions with LLMs

Developing AI-driven solutions with LLMs involves several key steps, from identifying the problem to deploying the solution. Let's break down the process into simple terms:

This image illustrates how to develop AI-driven solutions with LLMs | Source: Image by author.

Identify the Problem and Requirements

Clearly articulate the problem you want to solve or the task you wish the LLM to perform. For example, create a chatbot for customer support or a content generation tool. Gather insights from stakeholders and end-users to understand their requirements and preferences. This helps ensure that the AI-driven solution meets their needs effectively.

Design the Solution

Choose an LLM that aligns with the requirements of your project. Consider factors such as model size, computational resources, and task-specific capabilities. Tailor the LLM to your specific use case by fine-tuning it on relevant datasets. This helps optimize the model's performance for your application.

If applicable, integrate the LLM with other software or systems in your organization to ensure seamless operation and data flow.

Implementation and Deployment

Train the LLM on appropriate training data and test it against evaluation metrics to assess its performance. Testing helps identify and address any issues or limitations before deployment. Ensure that the AI-driven solution can scale to handle increasing volumes of data and users while maintaining performance levels. This may involve optimizing algorithms and infrastructure.

Establish mechanisms to monitor the LLM's performance in real time and implement regular maintenance procedures to address any issues.

Monitoring and Maintenance

Continuously monitor the performance of the deployed solution to ensure it meets the defined success metrics. Collect feedback from users and stakeholders to identify areas for improvement and iteratively refine the solution. Regularly update and maintain the LLM to adapt to evolving requirements, technological advancements, and user feedback.
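One simple way to monitor a deployed solution against a success metric is a rolling window over recent requests, sketched below. The class name, window size, and threshold are hypothetical choices for illustration. How a response counts as "successful" (a thumbs-up, a resolved ticket, an automated check) depends on the metrics you defined for your project.

```python
from collections import deque


class RollingMonitor:
    """Tracks the share of successful responses over the most recent requests."""

    def __init__(self, window=100, threshold=0.9):
        self.results = deque(maxlen=window)  # old results drop off automatically
        self.threshold = threshold

    def record(self, success):
        """Store the outcome of one request."""
        self.results.append(bool(success))

    def success_rate(self):
        """Fraction of successes in the window; 1.0 when nothing is recorded yet."""
        return sum(self.results) / len(self.results) if self.results else 1.0

    def needs_attention(self):
        """True when the recent success rate falls below the threshold."""
        return self.success_rate() < self.threshold


# Illustrative run with a small window so the effect is easy to see.
monitor = RollingMonitor(window=5, threshold=0.8)
for outcome in [True, True, False, True, False, False]:
    monitor.record(outcome)
print(monitor.success_rate(), monitor.needs_attention())
```

A flag from `needs_attention()` would typically trigger a review of recent failures, which then feeds the refinement and update steps described above.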

Challenges of LLMs

While LLMs offer tremendous potential for various applications, they also have several challenges and considerations. Some of these include:

Ethical and Societal Impacts

LLMs may inherit biases present in the training data, leading to unfair or discriminatory outcomes. They can potentially generate sensitive or private information, raising concerns about data privacy and security. If not properly trained or monitored, LLMs can inadvertently propagate misinformation.

Technical Challenges

Understanding how LLMs arrive at their decisions can be challenging, making it difficult to trust and debug these models. Training and deploying LLMs require significant computational resources, limiting accessibility to smaller organizations or individuals. Scaling LLMs to handle larger datasets and more complex tasks can be technically challenging and costly.

Legal and Regulatory Compliance

Generating text using LLMs raises questions about the ownership and copyright of the generated content. LLM applications need to adhere to legal and regulatory frameworks, such as GDPR in Europe, regarding data usage and privacy.

Environmental Impact

Training LLMs is highly energy-intensive, contributing to a significant carbon footprint and raising environmental concerns. Developing more energy-efficient models and training methods is crucial to mitigate the environmental impact of widespread LLM deployment. Addressing sustainability in AI development is essential for balancing technological advancements with ecological responsibility.

Model Robustness

Model robustness refers to the consistency and accuracy of LLMs across diverse inputs and scenarios. Ensuring that LLMs provide reliable and trustworthy outputs, even with slight variations in input, is a significant challenge. Teams are addressing this by incorporating Retrieval-Augmented Generation (RAG), a technique that combines LLMs with external data sources to enhance performance. By supplying their own data to the LLM at query time through RAG, organizations can improve the model's relevance and accuracy for specific tasks, leading to more dependable and contextually appropriate responses.
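The retrieval half of RAG can be sketched without any external services. The example below is a toy under stated assumptions: it ranks documents by bag-of-words cosine similarity (real systems typically use learned embeddings and a vector index), the documents and prompt wording are invented for illustration, and the final call to an LLM is left out because it depends on the provider you use.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Represent text as lowercase word counts (a stand-in for embeddings)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(question, documents, top_k=1):
    """Return the top_k documents most similar to the question."""
    q = vectorize(question)
    ranked = sorted(documents, key=lambda d: cosine(q, vectorize(d)), reverse=True)
    return ranked[:top_k]


def build_prompt(question, documents, top_k=1):
    """Place the retrieved text in the prompt so the model can ground its answer."""
    context = "\n".join(retrieve(question, documents, top_k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Hypothetical company knowledge base.
DOCS = [
    "Refunds are processed within 5 business days.",
    "Our support desk is open Monday to Friday.",
    "Shipping to Europe takes 7 to 10 days.",
]
question = "How long do refunds take?"
prompt = build_prompt(question, DOCS)
print(prompt)
```

The resulting prompt would then be sent to the LLM, which answers from the retrieved passage instead of relying only on what it memorized during training.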

Future of LLMs

LLMs' achievements in recent years have been nothing short of impressive. They have surpassed previous benchmarks in tasks such as text generation, translation, sentiment analysis, and question answering. These models have been integrated into various products and services, enabling advancements in customer support, content creation, and language understanding.

Looking to the future, LLMs hold tremendous potential for further advancement and innovation. Researchers are actively enhancing LLMs' capabilities to address existing limitations and push the boundaries of what is possible. This includes improving model interpretability, mitigating biases, enhancing multilingual support, and enabling more efficient and scalable training methods.

Conclusion

In conclusion, understanding LLMs is pivotal in unlocking the full potential of AI-driven solutions across various domains. From natural language processing tasks to advanced applications like chatbots and content generation, LLMs have demonstrated remarkable capabilities in understanding and generating human-like language.

As we navigate the process of building AI-driven solutions, it is essential to approach the development and deployment of LLMs with a focus on responsible AI practices. This involves adhering to ethical guidelines, ensuring transparency and accountability, and actively engaging with stakeholders to address concerns and promote trust.

Shittu Olumide is a software engineer and technical writer passionate about leveraging cutting-edge technologies to craft compelling narratives, with a keen eye for detail and a knack for simplifying complex concepts. You can also find Shittu on Twitter.

49% of Indian CEOs are Hiring for GenAI Roles that Didn’t Exist Last Year: IBM Study

A new study by the IBM Institute for Business Value found that surveyed Indian CEOs are facing workforce, culture and governance challenges as they act quickly to implement and scale generative AI across their organisations.

Around 49% of Indian CEOs surveyed said they are hiring for Gen AI roles that didn’t exist last year, while 71% said that succeeding with AI will depend more on people’s adoption than on the technology itself.

Moreover, CEOs surveyed from India said 34% of their workforce will require retraining and reskilling over the next three years – up from just 6% globally in 2021.

The annual global study of 3,000 CEOs from over 30 countries and 26 industries reveals a high importance placed by Indian CEOs on AI governance, with 71% of those surveyed saying trusted AI is impossible without effective AI governance in organisations.

Further substantiating this, 75% of Indian CEO respondents say governance for generative AI must be established as solutions are designed rather than after they are deployed.

At the same time, the study also noted a contrast in the actual adoption of AI governance policies, with only 42% of Indian CEO respondents saying they have good generative AI governance in place today.

This may be because people in the organization aren’t sure of exactly what they’re being asked to do. In the survey, 75% of Indian CEO respondents say that inspiring their team with a common vision produces better outcomes than providing precise standards and targets. Yet 31% acknowledge that their employees don’t fully understand how strategic decisions impact them.

“As Indian CEOs navigate AI-led transformations within their organizations, they recognize the need for AI guardrails so that they derive real business value responsibly for growth and competitive success. However, our study reveals a gap between their intention and actual implementation. This scenario highlights the complexity of implementing AI governance, hence making a strong case for partnering with trusted experts to develop and execute effective practices and policies,” Sandip Patel, managing director of IBM India & South Asia, said.

Other key study findings include:

Indian CEOs recognize it takes a cultural shift to successfully scale AI, but face organizational collaboration and adoption challenges.

70% of Indian CEOs surveyed say their organization’s success is directly tied to the quality of collaboration between finance and technology, yet nearly half (48%) say competition among their C-Suite executives sometimes impedes collaboration.
Nearly half (48%) of those surveyed from India acknowledge that cultural change is more important to becoming a data-driven organization than overcoming technical challenges.
58% of Indian CEO respondents say they are pushing their organization to adopt generative AI more quickly than some people are comfortable with.

Customer experience and product & service innovation are top priorities, but regulatory constraints might be hindering long-term progress.

Indian CEOs surveyed ranked customer experience and product & service innovation as their highest priorities for the next three years.
59% of respondents say they are willing to sacrifice operational efficiency for greater innovation.
However, nearly half (48%) of Indian CEOs surveyed point to regulatory constraints as their top barrier to innovation.
Today, only 32% of the Indian CEO respondents are primarily funding their generative AI investments with net new IT spend, with the remaining 68% reducing other technology spend.
