Will Water Crisis Be a Hurdle in India’s Data Centre Dream

Bengaluru, India’s Silicon Valley, is grappling with an acute water shortage, with a daily deficit of 500 million litres. This has left experts scratching their heads over its implications for the water-guzzling data centres that power the city’s digital infrastructure.

The IT hub has over 16 operational data centres, with a combined capacity of 205.64 MW, run by nine major providers, including CtrlS, NTT, STT Telemedia, Sify, and Iron Mountain. These facilities, housing servers and computing infrastructure for cloud services, AI, banking, e-commerce, and more, require enormous amounts of water for cooling purposes.

Data centres consume 26 million litres per MW annually, resulting in a staggering 1.4 crore litres of daily water usage—equivalent to nearly 41,900 households in Bengaluru alone.
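The arithmetic behind these figures can be checked directly. A quick sketch (the per-household figure of roughly 350 litres a day is an assumption used here to back out the article’s comparison, not a number from the report):

```python
# Back-of-the-envelope check of the consumption figures quoted above.
LITRES_PER_MW_PER_YEAR = 26_000_000   # per-MW annual water consumption cited above
CAPACITY_MW = 205.64                  # Bengaluru's installed data centre capacity
HOUSEHOLD_DAILY_LITRES = 350          # assumed average household use (illustrative)

daily_litres = LITRES_PER_MW_PER_YEAR * CAPACITY_MW / 365
print(f"Daily usage: {daily_litres / 1e7:.2f} crore litres")                  # ~1.46 crore
print(f"Household equivalent: {daily_litres / HOUSEHOLD_DAILY_LITRES:,.0f}")  # ~41,852
```

The result lands close to the article’s "1.4 crore litres" and "nearly 41,900 households" figures.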

However, according to a Niti Aayog report, Bengaluru is not the only prime data centre hub affected; Mumbai, Delhi, and Chennai are also facing a growing crisis.

Since India’s installed data centre capacity is projected to quadruple from 499 MW in 2021 to 2,010 MW in 2024, and is expected to reach 4,770 MW by 2029, the pre-existing water scarcity has raised concerns over the sustainability of India’s and Bengaluru’s digital ambitions.

Technological Move Towards Sustainability

While the government has released a draft ‘National Data Centre Policy’ to bolster India’s installed data centre capacity, the policy overlooks a crucial issue: the environmental impact of data centres, particularly their water usage.

Data centres consume water directly for cooling and indirectly through non-renewable electricity generation, exacerbating water scarcity concerns.

Experts warn that without urgently adopting sustainable practices like waterless cooling technologies, greater renewable energy adoption and stricter regulation, the water crisis could force data centres to relocate from Bengaluru, jeopardising the digital ecosystem.

To address this issue, companies like Google and Microsoft have adopted recycled wastewater for cooling. However, many data centres still rely on fresh municipal water supplies.

AWS CEO Adam Selipsky has also emphasised the urgency of the situation, stating, “In just a few years, half of the world’s population is projected to live in water-stressed areas, so to ensure all people have access to water, we all need to innovate new ways to help conserve and reuse this precious resource.”

Additionally, some players are rapidly adopting innovative cooling solutions to slash water usage as the sector grapples with rising temperatures and water scarcity challenges. For instance, Bengaluru-based CtrlS aims to use 100% recycled water across all its facilities, while NTT targets 99% waste recycling by 2030 and increasing recycled wastewater usage.

Iron Mountain’s data centres run entirely on renewable energy purchased via green credits. The company is also adopting methods like liquid immersion cooling, direct-to-chip cooling, and closed-loop dielectric cooling, all of which are being developed to save water.

“Water is becoming an increasingly precious resource, especially in regions facing drought conditions,” said Piyush Somani, CEO of ESDS Software Solution, a managed cloud service provider.

“Adopting liquid cooling is vital for ensuring the long-term sustainability of our data centres,” Somani added.

Liquid immersion cooling fully submerges servers in a specialised fluid, transferring heat more efficiently than air and using minimal water to cool the fluid. Meanwhile, direct liquid cooling circulates coolant directly onto processor chips, eliminating the need for energy-intensive air-conditioning systems.

Beyond water savings, these technologies also boost energy efficiency by 10-50%, slashing operating costs and carbon emissions for data centre operators.

Moving Towards Waterless Data Centres

Moreover, as traditional air-based cooling methods have reached their limits, companies are now turning to direct-to-chip liquid cooling for its superior heat transfer capabilities.

This shift has also prompted the adoption of waterless, two-phase solutions like ZutaCore’s HyperCool, which is used by AMD, Dell, and data centre giants like Equinix.

HyperCool offers significant benefits for high-performance computing, server densification, and data centre sustainability, tailored to modern cloud, AI, and HPC workloads.

Direct-to-chip liquid cooling facilitates sustainability in data centres by reducing space and construction, extending server lifetime, reducing cooling power consumption, maximising heat reuse, and reducing the environmental impact.

With its dielectric liquid cooling solution, ZutaCore addresses the challenge of escalating heat densities exceeding 100 kW as GPU-heavy racks become part of Indian data centres like Yotta.

This technology significantly improves cooling efficiency, enhances equipment performance, and yields substantial operator cost savings. Vijay Sampathkumar, country manager at ZutaCore, explained, “We use zero water… and this technology can ‘decimate 92% of GPU heat’ and ‘70% of CPU heat’ without water.”

Integrated with AI-powered optimisation algorithms, ZutaCore’s solution reduces energy usage by 30%, ensuring precise cooling resource allocation without compromising performance or reliability.

Through AI-driven simulations and predictive analytics, the company collaborates with industry partners to develop tailored liquid cooling solutions that anticipate and adapt to future requirements.

Additionally, Evolution Data Centres is addressing the issue in water-scarce developing countries, like those in Southeast Asia, with its air-cooled chillers, which replace traditional water-cooled systems that consume around 50,000 litres of water per day per MW.

“These air-cooled chillers require no water during normal operation, which helps to achieve virtually zero Water Usage Effectiveness (WUE),” emphasised Simon Hamer, CTO at Evolution.
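Water Usage Effectiveness, as defined by The Green Grid, is simply litres of site water consumed per kilowatt-hour of IT equipment energy. A rough sketch, where the 50,000 L/day figure follows the number quoted above and the hypothetical 1 MW facility is assumed for illustration:

```python
def water_usage_effectiveness(site_water_litres: float, it_energy_kwh: float) -> float:
    """WUE: litres of site water consumed per kWh of IT equipment energy."""
    return site_water_litres / it_energy_kwh

# Hypothetical 1 MW facility running flat out for a day (24,000 kWh of IT load)
it_energy_kwh = 1_000 * 24
print(water_usage_effectiveness(50_000, it_energy_kwh))  # water-cooled: ~2.08 L/kWh
print(water_usage_effectiveness(0, it_energy_kwh))       # air-cooled: 0.0 L/kWh
```

Eliminating water draw during normal operation is what drives the metric toward the “virtually zero WUE” Hamer describes.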

Meanwhile, hyperscalers such as Meta are also venturing into the Indian market to tap into the burgeoning demand.

While Meta is collaborating with Reliance to utilise its co-location units, adding to the demand for such infrastructure, it is also consequently straining the available resources.

The post Will Water Crisis Be a Hurdle in India’s Data Centre Dream appeared first on Analytics India Magazine.

Mistral 7B-V0.2: Fine-Tuning Mistral’s New Open-Source LLM with Hugging Face


Mistral AI, one of the world’s leading AI research companies, has recently released the base model for Mistral 7B v0.2.

This open-source language model was unveiled during the company’s hackathon event on March 23, 2024.

The Mistral 7B models have 7.3 billion parameters, making them extremely powerful. They outperform Llama 2 13B and Llama 1 34B on almost all benchmarks. The latest V0.2 model introduces a 32k context window among other advancements, enhancing its ability to process and generate text.

Additionally, the recently announced version is the base model underlying the instruction-tuned variant, “Mistral-7B-Instruct-v0.2,” which was released late last year.

In this tutorial, I will show you how to access and fine-tune this language model on Hugging Face.

Understanding Hugging Face’s AutoTrain Feature

We will be fine-tuning the Mistral 7B-v0.2 base model using Hugging Face’s AutoTrain functionality.

Hugging Face is renowned for democratizing access to machine learning models, allowing everyday users to develop advanced AI solutions.

AutoTrain, a feature of Hugging Face, automates the process of model training, making it accessible and efficient.

It helps users select the best parameters and training techniques when fine-tuning models, which is a task that can otherwise be daunting and time-consuming.

Fine-Tuning the Mistral-7B Model with AutoTrain

Here are the seven steps to fine-tuning your Mistral-7B model:

1. Setting up the environment

You must first create an account with Hugging Face, and then create a model repository.

To achieve this, simply follow the steps provided in this link and come back to this tutorial.

We will be training the model in Python. When it comes to selecting a notebook environment for training, you can use Kaggle Notebooks or Google Colab, both of which provide free access to GPUs.

If the training process takes too long, you might want to switch to a cloud platform like AWS SageMaker or Azure ML.

Finally, perform the following pip installs before you start coding along to this tutorial:

!pip install -U autotrain-advanced
!pip install datasets transformers

2. Preparing your dataset

In this tutorial, we will be using the Alpaca dataset on Hugging Face, which pairs each instruction with a model output and an optional context input.

We will fine-tune the model on pairs of instructions and outputs and assess its ability to respond to the given instruction in the evaluation process.

To access and prepare this dataset, run the following lines of code:

import os
import pandas as pd
from datasets import load_dataset

# Load and preprocess dataset
def preprocess_dataset(dataset_name, split_ratio='train[:10%]', input_col='input', output_col='output'):
    dataset = load_dataset(dataset_name, split=split_ratio)
    df = pd.DataFrame(dataset)
    chat_df = df[df[input_col] == ''].reset_index(drop=True)
    return chat_df

# Formatting according to AutoTrain requirements
def format_interaction(row):
    formatted_text = f"[Begin] {row['instruction']} [End] {row['output']} [Close]"
    return formatted_text

# Process and save the dataset
if __name__ == "__main__":
    dataset_name = "tatsu-lab/alpaca"
    processed_data = preprocess_dataset(dataset_name)
    processed_data['formatted_text'] = processed_data.apply(format_interaction, axis=1)

    save_path = 'formatted_data/training_dataset'
    os.makedirs(save_path, exist_ok=True)
    file_path = os.path.join(save_path, 'formatted_train.csv')
    processed_data[['formatted_text']].to_csv(file_path, index=False)
    print("Dataset formatted and saved.")

The first function will load the Alpaca dataset using the “datasets” library and clean it to ensure that we aren’t including any empty instructions. The second function structures your data in a format that AutoTrain can understand.

After running the above code, the dataset will be loaded, formatted, and saved in the specified path. When you open your formatted dataset, you should see a single column labeled “formatted_text.”
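If you’d like to sanity-check the two helper functions without downloading the full dataset, a miniature dry run on hand-written rows (invented here purely for illustration) behaves like this:

```python
import pandas as pd

# Two toy rows: one pure instruction, and one with a context "input" that the
# preprocessing step filters out.
toy = pd.DataFrame([
    {'instruction': 'Name a primary color.', 'input': '', 'output': 'Red.'},
    {'instruction': 'Summarise this.', 'input': 'Some passage', 'output': 'A summary.'},
])
chat_df = toy[toy['input'] == ''].reset_index(drop=True)
chat_df['formatted_text'] = chat_df.apply(
    lambda r: f"[Begin] {r['instruction']} [End] {r['output']} [Close]", axis=1)
print(chat_df['formatted_text'].tolist())
# → ['[Begin] Name a primary color. [End] Red. [Close]']
```

Only the instruction-only row survives the filter, and it comes out wrapped in the AutoTrain-friendly template.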

3. Setting up your training environment

Now that you’ve successfully prepared the dataset, let’s proceed to set up your model training environment.

To do this, you must define the following parameters:

project_name = 'mistralai'
model_name = 'alpindale/Mistral-7B-v0.2-hf'
push_to_hub = True
hf_token = 'your_token_here'
repo_id = 'your_repo_here'

Here is a breakdown of the above specifications:

  • You can specify any project_name. This is where all your project and training files will be stored.
  • The model_name parameter is the model you’d like to fine-tune. In this case, I’ve specified a path to the Mistral-7B v0.2 base model on Hugging Face.
  • The hf_token variable must be set to your Hugging Face token, which can be obtained by navigating to this link.
  • Your repo_id must be set to the Hugging Face model repository that you created in the first step of this tutorial. For example, my repository ID is NatasshaS/Model2.

4. Configuring model parameters

Before fine-tuning our model, we must define the training parameters, which control aspects of model behavior such as training duration and regularization.

These parameters influence key aspects like how long the model trains, how it learns from the data, and how it avoids overfitting.

You can set the following parameters for your model:

use_fp16 = True
use_peft = True
use_int4 = True
learning_rate = 1e-4
num_epochs = 3
batch_size = 4
block_size = 512
warmup_ratio = 0.05
weight_decay = 0.005
lora_r = 8
lora_alpha = 16
lora_dropout = 0.01

5. Setting environment variables

Let’s now prepare our training environment by setting some environment variables.

This step ensures that the AutoTrain feature uses the desired settings to fine-tune the model, such as our project name and training preferences:

import os  # needed if you haven't imported it in an earlier cell

os.environ["PROJECT_NAME"] = project_name
os.environ["MODEL_NAME"] = model_name
os.environ["LEARNING_RATE"] = str(learning_rate)
os.environ["NUM_EPOCHS"] = str(num_epochs)
os.environ["BATCH_SIZE"] = str(batch_size)
os.environ["BLOCK_SIZE"] = str(block_size)
os.environ["WARMUP_RATIO"] = str(warmup_ratio)
os.environ["WEIGHT_DECAY"] = str(weight_decay)
os.environ["USE_FP16"] = str(use_fp16)
os.environ["LORA_R"] = str(lora_r)
os.environ["LORA_ALPHA"] = str(lora_alpha)
os.environ["LORA_DROPOUT"] = str(lora_dropout)

# The training command in step 6 also reads these, so set them here as well
os.environ["USE_PEFT"] = str(use_peft)
os.environ["USE_INT4"] = str(use_int4)
os.environ["PUSH_TO_HUB"] = str(push_to_hub)
os.environ["HF_TOKEN"] = hf_token
os.environ["REPO_ID"] = repo_id

6. Initiate model training

Finally, let’s start training the model using the autotrain command. This step involves specifying your model, dataset, and training configurations, as displayed below:

!autotrain llm \
    --train \
    --model "${MODEL_NAME}" \
    --project-name "${PROJECT_NAME}" \
    --data-path "formatted_data/training_dataset/" \
    --text-column "formatted_text" \
    --lr "${LEARNING_RATE}" \
    --batch-size "${BATCH_SIZE}" \
    --epochs "${NUM_EPOCHS}" \
    --block-size "${BLOCK_SIZE}" \
    --warmup-ratio "${WARMUP_RATIO}" \
    --lora-r "${LORA_R}" \
    --lora-alpha "${LORA_ALPHA}" \
    --lora-dropout "${LORA_DROPOUT}" \
    --weight-decay "${WEIGHT_DECAY}" \
    $( [[ "$USE_FP16" == "True" ]] && echo "--mixed-precision fp16" ) \
    $( [[ "$USE_PEFT" == "True" ]] && echo "--use-peft" ) \
    $( [[ "$USE_INT4" == "True" ]] && echo "--quantization int4" ) \
    $( [[ "$PUSH_TO_HUB" == "True" ]] && echo "--push-to-hub --token ${HF_TOKEN} --repo-id ${REPO_ID}" )

Make sure to change the data-path to where your training dataset is located.

7. Evaluating the model

Once your model has finished training, you should see a folder appear in your directory with the same title as your project name.

In my case, this folder is titled “mistralai.”

Within this folder, you can find files that encompass your model weights, hyperparameters, and architecture details.

Let’s now check whether this fine-tuned model is able to respond accurately to a question in our dataset. To achieve this, we first need to run the following lines of code to generate 5 sample inputs and outputs from our dataset:

from datasets import load_dataset

# Reload the split used earlier so "dataset" is defined in this cell
dataset = load_dataset("tatsu-lab/alpaca", split="train[:10%]")

# Print out 5 sample inputs and outputs from our dataset
for i, example in enumerate(dataset):
    if i >= 5:
        break
    print(f"Instruction: {example['instruction']}")
    print(f"Output: {example['output']}\n---")

You should see a response that looks like this, showcasing 5 sample data points:

Instruction: Give three tips for staying healthy.
Output: 1. Eat a balanced diet and make sure to include plenty of fruits and vegetables.
2. Exercise regularly to keep your body active and strong.
3. Get enough sleep and maintain a consistent sleep schedule.
---
Instruction: What are the three primary colors?
Output: The three primary colors are red, blue, and yellow.
---
Instruction: Describe the structure of an atom.
Output: An atom is made up of a nucleus, which contains protons and neutrons, surrounded by electrons that travel in orbits around the nucleus. The protons and neutrons have a positive charge, while the electrons have a negative charge, resulting in an overall neutral atom. The number of each particle determines the atomic number and the type of atom.
---
Instruction: How can we reduce air pollution?
Output: There are a number of ways to reduce air pollution, such as shifting to renewable energy sources, encouraging the use of public transportation, prohibiting the burning of fossil fuels, implementing policies to reduce emissions from industrial sources, and implementing vehicle emissions standards. Additionally, individuals can do their part to reduce air pollution by reducing car use, avoiding burning materials such as wood, and changing to energy efficient appliances.
---
Instruction: Describe a time when you had to make a difficult decision.
Output: I had to make a difficult decision when I was working as a project manager at a construction company. I was in charge of a project that needed to be completed by a certain date in order to meet the client's expectations. However, due to unexpected delays, we were not able to meet the deadline and so I had to make a difficult decision. I decided to extend the deadline, but I had to stretch the team's resources even further and increase the budget. Although it was a risky decision, I ultimately decided to go ahead with it to ensure that the project was completed on time and that the client's expectations were met. The project was eventually successfully completed and this was seen as a testament to my leadership and decision-making abilities.

We are going to type one of the above instructions into the model and check if it generates accurate output. Here is a function to provide an instruction to the model and get a response from it:

# Function to provide an instruction to the fine-tuned model and get a
# response ("model" and "tokenizer" must be loaded beforehand, e.g. from
# the repository you pushed to during training)
def ask(model, tokenizer, question, max_length=128):
    inputs = tokenizer.encode(question, return_tensors='pt')
    outputs = model.generate(inputs, max_length=max_length, num_return_sequences=1)
    answer = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return answer

Finally, enter a question into this function as displayed below:

question = "Describe a time when you had to make a difficult decision."
answer = ask(model, tokenizer, question)
print(answer)

Your model should generate a response similar to the corresponding output in the training dataset, as displayed below:

Describe a time when you had to make a difficult decision.

What did you do? How did it turn out?

[/INST] I remember a time when I had to make a difficult decision about my career. I had been working in the same job for several years and had grown tired of it. I knew that I needed to make a change, but I was unsure of what to do. I weighed my options carefully and eventually decided to take a leap of faith and start my own business. It was a risky move, but it paid off in the end. I am now the owner of a successful business and

Please note that the response may seem incomplete or cut off because of the number of tokens we’ve specified. Feel free to adjust the “max_length” value to allow for a more extended response.

Fine-Tuning Mistral-7B V0.2 — Next Steps

If you’ve come this far, congratulations!

You have successfully fine-tuned a state-of-the-art language model, leveraging the power of Mistral 7B v0.2 alongside Hugging Face’s capabilities.

But the journey doesn’t end here.

As a next step, I recommend experimenting with different datasets or tweaking certain training parameters to optimize model performance. Fine-tuning models on a larger scale will enhance their utility, so try experimenting with bigger datasets or varying formats, such as PDFs and text files.

Such experience becomes invaluable when working with real-world data in organizations, which is often messy and unstructured.

Natassha Selvaraj is a self-taught data scientist with a passion for writing. Natassha writes on everything data science-related. You can connect with her on LinkedIn or check out her YouTube channel.


Is Character AI Safe? Understanding Safety and Privacy Concerns

In the modern, fast-paced era, where the world depends on AI-driven decisions, trust is paramount. Character.AI, a rising star in conversational AI, tackles this very concern. It aims to transform digital interactions into genuine experiences while prioritizing user safety. According to DemandSage, its billion-dollar valuation and 20-million-strong user base speak volumes about Character.AI's innovative approach. But is Character.AI safe?

Committed to ethical and responsible AI development, Character.AI champions data privacy. It adheres to regulations and proactively addresses potential risks, positioning Character.AI as a leader in its field.

This blog will cover various aspects of Character.AI, exploring its features and addressing any lingering safety and privacy concerns associated with it.

What is Character.AI?

Character.AI is a conversational AI application, built on a neural language model, that takes online interactions to a new level by letting users chat with AI characters they create or encounter. These characters, which can be historical figures, celebrities, or even custom inventions, are built with advanced language processing so they can hold conversations that feel natural. Character.AI goes beyond the typical chatbot service by using deep learning to craft genuine digital interactions, making online experiences more engaging and authentic.

Features and Capabilities

Character.AI offers a range of features designed to make online interactions with AI-powered characters engaging and informative:

  • User-Created Chatbots: Character.AI empowers users to design and develop their own chatbots. These custom creations can be imbued with unique personalities, detailed backstories, and even customized appearances.
  • Interactive Storytelling: The platform transforms traditional storytelling by allowing users to embark on narrative adventures with their AI companions. This fosters a unique and engaging way to experience stories.
  • Personalized Learning Support: Character.AI caters to individual learning styles by offering personalized guidance and support through its AI tutors, enabling a more interactive and effective learning experience.
  • Curated Conversation Starters: Character.AI offers personalized suggestions to keep interactions with chatbots flowing and engaging.
  • User Safety Filters: A robust NSFW filter helps keep conversations appropriate, ensuring a secure environment for exploring the potential of conversational AI.

Character.AI Privacy Policy

Any AI-powered platform's privacy policy determines its credibility, and Character.AI prioritizes user data protection through a robust one. The platform places immense value on transparent data processing, guaranteeing user privacy and consent.

Character AI's privacy policy outlines how it collects user information, how it tracks their use of the app, and what information it might get from other sources like social media. This data is used to run the app smoothly, personalize user experience, and potentially for future advertising.

It's important to note that Character AI may share user information with affiliates, vendors, or for legal reasons. While users may have some control over their information by managing cookies or unsubscribing from emails, the platform may be storing their data in the US or other countries with varying privacy laws. By using Character AI, users consent to this transfer.

To prevent unwanted access to sensitive data, Character.AI conducts regular audits and enforces encryption measures. Moreover, it recently updated its privacy policy to incorporate enhanced security measures and transparency principles. These updates tackle growing privacy concerns and adhere to evolving regulatory standards.

Is Character.AI Safe?

Character.AI offers a fun and engaging platform with robust security mechanisms. However, like any AI technology, there are potential data privacy and security risks associated with its usage. Let's explore some of these risks:

Data Privacy Concerns

Character.AI collects a variety of user data, including names, emails, IP addresses, and even chat content. While the company claims strong security measures, there's always a risk of data breaches or unauthorized access. Imagine, for example, a hacker infiltrating Character.AI's servers and gaining access to user data like names, emails, and potentially even chat logs containing private information. This information could be used for identity theft, targeted scams, or even blackmail.

Misuse of Personal Information

The Character AI privacy policy allows them to share user data with third parties under certain circumstances, like legal requirements or advertising purposes. This raises concerns about how user information might be used beyond the stated purposes. For instance, a user signs up for Character.AI and agrees to the privacy policy, unaware that under certain circumstances, their data could be shared with advertising companies. These companies then use the data to bombard the user with highly targeted ads, potentially revealing their interests or online behavior to others.

Deception and Scams

Malicious users could potentially create AI characters that impersonate real people or businesses. These characters could be used to spread misinformation, manipulate users, or even launch phishing attacks. For example, a malicious user creates an AI character that perfectly mimics a popular celebrity. The character interacts with fans, promising exclusive content or special treatment in exchange for personal information or financial contributions. Unsuspecting users might reveal private details or send money, only to find out they've been scammed.

Exposure to Inappropriate Content

While Character.AI has filters, they might not be perfect. Users, especially children, could be exposed to offensive or age-inappropriate content generated by AI characters or other users. For instance, despite content filters, a young user interacts with an AI character that starts generating sexually suggestive dialogue or violent imagery. This could be traumatizing for the user and expose them to inappropriate content not meant for their age group.

Over-reliance and Addiction

Character.AI's engaging nature could lead to excessive use or even addiction, potentially causing users to neglect real-world interactions. Consider a user struggling with social anxiety who finds solace in interacting with AI characters on Character.AI. These interactions become so engaging and fulfilling that the user starts neglecting real-world relationships and responsibilities, potentially leading to social isolation and emotional dependence on the platform.

Staying Safe on Character.AI: Essential Tips for Responsible Use

While we've explored some potential security risks associated with Character.AI, it's important to remember that these risks can be mitigated with a proactive approach. By following some essential tips for responsible use, you can maximize your enjoyment of the platform while minimizing potential dangers. Here are some key strategies to keep in mind:

  • Be mindful of the information you share: Avoid sharing personal details or sensitive information with AI characters.
  • Review the privacy policy: Understand how your data is collected, used, and shared.
  • Report inappropriate content: Flag any offensive or harmful content you encounter.
  • Use Character AI responsibly: Maintain a healthy balance with real-world interactions.
  • Be cautious of unrealistic promises: Don't trust everything AI characters say, and verify information independently.

While Character.AI offers a glimpse into the future of AI interaction, its responsible use and a critical eye are essential for a safe and positive experience.

To stay updated on the latest developments in AI, visit Unite.ai.

‘Lord of the Rings’ Meets OpenAI’s ChatGPT & Google’s Gemini


Echoing the legendary ‘One Ring’ from Tolkien’s Middle-earth, VTouch, a South Korean Tech company, steps into the spotlight with the launch of the WIZPR ring. This innovative smart accessory is poised to revolutionise how we interact with AI.

The co-founders of VTouch, Seongjun (SJ) Kim and Nathan Dohyun Kim, have designed the ring to offer a streamlined and convenient way to access AI tools such as ChatGPT and Gemini.

Ready to shape the future

“With WIZPR ring’s voice-based interaction method, we are scripting a new chapter in human-AI interaction. This advancement enables computing even in situations where your hands are busy or screens are out of reach, taking human-computer interaction to the next level.

“Now without pulling out a smartphone, we can effortlessly talk to our chosen AI and multitask while driving, walking, jogging, or picking groceries,” said Nathan, in an introduction video explaining the working of the ring.

How the WIZPR Ring Works

Utilising proximity voice activity detection technology, the WIZPR ring recognises and responds only to speech detected close to the device, eliminating the need for wake words. Positioning the ring near the mouth automatically activates it, giving the user precise control over when a command begins and ends.


When brought close to something else, like a pocket, or when the user is wearing gloves, the ring detects this and automatically turns off to prevent accidental activation. Equipped with a built-in proximity sensor, the ring activates its microphone on demand, transmitting the input to the user’s smartphone via Bluetooth.

The accompanying WIZPR ring app leverages Proximity Voice Activity Detection technology to accurately transcribe the user’s voice while effectively eliminating background noise, ensuring clarity in communication with the AI.

Responses from the AI are then audibly relayed through the user’s earphones and simultaneously displayed in the smartphone’s dialogue window, providing seamless interaction and integration with the device’s ecosystem.

Going beyond rings

The Humane Ai Pin also offers a wide range of functionalities, including contextual insights, virtual assistance, and personalised recommendations. By using voice commands or text inputs, users can interact with the Ai Pin, enabling hands-free operation and convenience.

Based on user interactions, the Ai Pin learns and adapts to individual preferences over time, delivering increasingly tailored experiences.

Another example of breakthrough devices is the Crossbeams Ignite Nexus smartwatch, which incorporates ChatGPT to enhance its capabilities beyond traditional smartwatches. With this integration, users can engage in conversational interactions, receive intelligent notifications, and access contextual information directly from their wrists.

The list goes on. Whoop Coach recently integrated AI-driven coaching capabilities to provide personalised fitness recommendations. Based on user data, including activity levels, sleep patterns, and recovery metrics, one can receive real-time insights and actionable suggestions to optimise fitness routines, improve performance, and achieve health goals.

Will these replace Siri, Alexa, and Google Assistant?

There is also a growing discussion about the potential of wearable AI to replace voice assistants like Siri (which is about to get a GenAI upgrade at WWDC 24) and Google Assistant.

As technology evolves, ambient computing is envisioned where devices will intuit users’ needs without requiring explicit wake words or commands. Innovations like the Humane AI Pin and WIZPR Ring are built to harness the power of AI and to interact with large language models.

In terms of spatial computing, launches like the Vision Pro mark a significant milestone toward global adoption. And with Neuralink already in the race to further revolutionize communication, brain-computer interfaces may one day be as common as smartphones, opening up a world of possibilities for both medical and technological advancements.

And now humanity anticipates embracing further advancements in AI-based communication.

The post ‘Lord of the Rings’ Meets OpenAI’s ChatGPT & Google’s Gemini appeared first on Analytics India Magazine.

Thanks to Google, YouTube Belongs to Everyone

According to a recent report, OpenAI trained GPT-4 on millions of hours of YouTube videos transcribed with its speech-to-text model Whisper. The company has been desperately trying to gather as much data as possible to train its AI models.

This report comes right after a recent interview of OpenAI CTO Mira Murati that has been making the rounds on the internet. In the video, Murati appeared tongue-tied and was unable to specify how the company trained its latest video generation model, Sora. The company has been operating in the dangerous territory of AI copyright for quite some time now.

The problem here is that YouTube does not allow AI companies to download videos and transcripts. Neal Mohan, the CEO of YouTube, said using its videos for training AI models is a violation of the platform’s terms of service, though he couldn’t be sure whether OpenAI had indeed used the videos. “It would be a violation,” he added.

“From a creator’s perspective, when they upload their hard work to our platform, they have certain expectations,” Mohan said in an interview. “Lots of creators have different sorts of licensing contracts in terms of their content on our platform,” Mohan said.

OpenAI is not alone

Meanwhile, Mohan claimed that Google too has used portions of YouTube videos to train the Gemini model, which he says adhered to the usage policy. Interestingly, the company tweaked its privacy policy’s language to expand what it could do with the data, which is quite shady.

It has been established several times over the years that YouTube is a gold mine of data for training any multimodal AI model. The rider is that no one can use this data and train on YouTube videos except Google, which owns the platform.

The Times reported that OpenAI exhausted all useful text data in 2021 and has since been desperately trying to get its hands on any data possible. Though Murati said that Sora was trained on publicly available data, it cannot be pinpointed whether that was YouTube, Facebook, Instagram, or all of them combined. But now, it has been confirmed that at least GPT-4 was trained on the transcripts.

Speaking of Facebook and Instagram, parent company Meta held internal discussions last year about a potential acquisition of the publishing house Simon & Schuster, with the aim of obtaining longer-form content. This information was gleaned from recordings of internal meetings.

This is similar to OpenAI partnering with several news agencies. Google, on the other hand, believes that it has the right to scrape all the information off the internet being the dominant search engine. It recently partnered with Reddit for access to its Data API.

Even the entire universe of the internet is not enough for these data-hungry AI models.

‘Better to ask for forgiveness than permission’

Meta has also been in talks about the possibility of aggregating copyrighted content from various online sources, despite potential legal repercussions. The participants expressed concerns that negotiating licenses with publishers, artists, musicians, and news outlets would be time-consuming.

The requirement of data is so huge that even using copyrighted material after acquiring a license is not enough. “The only practical way for these tools to exist is if they can be trained on massive amounts of data without having to license that data,” Sy Damle, a lawyer who represents Andreessen Horowitz, said.

OpenAI CEO Sam Altman has been quite vocal about the need for data for AI models, saying that training would use up all the available data on the internet. This eventually led the company to transcribe YouTube content such as audiobooks and podcasts for high-quality data and information.

Several Google employees were aware that OpenAI used YouTube videos to train its AI models but did not call it out, as Google was doing the same; it would have been hypocritical for the company to object. So the future is simple: either no one, or everyone, will be using YouTube videos to train AI models.

What this would do to the creators is still a question. Altman has clearly said that he wants to compensate artists and creators, but the process isn’t clear even to him. For now, it is all about training on YouTube’s gold mine of data, and then paying off hefty fines (if and when imposed).

But now that YouTube’s data has already been exhausted on GPT-4 and Gemini, we wonder what these companies will train their upcoming models, such as GPT-5, on. They will find a way – legal or illegal – and figure it out later.

The post Thanks to Google, YouTube Belongs to Everyone appeared first on Analytics India Magazine.

AI Can Help Replace FASTags at Toll Booths: NPCI CTO

The National Payments Corporation of India (NPCI) was at the heart of the widespread adoption and success of the Unified Payments Interface (UPI) in India. Now, NPCI is experimenting with AI to further enhance the country’s payment ecosystem.

“NPCI is exploring futuristic use cases for payment operators focusing on technologies like graph analysis, speech recognition, ANPR (Automatic Number Plate Recognition) as well as generative AI,” Vishal Kanvaty, chief technology officer at NPCI, told AIM in an exclusive interaction.

The most intriguing among them is using ANPR, which typically involves a combination of computer vision and machine learning techniques.

“Leveraging ANPR technology at toll booths and parking facilities enables seamless and contactless payments by automatically recognising vehicle licence plates and debiting the corresponding payment accounts,” Kanvaty said.

This could help overcome the limitations currently prevalent with FASTags in India. Recently, Paytm users were alarmed when the RBI initiated significant measures against Paytm Payments Bank (PPBL), instructing the company to cease accepting deposits or top-ups in any customer accounts, including wallets and FASTags. Using ANPR at toll booths could help avoid such a scenario altogether.
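The flow ANPR tolling implies can be sketched as: read the plate, look up the linked payment account, debit it. The minimal sketch below stubs out the recognition step (in practice a computer-vision model) and uses invented plate numbers and balances; NPCI has not published an implementation.

```python
# Toy sketch of an ANPR toll-booth payment flow. The recognition step is
# stubbed; a real system would run a computer-vision model on the camera frame.

def recognise_plate(camera_frame: str) -> str:
    """Stand-in for ANPR: normalise whatever the 'camera' read."""
    return camera_frame.strip().upper()

def process_toll(camera_frame: str, accounts: dict, toll: int) -> str:
    plate = recognise_plate(camera_frame)
    balance = accounts.get(plate)
    if balance is None:
        return f"{plate}: no linked payment account"
    if balance < toll:
        return f"{plate}: insufficient balance"
    accounts[plate] = balance - toll   # debit the linked account
    return f"{plate}: debited {toll}, remaining {accounts[plate]}"

accounts = {"KA01AB1234": 500}   # hypothetical plate-to-balance mapping
print(process_toll(" ka01ab1234 ", accounts, 60))
# → KA01AB1234: debited 60, remaining 440
```

Unlike FASTag, nothing here depends on a tag issuer staying operational: the plate itself is the identifier, which is why a bank-level disruption like the PPBL order would not strand vehicles.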

Exploring futuristic use cases for payment operators

“By employing graph analysis techniques, NPCI strives to uncover and deter money laundering and other illicit activities by recognising intricate patterns and connections within transaction networks.

“Integrating behavioural biometrics such as keystroke dynamics, mouse movements, and touchscreen interactions enhances user authentication by continuously verifying identity based on unique behavioural patterns, reducing reliance on traditional authentication methods like passwords and PINs,” Kanvaty said.
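As a rough illustration of the keystroke-dynamics idea, a session's typing rhythm can be compared against an enrolled profile. The timings, tolerance, and mean-interval heuristic below are all invented for illustration; production systems use far richer features and models.

```python
# Toy keystroke-dynamics check: compare mean inter-key interval of a session
# against an enrolled profile. All numbers and the threshold are illustrative.

def intervals(timestamps):
    """Milliseconds between consecutive keystrokes."""
    return [b - a for a, b in zip(timestamps, timestamps[1:])]

def matches_profile(enrolled_ms, session_ms, tolerance_ms=40):
    """True if the session's mean interval is within tolerance of enrolment."""
    mean = lambda xs: sum(xs) / len(xs)
    return abs(mean(intervals(enrolled_ms)) - mean(intervals(session_ms))) <= tolerance_ms

enrolled = [0, 120, 250, 360, 490]      # enrolment keystroke timestamps (ms)
genuine  = [0, 130, 240, 370, 480]      # similar rhythm
impostor = [0, 40, 90, 130, 170]        # much faster typist
print(matches_profile(enrolled, genuine))   # → True
print(matches_profile(enrolled, impostor))  # → False
```

The point of such continuous checks is that they run in the background, so authentication does not hinge on a single password or PIN entry.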

Employing predictive analytics powered by machine learning enables payment operators to analyse customer behaviour and preferences in real-time, allowing for dynamic pricing strategies and personalised offers tailored to individual users.

According to Kanvaty, payment operators like NPCI can establish a collaborative platform where fintechs and other stakeholders can co-create and share AI models. This can be done for use cases such as risk assessment, credit scoring, and customer segmentation, fostering innovation and accelerating the adoption of AI in the payments industry.

Conversational payments

Last year, RBI launched ‘Hello UPI’, introducing voice-based payment capabilities, which went live on the BHIM app.

Hello UPI offers an AI-powered user-friendly interface that doesn’t require literacy or advanced digital skills, making financial services more accessible to underserved communities, including those in rural areas or with limited access to traditional banking infrastructure.

“By enabling payments through simple voice commands, AI-powered solutions lower the barrier to entry for individuals who may be intimidated by or unfamiliar with traditional payment methods. This empowers more people to participate in the formal financial system and access a wide range of financial services,” Kanvaty said.

This initiative enables a conversational interface for completing transactions, available both through the app and via 123Pay, a service originally intended for audio/IVR payment interactions. This integration empowers advanced speech-based AI platforms developed by NPCI to significantly expand payment accessibility beyond conventional methods.

Speech-recognition engineering pipeline

“NPCI’s speech-recognition engineering pipeline is a state-of-the-art high-performance system that enables us to orchestrate several chains of models per language. This is entirely built on the open-source tech stack and highly configurable. We can change the models quickly, and scale out quickly,” Kanvaty said.

Building the pipeline from scratch on open-source principles provides maximum flexibility and allows NPCI to architect the system without vendor lock-in and without assuming the availability of cloud speech APIs.

Moreover, it enables the team to figure out the right chain of models for the job, and to experiment and fine-tune without any constraints or dependencies.
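The "configurable chain of models per language" idea can be made concrete with a small registry-plus-config sketch. The stage names and chains below are hypothetical, and real stages would be speech and language models rather than string functions; NPCI has not published its pipeline.

```python
# Minimal sketch of a configurable per-language processing chain.
# Swapping a model means editing the CHAINS config, not the orchestration code.

REGISTRY = {  # hypothetical stages; real ones would be ASR, intent models, etc.
    "normalise":    lambda text: text.lower().strip(),
    "strip_filler": lambda text: text.replace("umm ", ""),
    "tag_intent":   lambda text: (f"intent(pay)::{text}" if "pay" in text
                                  else f"intent(other)::{text}"),
}

CHAINS = {
    "hi-IN": ["normalise", "tag_intent"],
    "en-IN": ["normalise", "strip_filler", "tag_intent"],
}

def run_chain(language: str, utterance: str) -> str:
    out = utterance
    for stage in CHAINS[language]:   # each language gets its own model chain
        out = REGISTRY[stage](out)
    return out

print(run_chain("en-IN", "  Umm pay 200 to Ravi "))
# → intent(pay)::pay 200 to ravi
```

Because the chain is plain configuration, a stage can be replaced or reordered per language without touching the orchestration, which is the "change models quickly, scale out quickly" property the quote describes.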

“We are equipped to deploy an effective mix of models that not only perform tasks efficiently but also maintain low latency. The effectiveness of our voice system is realised when it is fully embraced and utilised by the payment ecosystem.

“To facilitate this, we are prepared to offer the requisite thought leadership and guidance. We are also actively coordinating with our ecosystem partners to ensure this integration is successful,” he said.

Collaboration with Bhashini

To bring AI-powered conversational capabilities in the 22 official languages of India, NPCI is working closely with Bhashini, a government of India initiative aiming to bridge linguistic disparities in India.

“The voice systems use a variant of the BERT-like system adapted for Indian languages for aspects like intent recognition and entity recognition. BERT is one of the earliest large-scale Transformer Encoder models to have emerged on the scene.

“The Bhashini and AI4Bharat teams have been our key partners in this exercise. They have been instrumental in working closely with us and guiding us to the India-made Indian-language speech models that can be used for this exercise,” Kanvaty said.

Bhashini’s translation models, such as IndicTrans, developed in association with AI4Bharat, an initiative of IIT Madras, are already being used by government and private institutions. Most notably, Prime Minister Narendra Modi’s speech at the Kashi Tamil Sangamam in Varanasi was translated into Tamil in real time using Bhashini’s translation models.

However, NPCI has not disclosed a timeline for the availability of this feature to consumers, as it remains a work in progress.

“This partnership works both ways, as it also validates the usage of these models for a large-scale exercise, and proves that they are ready for prime time. We push the models to the fullest and tweak them so that they are production ready,” Kanvaty explained.

Testing generative AI

Besides automating customer service interactions through chatbots and responding to customer queries in multiple languages, LLMs can be leveraged to identify suspicious patterns or anomalies indicative of fraudulent activities, according to Kanvaty.

“This can help payment systems detect and prevent fraud more effectively. By analysing user transaction history, spending patterns, and demographic information, LLMs can generate personalised recommendations for financial products, services, and offers.

“This can help payment systems increase customer engagement and loyalty,” he added.

Moreover, LLMs can facilitate language translation services within payment systems, allowing users to interact in their preferred language and enabling seamless communication between customers and service providers.

“LLMs can also assist payment systems in interpreting and adhering to regulatory requirements by analysing legal documents, compliance guidelines, and industry standards. This can help ensure that payment systems operate within the boundaries of applicable laws and regulations,” Kanvaty concluded.

The post AI Can Help Replace FASTags at Toll Booths: NPCI CTO appeared first on Analytics India Magazine.

Tata Places India’s First Military-Grade Spy Satellite in Orbit

India’s Tata Advanced Systems Ltd (TASL) has achieved a significant milestone in satellite technology: its TSAT-1A satellite was successfully separated from the launch vehicle and inserted into its intended orbit.

The collaboration between TASL and Satellogic, announced in November 2023, has borne fruit with the launch of a high-resolution Earth observation satellite tailored for Indian defence forces. This joint endeavour signifies TASL’s foray into the satellite domain and Satellogic’s expansion into India’s burgeoning defence and commercial sectors.

Under this partnership, TASL and Satellogic aim to foster local space technology capabilities, beginning with comprehensive training and knowledge transfer. Establishing a satellite Assembly, Integration, and Test (AIT) plant at TASL’s Vemagal facility in Karnataka underscores the commitment to indigenous satellite manufacturing.

The collaboration extends beyond satellite manufacturing to developing a new satellite design, emphasising the integration of multiple payloads to cater to diverse data needs across India. Emiliano Kargieman, CEO and Founder of Satellogic, hailed this partnership as a pivotal step in advancing commercial space capabilities, facilitating greater access to critical information for various applications, including security, sustainability, and energy.

This achievement aligns with India’s broader space ambitions, as evidenced by the impending launch of India’s first spy satellite developed by a domestic private player. Built by TASL, the satellite launched aboard a SpaceX rocket promises discreet information acquisition capabilities for the armed forces.

Furthermore, efforts are underway to establish a ground control centre in Bengaluru, in collaboration with Satellogic, to facilitate guidance and image processing. This development underscores India’s quest for self-reliance in satellite technology, reducing reliance on foreign vendors for crucial data.

While India’s space endeavours have historically leaned on partnerships and collaborations, recent strides indicate a growing momentum towards indigenous capabilities. With the launch of the TASL satellite and ongoing initiatives by organisations like the Indian Space Research Organisation (ISRO), India is poised to assert itself as a key player in the global space arena, catering to both strategic and commercial interests.

The post Tata Places India’s First Military-Grade Spy Satellite in Orbit appeared first on Analytics India Magazine.

Rakuten Certified as Best Firm for Data Scientists for the 2nd Time

Rakuten has once again been certified as the Best Firm for Data Scientists to work for by Analytics India Magazine (AIM) through its workplace recognition programme.

The Best Firm For Data Scientists certification surveys a company’s data scientists and analytics employees to identify and recognise organisations with great company cultures. AIM analyses the survey data to gauge the employees’ approval ratings and uncover actionable insights.

“I extend my deepest gratitude to our exceptional team of Data and AI professionals whose dedication and brilliance have led us to this recognition. With a culture fueled by innovation, usage of cutting-edge technology, collaboration and strong business communication, we’re proud to be the premier destination where AI talent thrives and revolutions begin”, said Anirban Nandi, Head of AI Products & Analytics (Vice President) at Rakuten India.

The analytics industry currently faces a talent crunch, and attracting good employees is one of the most pressing challenges that enterprises are facing.

The certification by Analytics India Magazine is considered a gold standard in identifying the best data science workplaces, and companies participate in the programme to increase brand awareness and attract talent.

Best Firms for Data Scientists is the biggest data science workplace recognition programme in India. To nominate your organisation for the certification, please fill out the form here.

The post Rakuten Certified as Best Firm for Data Scientists for the 2nd Time appeared first on Analytics India Magazine.

CPG 2024: Cutting Costs and Embracing AI Solutions

In 2023, the top 10 Consumer Packaged Goods (CPG) companies saw growth driven primarily by price hikes while product volume stagnated. With inflation easing in 2024, the focus shifts to cost-cutting, especially in supply chain management and inventory optimization.

“When it comes to CPGs, the biggest cost is the inventory cost. Any CPG, at any given point in time, carries at least two to four months of inventory. Any non-performing inventory is a big challenge for them,” said Akshay Panchariya, senior manager, supply chain, Tredence, in an exclusive interview with AIM.

“Reducing waste is key because wasted products mean lost money. It is important for CPG companies to continuously evolve to reduce their wastage, as it impacts the bottom line,” he added.

According to Panchariya, a three-fold solution to these problems involves AI-driven recommendations to minimize finished goods wastage, real-time optimization of raw material use, and enhanced inventory projections for better collaborative decision-making. “CPGs must embrace AI and ML solutions to get ready for the future, make smarter moves, and ensure adaptive strategies,” he said.

Embracing AI-ML for Future-Ready CPG Operations

“With the wealth of data available on supply and demand, AI solutions can simplify decision-making for CPGs by considering variables such as shipping costs and inventory risk, assessing best-case scenarios, and implementing optimal strategies to minimise wastage while ensuring high customer service levels,” said Panchariya.

He added that flexibility is required in inventory management, with decisions varying from weekly to daily based on real-time data availability. “For instance, for weekly decision-making, you can employ open-source libraries like Python’s PuLP to create the optimisation model,” he said, adding that it is primarily a linear programming model.

“If you want something in near real time, you might want to use commercial solvers like CPLEX and Gurobi for their speed and efficiency in decision-making,” he said, explaining that all of these can be connected to external data sources.
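PuLP, CPLEX, and Gurobi are external solver libraries, so as a dependency-free illustration of the trade-off such an optimisation model encodes, the toy below scans candidate order quantities to balance holding cost against shortage penalty. All costs and demand figures are invented; a real LP formulation would solve this algebraically with many more variables and constraints.

```python
# Toy stand-in for an LP-based inventory optimisation: choose the order
# quantity that minimises holding cost plus shortage penalty across a set
# of demand scenarios. All numbers are illustrative.

def total_cost(order_qty, demand, holding_cost=2, shortage_cost=10):
    left_over = max(order_qty - demand, 0)   # units sitting in the warehouse
    shortfall = max(demand - order_qty, 0)   # unmet demand (lost sales)
    return holding_cost * left_over + shortage_cost * shortfall

def best_order(demands, max_qty=200):
    """Brute-force scan; an LP solver like PuLP finds this without scanning."""
    return min(range(max_qty + 1),
               key=lambda q: sum(total_cost(q, d) for d in demands))

weekly_demands = [80, 95, 110, 100]   # hypothetical demand scenarios
print(best_order(weekly_demands))     # → 110
```

Because shortages cost far more per unit than holding stock here, the optimum covers the worst demand scenario; flipping the cost ratio would pull the order quantity down, which is exactly the lever such models expose.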

He further highlighted the importance of incorporating external data sources so that current market conditions can be considered. “External data allows you to assess unforeseen disruptions by looking at news articles and using them as input variables in your optimization model to understand how they will impact you.”

For instance, “With the pirate attacks in the Red Sea area, you need to consider factors like ships being targeted when doing a comprehensive inventory optimisation,” he explained.

Addressing SKU Complexity and Consumer Demand Fluctuations

“There’s a lot of unstructured data scattered across various parts of an organisation, from raw materials and production data to supplier data, inventory data, and data concerning customer preferences,” said Panchariya, adding that all this data needs to be in a structured format.

For instance, consider a scenario where a single SKU (stock-keeping unit) is manufactured weekly across two plants. This results in a substantial amount of data – over 100 batches within a year.

Now, extend this complexity to a scale involving 10,000 SKUs across various networks. The challenge of managing inventory for each SKU, considering production schedules, locations, and diverse manufacturing dates, becomes staggering.
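The scale claim above can be made concrete with quick arithmetic (one batch per plant per week is an assumption for illustration):

```python
# Making the batch-count claim concrete (illustrative assumption:
# one production batch per plant per week).
plants, weeks_per_year = 2, 52
batches_per_sku = plants * weeks_per_year
print(batches_per_sku)            # → 104, i.e. "over 100 batches" a year

skus = 10_000
print(batches_per_sku * skus)     # → 1040000 batch records across the network
```

At over a million batch records a year, each with its own plant, date, and location, spreadsheet-scale tooling stops being viable, which is the gap the structured data models address.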

Panchariya explained that this is where Tredence’s expertise comes into play. “Tredence has pre-built data models for CPGs that feed into the algorithms efficiently, facilitating the correct output for decision-making as per business needs,” he said.

Advantages of AI-ML Inventory Projections vs Traditional Forecasting

Panchariya classifies AI technology into three categories based on evolution—AI assistance, AI-powered, and Autonomous. “It was an AI assistant a few years ago; now it has become AI-powered, and three years from now, it could become Autonomous,” he said.

“Right now, we are at the AI-powered stage. These AI models enable decision-making, but humans are still the ones to take the final call,” he explained.

“Multi-billion-dollar businesses would never rely just on AI models to make decisions. AI models power them to make the decision,” he said, explaining that supply chain decision-making is now enabled by AI models. It’s about enabling better decisions, not replacing human judgement.

Challenges

Panchariya also notes that CPG companies need to ensure that AI models do not hallucinate and remain grounded. “Augmenting models like ChatGPT with organisational data and controlling the input into these GPTs is critical for ensuring data security and avoiding any unethical practices,” he said.

Final Word

In conclusion, Panchariya stresses the urgency of adopting these AI-driven solutions to meet 2024’s cost pressures. “For CPG companies, the goal is clear: reduce inventory waste to meet shareholder expectations of cost reduction. Embracing AI and ML is not just a strategy—it’s the need of the hour.”

The post CPG 2024: Cutting Costs and Embracing AI Solutions appeared first on Analytics India Magazine.

Spotify launches personalized AI playlists that you can build using prompts

Spotify already found success with its popular AI DJ feature, and now the streaming music service is bringing AI to playlist creation. On Monday, the company introduced AI playlists in beta, a new option that allows users to generate a playlist from written prompts.

The feature will initially become available to users on Android and iOS devices in the U.K. and Australia and will be iterated on in the months ahead.

In addition to more standard playlist creation requests, like those based on genre or time frame, Spotify’s use of AI means people could ask for a wider variety of custom playlists, like “songs to serenade my cat” or “beats to battle a zombie apocalypse,” Spotify suggests. Prompts can reference all sorts of things, like places, animals, activities, movie characters, colors or emojis. However, the company notes that the best playlists are generated using prompts that contain a combination of genres, moods, artists and decades.

Spotify also leverages its understanding of users’ tastes to customize the playlists it makes with the feature.

After the playlist is generated, users can then use the AI to revise and refine the end result by issuing commands like “less upbeat” or “more pop,” for example. Users can also swipe left on any songs to remove them from the playlist.

In terms of the technology, Spotify says it’s using large language models (LLMs) to understand the user’s intent. Then, Spotify uses its personalization technology — the information it has about the listener’s history and preferences — to fulfill the prompt and create a personalized AI-generated playlist for the user.
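The first half of that pipeline, pulling genres, moods, and decades out of a free-text prompt, can be sketched with a toy keyword extractor. Spotify's actual system uses an LLM for this, and the tiny vocabularies below are invented for illustration.

```python
# Toy sketch of attribute extraction from a playlist prompt. A real system
# uses an LLM; these hand-made vocabularies are purely illustrative.

VOCAB = {
    "genre":  {"pop", "jazz", "electronica", "witch house"},
    "mood":   {"upbeat", "chill", "focused", "fun"},
    "decade": {"80s", "90s", "2000s"},
}

def extract_attributes(prompt: str) -> dict:
    """Map each attribute slot to the vocabulary terms found in the prompt."""
    words = prompt.lower()
    return {slot: sorted(v for v in vocab if v in words)
            for slot, vocab in VOCAB.items()}

print(extract_attributes("Fun, upbeat 90s pop to get pumped up"))
# → {'genre': ['pop'], 'mood': ['fun', 'upbeat'], 'decade': ['90s']}
```

The extracted slots would then be combined with the listener's taste profile to rank candidate tracks, which is the personalisation step the article describes.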

The company uses a range of third-party tools for its AI and machine learning experiences.

TechCrunch first reported in October 2023 that Spotify was developing AI playlists, when reverse engineers Chris Messina and Alessandro Paluzzi shared screenshots of code from Spotify’s app that referred to AI playlists that were “based on your prompts.”

Spotify at the time declined to comment on the finding, saying it would not offer a statement on possible new features. However, in December 2023, the company confirmed that it was testing AI-driven playlist creation after a TikTok video of the feature surfaced showing what the Spotify user described as “Spotify’s ChatGPT.”

The feature is found in the “Your Library” tab in Spotify’s app by tapping on the plus button (+) at the top right of the screen. A pop-up menu appears showing the AI Playlist as a new option alongside the existing “Playlist” and “Blend” options.

If a listener can’t think of any prompts to try, Spotify offers prompt suggestions to help people get started, like “get focused at work with instrumental electronica,” “fill in the silence with background café music,” “get pumped up with fun, upbeat, and positive songs” or “explore a niche genre like Witch House” and many others.

To save an AI playlist, tap the “Create” button to add it to the library.

The company notes the AI has guardrails around it so it will not respond to offensive prompts or those focused on current events or specific brands.

Spotify has been investing in AI technology to improve its streaming service for many months. With the launch of AI DJ, which expanded globally last year, the company used a combination of Sonantic and OpenAI technology to create an artificial version of the voice of Spotify’s head of cultural partnerships, Xavier “X” Jernigan, who introduces personalized song selections to the user. Last year, Spotify said it was investing in in-house research to better understand the latest in AI and large language models.

CEO Daniel Ek has also teased to investors other ways Spotify could leverage AI, including by summarizing podcasts, creating AI-generated audio ads, and more. The company has also looked into using AI tech that would clone a podcast host’s voice for host-read ads.

Ahead of AI playlists, Spotify launched a similar feature, Niche Mixes, that allowed users to create personalized playlists using prompts, but the product did not leverage AI technology and was more limited in terms of its language understanding.
