Introducing Falcon2: Next-Gen Language Model by TII

Falcon2
Image by Author

The Technology Innovation Institute (TII) in Abu Dhabi released the next series of its Falcon language models on May 14. The new models align with TII's mission as a technology enabler and are available as open-source models on HuggingFace. TII released two variants: Falcon-2-11B and Falcon-2-11B-VLM. The new VLM model promises exceptional multi-modal capabilities that perform on par with other open-source and closed-source models.

Model Features and Performance

The recent Falcon-2 language model has 11 billion parameters and was trained on 5.5 trillion tokens from the falcon-refinedweb dataset. The newer, more efficient models compete well against Meta's recent Llama-3 model with 8 billion parameters. The results are summarized in the table below, shared by TII:

Falcon 2 Results
Image by TII

In addition, the Falcon-2 model fares well against Google's Gemma with 7 billion parameters; Gemma-7B outperforms Falcon-2's average performance by only 0.01. The model is also multi-lingual, trained on commonly used languages including English, French, Spanish, and German, among others.

However, the groundbreaking achievement is the release of the Falcon-2-11B Vision Language Model, which adds image understanding and multi-modality to the same language model. Image-to-text conversation on par with recent models like Llama-3 and Gemma is a significant advancement.

How to Use the Models for Inference

Let’s get to the coding part so we can run the model on our local system and generate responses. First, as with any other project, let us set up a fresh environment to avoid dependency conflicts. Since the model was released recently, we will need the latest versions of all libraries to avoid missing support and pipelines.

Create a new Python virtual environment and activate it using the below commands:

python -m venv venv
source venv/bin/activate

Now that we have a clean environment, we can install the required libraries and dependencies using the Python package manager. For this project, we will use images available on the internet and load them in Python. The requests and Pillow libraries are suitable for this purpose. For loading the model, we will use the transformers library, which has built-in support for loading HuggingFace models and running inference. We will also use bitsandbytes, PyTorch, and accelerate for model loading and quantization.

To simplify the setup process, we can create a simple requirements text file as follows:

# requirements.txt
accelerate      # For distributed loading
bitsandbytes    # For quantization
torch           # Used by HuggingFace
transformers    # To load pipelines and models
Pillow          # Basic loading and image processing
requests        # Downloading image from URL

We can now install all the dependencies in a single line using:

pip install -r requirements.txt

We can now start working on our code to use the model for inference. Let’s start by loading the model on our local system. The model is available on HuggingFace, and its total size exceeds 20GB of memory. We cannot load the model on consumer-grade GPUs, which usually have around 8-16GB of VRAM. Hence, we need to quantize the model, i.e., load it in 4-bit precision instead of the usual 32-bit, to decrease the memory requirements.
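To see why, here is a rough back-of-the-envelope estimate of the weight memory at different precisions (illustrative only; real usage adds activations, the KV cache, and framework overhead):

```python
# Rough weight-memory estimate for an 11B-parameter model.
# Illustrative only: real usage adds activations, KV cache, and overhead.
PARAMS = 11e9

def estimate_gb(bits_per_param: float) -> float:
    """Approximate weight memory in GB at a given precision."""
    return PARAMS * bits_per_param / 8 / 1e9

print(f"32-bit: {estimate_gb(32):.1f} GB")  # ~44 GB, full precision
print(f"16-bit: {estimate_gb(16):.1f} GB")  # ~22 GB, matches the >20GB download
print(f" 4-bit: {estimate_gb(4):.1f} GB")   # ~5.5 GB, fits a consumer GPU
```

At 4-bit precision, the weights alone shrink to roughly 5.5GB, which is why quantization makes the model usable on a single consumer GPU.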

The bitsandbytes library provides an easy interface for quantization of Large Language Models in HuggingFace. We can initialize a quantization configuration that can be passed to the model. HuggingFace internally handles all required operations and sets the correct precision and adjustments for us. The config can be set as follows:

import torch
from transformers import BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    # Original model supports BFloat16
    bnb_4bit_compute_dtype=torch.bfloat16,
)

This allows the model to fit in under 16GB of GPU RAM, making it easy to load without offloading and distribution. We can now load the Falcon-2-11B-VLM. Being multi-modal, it handles images alongside textual prompts. The LLava model and pipelines are designed for this purpose, as they allow CLIP-based image embeddings to be projected into language model inputs. The transformers library has built-in LLava model processors and pipelines, so we can load the model as below:

from transformers import LlavaNextForConditionalGeneration, LlavaNextProcessor

processor = LlavaNextProcessor.from_pretrained(
    "tiiuae/falcon-11B-vlm",
    tokenizer_class="PreTrainedTokenizerFast",
)
model = LlavaNextForConditionalGeneration.from_pretrained(
    "tiiuae/falcon-11B-vlm",
    quantization_config=quantization_config,
    device_map="auto",
)

We pass the model ID from the HuggingFace model card to both the processor and the generative model. We also pass the bitsandbytes quantization config to the generative model, so it is automatically loaded in 4-bit precision.

We can now start using the model to generate responses! To explore the multi-modal nature of Falcon-11B, we need to load an image in Python. For a test sample, let us load a standard image from the web. To load an image from a web URL, we can use the Pillow and requests libraries as below:

import requests
from PIL import Image

url = "https://static.theprint.in/wp-content/uploads/2020/07/football.jpg"
img = Image.open(requests.get(url, stream=True).raw)

The requests library downloads the image from the URL, and the Pillow library reads the image from bytes into a standard image format. Now that we have our test image, we can generate a sample response from our model.

Let’s set up a sample prompt template that the model is sensitive to.

instruction = "Write a long paragraph about this picture."
prompt = f"""User:<image>\n{instruction} Falcon:"""

The prompt template is self-explanatory, and we need to follow it for the best responses from the VLM. We pass the prompt and the image to the LLava image processor, which internally uses CLIP to create a combined embedding of the image and the prompt.

inputs = processor(
    prompt,
    images=img,
    return_tensors="pt",
    padding=True,
).to("cuda:0")

The returned tensor embedding acts as input for the generative model. We pass the embeddings, and the transformer-based Falcon-11B model generates a textual response based on the image and the instruction provided.

We can generate the response using the below code:

output = model.generate(**inputs, max_new_tokens=256)
generated_captions = processor.decode(output[0], skip_special_tokens=True).strip()

There we have it! The generated_captions variable is a string that contains the generated response from the model.
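Note that with decoder-only generation, the decoded string typically echoes the prompt as well as the answer. A minimal post-processing sketch to keep only the answer, assuming the "Falcon:" delimiter from the template above (the helper name is ours, not part of any API):

```python
def extract_answer(decoded: str, delimiter: str = "Falcon:") -> str:
    """Return only the text after the last delimiter, or the whole
    string if the delimiter is not present."""
    _, sep, answer = decoded.rpartition(delimiter)
    return answer.strip() if sep else decoded.strip()

# Example with a mock decoded output:
sample = "User: Write a long paragraph about this picture. Falcon: Players chase the ball on a green pitch."
print(extract_answer(sample))  # → "Players chase the ball on a green pitch."
```

The same helper falls back to returning the full string unchanged when the delimiter is missing, so it is safe to apply to any decoded output.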

Results

We tested various images using the above code, and the responses for some of them are summarized in the image below. We see that the Falcon-2 model has a strong understanding of images and generates legible answers, showing its comprehension of the scenarios depicted. It can read text and also captures the global information in a scene. To summarize, the model has excellent capabilities for visual tasks and can be used for image-based conversations.

Falcon 2 Inference Results
Image by Author| Inference images from the Internet. Sources: Cats Image, Card Image, Football Image

License and Compliance

In addition to being open-source, the models are released under the Apache 2.0 license, making them available for open access. The license allows modification and distribution of the models for both personal and commercial use. This means you can now use Falcon-2 models to supercharge your LLM-based applications and open-source models, providing multi-modal capabilities to your users.

Wrapping Up

Overall, the new Falcon-2 models show promising results. But that is not all! TII is already working on the next iteration to further push performance. They look to integrate the Mixture-of-Experts (MoE) and other machine learning capabilities into their models to improve accuracy and intelligence. If Falcon-2 seems like an improvement, be ready for their next announcement.

Kanwal Mehreen is a machine learning engineer and a technical writer with a profound passion for data science and the intersection of AI with medicine. She co-authored the ebook "Maximizing Productivity with ChatGPT". As a Google Generation Scholar 2022 for APAC, she champions diversity and academic excellence. She's also recognized as a Teradata Diversity in Tech Scholar, Mitacs Globalink Research Scholar, and Harvard WeCode Scholar. Kanwal is an ardent advocate for change, having founded FEMCodes to empower women in STEM fields.

More On This Topic

  • Introducing the Testing Library for Natural Language Processing
  • Introducing Healthcare-Specific Large Language Models from John Snow Labs
  • Introducing TPU v4: Google's Cutting Edge Supercomputer for Large…
  • Machine Learning Model Development and Model Operations: Principles…
  • Segment Anything Model: Foundation Model for Image Segmentation
  • The Ultimate Open-Source Large Language Model Ecosystem

I tested this AI-powered pet tracker — and discovered the joy of daily dog activity reports

Invoxia Minitailz on Dog Collar

ZDNET's key takeaways

  • Invoxia's Minitailz Health and GPS Tracker retails for $99 on Amazon or the company website. An additional subscription cost is required.
  • The tracker helps eliminate the guesswork about your dog's whereabouts by tracking its location, activity, and even some biometrics.
  • The subscription cost to use the tracker is more expensive than the actual hardware, making it a long-term investment.

A year ago, I adopted my first dog, a Yorkshire Terrier named Jimmy. Little did I know that with the cute face and floppy ears also came loads of parenting anxiety, made real by a $2,000 emergency vet bill. As a result, when I saw the Minitailz Health & GPS Tracker at CES 2024, I had to have it.

Also: 6 ways Apple can leapfrog OpenAI, Microsoft, and Google at WWDC 2024

Leveraging AI and other advanced tech, the gadget tracks your dog's location, activity, and biometrics, such as walking, playing, running, sleeping, and eating times and heart and respiratory resting rates. These capabilities earned it the "best innovation in the AI category" recognition at CES.


In the box, you get Minitailz, a USB-C charger, and the ring to attach it to your dog's collar. To place it onto your dog's collar, you just slip it right through the opening. Then, just download the app, create an account, select a subscription of either $129.95 for one year or $229.95 for two years, and you are ready to go.

Also: Everything you need for a smart pet setup

To get the most out of your Minitailz, you need to let it gather as much information about your dog as possible, so it should be kept on your dog all day and night. To ensure that my experience with the gadget was as accurate and optimized as possible, I kept it on Jimmy's collar for over three months, and the information it gathered in that time was impressive.

As advertised, it presents all the health and activity insights in a way that is easy to read, access, and understand, as shown in the photos below. However, even though it's nice to have access to all this data, if I am being honest, I rarely look at it. My favorite feature of the app is the daily reports.

The reports present all the data collected in the past 24 hours in a fun, comprehensive way, adding necessary context where it can be helpful, such as comparing your dog's vitals to those of others in the Minitailz community. This is especially useful because, according to the company, a higher-than-normal resting respiratory rate can indicate impending heart failure, so having these insights into Jimmy's health gives me peace of mind.

As a crazy dog mom, this feature alone is worth the investment because it helps bridge the communication gap with my four-legged friend. It shows me things Jimmy can't express, such as whether he had enough playtime, walks, exercise, and a good night's sleep.

The second best feature is the activity notifications. When I write stories from the office, my partner works from home and cares for Jimmy. Instead of feeling like I am missing out, I get notifications on my phone that Jimmy went on a walk, got in a car, or is currently playing. This feature would be especially useful if you leave your pet at a doggy daycare or with a sitter to ensure everything is going according to schedule.

The only caveat is that sometimes it cannot accurately decipher whether Jimmy has the zoomies or is on an actual walk. This caused my partner to call me panicked once, thinking Jimmy had broken out of the apartment when, in reality, he was running around the living room playing with his toys.

Also: Exclusive interview with Raspberry Pi CEO: New $70 AI kit 'a watershed moment for us'

Another great feature for concerned dog parents is the GPS feature, which tracks the pet's location. Compared to the Apple AirTag, it refreshes its location less frequently. However, the advantage is that it won't start beeping if you are away for too long. Another perk is that, unlike the AirTag, it is rechargeable, so you control how much battery it has instead of it just dying randomly.

The Minitailz's fantastic battery life is also a highlight. It was advertised as lasting two whole weeks, and in my experience, that is accurate. My favorite part is that it notifies you via email and push notification when your pup's tracker battery is running low, making it impossible to miss. I also shared my account login with my partner, which doubles the number of people who have access to all of Jimmy's insights and needs.

ZDNET's buying advice

If you're a data aficionado and, as a result, a fan of wearables, this gadget is for you and your pet. With the Minitailz, you can access similar insights you get on your smartwatch or fitness tracker for your furry friend. Ultimately, this data helped me learn more about my pet's needs through the comprehensive health data. However, if you don't see the value of having a ton of metrics that aren't necessarily actionable because of the steep subscription cost, I would skip it.

Could AI have Prevented the Exit Poll Mess in India?

As the vote counting for the grand 2024 Indian Lok Sabha elections began on June 4, many people resorted to social media platforms like X to express their dissatisfaction with the exit poll results, calling out their inaccuracies and calling for more reliable methods.

Various pollsters predicted the incumbent National Democratic Alliance (NDA) would secure 350-400 seats. However, the alliance only managed to secure 293 seats, with the BJP winning 240 seats.

With exit polls having strayed way off the mark even in the past, could AI emerge as a potential game-changer here?


In Comes AI

“Instead of directly questioning individuals—which can introduce social desirability bias—we extrapolate their opinions from their online interaction. This method minimises bias and eliminates the need for lengthy and tedious interviews,” said Matteo Serafino, chief data scientist at KCore Analytics, in an exclusive interaction with AIM.

The data research company predicted voter preferences by using AI to analyse data collected from people’s online activities on social media: what they were reading, writing, and reacting to.

This data, collected in real-time, was then analysed using AI algorithms that take into account various factors that could affect elections, such as inflation, thereby providing more accurate predictions.

Streamlining Data

“We compile a basket of users with identified preferences, akin to a sample in traditional polling. This data is then integrated with the macroeconomic and historical data through a reweighting process, leading to our final insights. Crucially, this is all done while preserving user privacy,” said Serafino.

KCore converts unstructured input, including text, audio, and images, into structured data for analysis using techniques from network theory, natural language processing (NLP), and computer vision.

It employed Graph Neural Networks (GNN) for predictions and Bidirectional Encoder Representations from Transformers (BERT) for sentiment analysis.

It’s Not New Though

Numerous AI startups have already developed models to forecast elections.

Expert.ai, a software startup specialising in natural language processing, employed AI to examine social media remarks regarding Donald Trump and Joe Biden in the months leading up to the 2020 US elections.

The company’s AI interprets the emotions conveyed in social media posts and predicts how these will translate into votes. Using NLP, it classifies the attitude expressed in posts using over 80 distinct emotional categories.

Another AI company, Unanimous.ai, used its programme to survey people in the United States in September 2020. It united vast groups of individuals via the internet, forming a “swarm intelligence” that magnified the members’ collective knowledge and ideas.

Unanimous.ai correctly predicted the presidential election victor in 11 states.

Outdated Traditional Methods

In a typical exit poll, voters are interviewed as they leave the building after voting. Surveyors are trained and stationed at polling booths, and data is traditionally collected using pen and paper (now digitally).

However, the accuracy of the results can vary depending on many factors. These include sample size, demographic representation, structured questionnaires, random telephone or in-person interviews for fairness, and data compilation in a timely manner.

Thus, results can be distorted.

Yeshwant Deshmukh, the founder of C-Voter, one of India’s major polling organisations, identified sample sizes and limited resources as problems. He claims that polling in India is as complex as polling in a diverse region like the European Union, but “pollsters don’t have that kind of budget”.

With such challenges, AI-driven exit polls are the key to having close-to-accurate results. “In the future, traditional pollsters will integrate AI algorithms with their existing data. Given the continuous decline in response rates for traditional surveys, a gradual shift is anticipated, although the current industry mindset may resist such change,” said Serafino.

The post Could AI have Prevented the Exit Poll Mess in India? appeared first on AIM.

4 major iPadOS 18 features announced at WWDC 2024 (and which iPads will get it)

iPad Air (2022, 6th generation)

iOS tends to get the most attention during Apple's Worldwide Developers Conference (WWDC) each year, but iPadOS doesn't trail far behind. And with new iPad models released just last month, it's safe to assume the tablet's new operating system is eagerly awaited.

Also: Everything Apple announced at WWDC 2024, including iOS 18, Siri, AI, and more

This year, Apple upgraded each of its operating systems significantly, introducing a host of artificial intelligence (AI) features that the company is calling "Apple Intelligence." AI aside, we have "Assassin's Creed Shadows" coming to the iPad, a new Calculator app, Smart Script, and more features in iPadOS 18.

The best iPadOS 18 features announced at WWDC

LlamaGen Beats Diffusion Models for Scalable Image Generation

The University of Hong Kong and ByteDance have unveiled LlamaGen, a new family of autoregressive models that outperform popular diffusion models like LDM and DiT for high-resolution image generation.

The key breakthrough is that LlamaGen applies the same “next-token prediction” paradigm used in large language models to the visual domain without relying on inductive biases tailored for vision.

The LlamaGen models range from 111 million to 3.1 billion parameters and achieve an impressive 2.18 FID score on the challenging ImageNet 256×256 benchmark, surpassing state-of-the-art diffusion models. For class-conditional image generation, LlamaGen-3B realises 2.32 FID with classifier-free guidance at 1.75 scale.

Read the full paper here.

Training Method

Notably, the researchers developed an image tokeniser with a downsampling ratio of 16 that achieves 0.94 reconstruction FID and 97% codebook usage on ImageNet. This discrete representation matches the quality of continuous VAE representations used in diffusion models.
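As a quick sanity check on what that ratio implies (our own arithmetic, not the paper's code): a downsampling ratio of 16 maps each 16×16 pixel patch to one discrete token, so a 256×256 image becomes a 16×16 grid of tokens:

```python
# Token count implied by a patch-based image tokeniser (illustrative).
def tokens_per_image(resolution: int, ratio: int = 16) -> int:
    """Number of discrete tokens for a square image at the given
    resolution and downsampling ratio."""
    side = resolution // ratio  # tokens along one side of the grid
    return side * side

print(tokens_per_image(256))  # 16 x 16 grid -> 256 tokens
print(tokens_per_image(384))  # 24 x 24 grid -> 576 tokens
```

This sequence length is what the autoregressive model predicts token by token, just as a language model predicts words.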

For text-conditional generation, a 775M parameter LlamaGen model was first trained on 50M image-text pairs from LAION-COCO, then fine-tuned on 10M high-quality images. It demonstrates competitive visual quality and text alignment on challenging prompts from datasets like PartiPrompts.

A key advantage of LlamaGen is its ability to leverage optimisation techniques developed for large language models. The researchers showed a 326-414% speedup using the vLLM serving framework compared to baseline settings.

While still behind the latest diffusion models on some metrics, the researchers believe LlamaGen paves the way for unified autoregressive models spanning language and vision. With more training data and computing, they aim to scale LlamaGen above 7B parameters for further gains.

Up Next

OpenAI’s Sora was released earlier this year, and with Google recently releasing Veo, text-to-video AI models are now gaining prominence.

As these advances show that image generation can become faster and more accurate, they can also be applied to open-source video generation models, helping bring them on par with models like Sora and Veo.

The post LlamaGen Beats Diffusion Models for Scalable Image Generation appeared first on AIM.

5 Free Competitions for Aspiring Data Scientists

5 Free Competitions for Aspiring Data Scientists
Image by Editor | Midjourney & Canva

Data science is like art: there are many ways to solve a problem. That’s why data science competitions exist, to surface the best approaches to data science problems.

I have seen some aspiring data scientists launch their careers through data science competitions. These competitions show that participants can solve problems and do so creatively. Moreover, competitions also allow you to network and learn from your peers.

Data science competitions are a fun way to increase our skills while giving us an edge over other aspirants. In this article, I will describe five free data science competitions you can join now.

Are you curious? Let’s get into it.

Kaggle Competitions

Kaggle is an online platform and community designed for data scientists. It offers many features, including public dataset sharing for analysis and data projects, free tutorial learning, and a competition platform.

The competition platform is one of the most popular places for data science competitions, as many real-world companies host competitions there. Moreover, there are many competitions for aspirants to join, no matter their experience level.

Some competitions are limited in time, but many are always open to join. All the competitions are free, and many even offer money as prizes. However, the competition can be fierce, as many talented professionals participate. Nevertheless, it’s a good place to start your data science competition experience.

DataHack by Analytics Vidhya

The next free competition you can join is DataHack. It’s a data science competition platform hosted by Analytics Vidhya, an online platform and community for data science. It offers many articles, tutorials, job platforms, and competitions.

DataHack is a data science competition platform that allows participants to solve real-world problems and compete for prizes. You don’t need to have experience in data science to join the competition, which is free. Moreover, many competitions are open to the public without prizes, as they’re designed for learning.

Overall, the platform is excellent for those who want to experience how it feels to compete with others in the world while still being able to interact with the community. By seeing how others approach the competition, you could learn a lot.

AI Hackathons by MachineHack

MachineHack is an online platform for data science and machine learning enthusiasts. It mainly offers competitions and hackathons to improve users' skills and gain experience. The leaderboard is public, making it a great platform to make a name for yourself via competition.

AI Hackathons is where MachineHack offers its competitions. You can join various competitions without paying anything while competing for the top spot. Some offer prize money, while many can be used to practice your skills.

The competition attracts many talented individuals, so you can try competing with them to improve your data science abilities. At the same time, you can build a portfolio of projects, and connect with other professionals in the field.

AI Crowd

AI Crowd is intended as a research platform: it was created to advance research by offering data science competitions. The platform's principles are open science and reproducible research, which can lead to creative solutions for real-world problems.

Like the previous platforms, this one offers many competitions with prize money. However, there is less variety in the competitions, as many of them are intended for research purposes. Nevertheless, the competitions hosted on this platform are mostly advanced enough to serve as learning experiences for competitors.

DrivenData

DrivenData is similar to AI Crowd in that it is a data science competition platform based on real-world problems. It offers users the chance to compete to solve problems that have a real impact.

Examples of competitions are predicting disease spread or managing water supply, which makes them great for learning and making a real change. The platform is a great way to improve your data science skills and build real-world experience. You can even win some money along the way.

Conclusion

Competitions are a great way to improve your data science skills while networking with your peers. If you are excellent at the competition, you could win some prizes along the way. In this article, we have discussed 5 free competitions for aspiring data scientists:

  1. Kaggle Competitions
  2. DataHack by Analytics Vidhya
  3. AI Hackathons by MachineHack
  4. AI Crowd
  5. DrivenData

I hope it helps.

Cornellius Yudha Wijaya is a data science assistant manager and data writer. While working full-time at Allianz Indonesia, he loves to share Python and data tips via social media and writing media. Cornellius writes on a variety of AI and machine learning topics.

More On This Topic

  • Harvard's Top Free Courses for Aspiring Data Scientists
  • Learn Machine Learning 4X Faster by Participating in Competitions
  • Are Kaggle Competitions Useful for Real World Problems?
  • High-Fidelity Synthetic Data for Data Engineers and Data Scientists Alike
  • What Is The Real Difference Between Data Engineers and Data Scientists?
  • We Don't Need Data Scientists, We Need Data Engineers

Resilience AI Tackles India’s Heatwave with AI-Powered Heat Risk Solutions

As India grapples with the escalating threat of extreme weather events fueled by climate change, Resilience AI, a Tech4Impact startup, is stepping up to combat the growing heatwave crisis through cutting-edge, AI-powered heat risk solutions.

Haripriya Kesavan, a climate researcher and youth advisor, represented the company and its climate partner STS Global at the prestigious National Institute of Disaster Management’s Young Leaders Conclave 2024 and the annual conference on economics and public policy, ACEP 2024, to demonstrate their innovative approach.

With 12 cities, including Bengaluru and Guwahati, experiencing more than 100 days of higher-than-normal temperatures since 2023, and 15 out of India’s 27 cities facing at least five days of excessive heat in a single year, the cost of inaction is enormous. Resilience AI is focusing on the often-overlooked danger of indoor heat, which can be even more hazardous than outdoor temperatures.

In Delhi’s Vivekananda Camp, a densely populated informal settlement, Resilience AI deployed its AI-based ResSolv heat risk model, utilizing a multi-pronged strategy that includes building footprint detection, multi-hazard risk scoring, and heat hotspot mapping. This approach allows for targeted interventions, empowering stakeholders with accurate information to help households most in need.

Using the AI-generated risk maps, Resilience AI identifies high-risk homes and prioritizes outreach for early warning and preparation. They engage with the local community to demonstrate the significance of heat risks, as many residents often underestimate the danger they face. This targeted community outreach is crucial in raising awareness and promoting proactive measures.

In addition to risk assessment and outreach, Resilience AI works with the community to implement long-term heat mitigation solutions, such as solar thermal roofs from scrap material, street lighting with locally produced materials, and revitalized Safe Drinking Water stations. By involving the community in the implementation process, Resilience AI ensures that the solutions are sustainable and well-received.

Women leaders in the Vivekananda Camp community have used AI climate risk tools and an automatic weather station to develop a tech-savvy Early Warning System that issues alerts at the household level, enabling residents to take precautionary action. This empowers the community to be proactive in the face of heat risks.

To build a more resilient future, Resilience AI advocates for policy interventions, such as building codes and permits that consider technology-based heat risk assessments, and mandatory physical risk assessments for financing climate change construction. The company also promotes the use of AI Climate-Powered Risk Assessment, an innovative tool that offers developers insight into heat risk in greenfield and brownfield projects.

“For the unique challenges we face in today’s times, our solutions must be equally unique,” said Samhita R, Co-Founder at Resilience AI. “By harnessing AI’s power and building resilience for our communities, we can effectively adapt to a rise in heat risk and reduce the costs of inaction.”

The post Resilience AI Tackles India’s Heatwave with AI-Powered Heat Risk Solutions appeared first on AIM.

Apple partners with OpenAI to bring ChatGPT to iOS, iPadOS, and MacOS

iPhone AI at WWDC 2024 at Apple Park

After ostensibly languishing in the artificial intelligence (AI) race for the last two years, Apple has found a way to upgrade its AI profile — in part by teaming up with OpenAI.

At Apple's Worldwide Developers Conference (WWDC) on Monday, the company announced new ChatGPT integrations across iOS 18, iPadOS, and MacOS Sequoia. Powered by GPT-4o, the features are framed as an offshoot of Apple Intelligence, the cleverly named umbrella that refers primarily to the company's in-house "personal intelligence" AI models.

Also: Everything Apple announced at WWDC 2024, including iOS 18, Siri, AI, and more

Users will now be able to access ChatGPT's image- and document-interpreting capabilities directly from their iPhone, iPad, or Mac without having to toggle between Apple and OpenAI tools.

The partnership links Siri with ChatGPT to extend on-device support. With a user's permission, Siri will be able to send ChatGPT a request for help. For example, if a user asks Siri for help with a task it believes ChatGPT could do better, the assistant will offer to forward the request to ChatGPT instead. Siri can also forward documents and photos to ChatGPT, and will then return ChatGPT's output directly to the user.


OpenAI's chatbot will also be available in Writing Tools, Apple's system-wide content generation assistant. Through Compose, users can even leverage ChatGPT's image generation feature — for example, to create a custom bedtime story complete with bespoke images.

The ChatGPT integrations will be free to access and will not require users to have an account. In keeping with Apple's commitment to user privacy, the company will obscure users' IP addresses and won't let OpenAI store user request data. Siri will also always ask for permission before connecting to ChatGPT.

However, ChatGPT's data-use policies will apply to ChatGPT Plus users if they connect their subscriptions for access to more advanced features.

In addition to splashy generative AI features like image generation, Apple is also focusing its efforts on features that improve users' daily lives. These features make the most of ChatGPT's GPT-4o capabilities for on-device convenience.

Last month, Microsoft programmed AI directly into its consumer products with the release of the Copilot+ PCs. With this release, Apple is following suit by bringing everyday AI to users' fingertips via systems they're accustomed to using already. By launching both Apple Intelligence and ChatGPT on-device, Apple is further normalizing generative AI in the mainstream — at least for iPhone users.


Apple's partnership with OpenAI has been rumored since at least last month. The news comes on the heels of reports this spring that the company was also in talks with Google about a potential Gemini agreement.

ChatGPT will be available in iOS 18, iPadOS 18, and MacOS Sequoia later this year.


GitHub is Madly in Love with India’s Burgeoning Developer Ecosystem

GitHub believes India will overtake the US as the largest developer community on the platform by 2027. To foster this ecosystem and assist developers across India and beyond, it has partnered with Indian IT firm Infosys and opened the first GitHub Center of Excellence at Infosys, Bengaluru.

This partnership represents a generational opportunity for Global Systems Integrators (GSIs) to spearhead advancements in the AI and software sectors.

“A new day has begun for the world’s GSIs. The Age of Copilot is here,” said GitHub chief Thomas Dohmke, who is in Bengaluru to attend GitHub Constellation 2024 scheduled for June 12. He added that by equipping their developers with GitHub Copilot and extending its capabilities to customers, GSIs can dramatically accelerate software production worldwide.

Open Healthcare Network in India is a profoundly inspiring story of how we can accelerate human progress by enabling the world's soon-to-be largest developer community with the possibilities of AI. India's developers, building with their copilot companion, will help save lives –… pic.twitter.com/pF6swqNlxP

— Thomas Dohmke (@ashtom) June 11, 2024

GitHub Constellation celebrates the best of the Indian developer community and provides a platform to connect on topics such as AI, collaboration, community, and security.

“GitHub is a very integral partner in what we are doing at Infosys. It brings tremendous value, letting developers focus on code, creating new features and functionalities, and innovating at the speed of thought,” said Naresh Choudhary, vice president and head of reuse and tools at Infosys.

“The GitHub advanced security features that we have been using, whether it is code scanning, secret scanning, or Dependabot, have all played a tremendous role in how we make code and deliver it to our customers as secure by design, built-in from Day 1 and not as an afterthought,” he explained.

“We see generative AI play a critical role in all parts of the software development lifecycle; GitHub Copilot plays a crucial role in that. We have been on this Copilot journey for some time. We were early adopters, with 7,000 employees leveraging GitHub Copilot in the work that we do,” said DR Balakrishnan, EVP, service offering head, ECS, AI, and automation, Infosys.

GitHub x India

GitHub Copilot now allows users to code in Hindi as well. At Microsoft Build 2024, CEO Satya Nadella announced that developers can now program in their native languages, including Hindi.

“Think about it — every person can now start programming, whether it’s in Hindi or Brazilian Portuguese, and bring back the joy of coding in their native language,” said Nadella, emphasising that this would be available in the Copilot Workspace.

On his recent visit to the Microsoft India office, Dohmke said, “Together, GitHub and Microsoft will generate a groundswell of developers in India, building and deploying in natural language. India will rise in the age of AI—and we’re here to enable it.”

India currently has 13.2 million developers using GitHub, compared to approximately 20 million in the US. India also ranks second globally in the number of GenAI projects hosted on GitHub, following the US. The country hosts just under 30 million repositories.

In India, Axis Bank, HCLTech, and LTIMindtree are some of GitHub Copilot’s other customers, apart from Infosys. Axis Bank was the first in the country to adopt Copilot for Microsoft 365 at enterprise scale with 300 users and has seen over 30% productivity gains in daily work.

Indian IT giant HCLTech developed a Copilot for Microsoft 365 plugin for Microsoft Teams to help software developers and managers streamline bug resolution. Meanwhile, LTIMindtree, an IT and consulting services company, created a Copilot for Microsoft 365 plugin for Teams to optimise staff management.

Dohmke recently posted a picture on X from his first visit to Bengaluru in 2008. “I love this country,” he wrote, adding that India is at the nexus of a monumental economic opportunity. “It is set to become the world’s largest developer community at the exact point in time when the age of AI (artificial intelligence) is taking off,” he said.

Although this is my first visit as GitHub CEO in India, this is not my first time here. I love this country. This is me in 2008, in Bengaluru — I haven’t aged at all 😝
India is at the nexus of monumental economic opportunity, as it is set to become the world’s largest developer… pic.twitter.com/Afhe6taG9S

— Thomas Dohmke (@ashtom) June 5, 2024

He further said that the next great AI startup is as likely to come from Mumbai or Bengaluru as from San Francisco or Seattle. Ironically, last year GitHub laid off 85% of its employees in India: of 216 employees, 183 were asked to leave, including the entire engineering team.

Meanwhile, Indian developer Mufeed VH recently built something similar to GitHub Copilot—an open-source AI software engineer named Devika. It can understand human instructions, break them down into tasks, conduct research, and autonomously write code to achieve set objectives.

Nevertheless, GitHub remains bullish on empowering developers in India. As Dohmke sums it up, “The odds are ever in India’s favour to rise and win the age of AI.”


Meet the Creators of India’s AWAAZ


Text-to-speech (TTS) models are comparatively easier to build in English than in other languages. To fill this gap, IIT Guwahati alumni Sudarshan Kamath and Akshat Mandloi started smallest.ai and created a Hindi TTS model, which they call AWAAZ.

With state-of-the-art Mean Opinion Scores (MOS) in Hindi and Indian English, AWAAZ can fluently converse in over ten accents, reflecting the diverse linguistic landscape of India.

The inception of AWAAZ was driven by the founders’ recognition of a gap in the market for high-quality, affordable TTS models for Indian languages. “When we started building, we realised that the models required for a voice bot were not mature for Indian languages. Existing models for non-English languages were nowhere close to production,” explained Kamath in an exclusive interaction with AIM.

Citing OpenAI’s GPT-4o, which is a generalised model, Kamath said that the company aims to build specialised models that can be tailored for customer support, even for small businesses. It is also cheaper than other Indian language TTS offerings, such as Veed.io and Murf.ai.

Janta ki AWAAZ

AWAAZ stands out for its single-shot voice cloning capability, which can replicate a voice from a mere five-second audio clip. The model also boasts a low streaming latency of just 200 milliseconds.

To make this technology accessible, smallest.ai has set an introductory price of INR 999 for 500,000 characters, positioning AWAAZ as a cost-effective solution, claiming to be ten times cheaper than its competitors, such as ElevenLabs.
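A quick back-of-the-envelope check on that pricing (the competitor figure below is implied by the "ten times cheaper" claim, not a published rate):

```python
# AWAAZ introductory pricing: INR 999 for 500,000 characters.
awaaz_per_char = 999 / 500_000             # ≈ INR 0.002 per character

# If AWAAZ is roughly ten times cheaper, a competitor's implied rate is:
competitor_per_char = awaaz_per_char * 10  # ≈ INR 0.02 per character

print(f"AWAAZ: INR {awaaz_per_char:.4f}/char")
print(f"Implied competitor rate: INR {competitor_per_char:.4f}/char")
```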

Kamath said the language model is about 750 million parameters in size and was built by leveraging existing open-source models.

Kamath attributes the affordability of AWAAZ to their focus on data quality and model efficiency. “Our model is much smaller than those of competitors like ElevenLabs. Despite this, we achieve high-quality speech because our data is highly refined,” he explained.

smallest.ai uses AWS for cloud services, although they remain flexible about potential future partnerships.

The Dataset of AWAAZ was the Critical Part

Kamath and Mandloi launched smallest.ai in October 2023. The initial goal was to create a voice bot for India capable of qualifying leads and handling customer support. This led to the development of SAPIEN, a voice bot for sales, marketing, and customer support.

However, the lack of robust TTS models for Indian languages led them to focus on core model development, resulting in the creation of AWAAZ. “The data quality for TTS models reduced drastically when we moved away from English to other languages. It is worse for South Asian languages,” said Kamath.

The Indic data problem has been highlighted several times by researchers when speaking with AIM, be it for text or voice models.

“We spent a lot of time perfecting the dataset, using thousands of hours of audio from various people from different states in India. We focused on data quality to ensure a diverse representation, making our model suitable for production-level deployment,” Kamath said.

The team invested significant resources into this endeavour, with over six months dedicated purely to developing and iterating on data quality.

AWAAZ is currently limited to Hindi and Indian English, but Kamath emphasises the importance of understanding the quality of the output. “The most difficult part is the data. If you tried our model in Tamil, it might respond a little, but we don’t advertise that capability because it’s not up to our standards yet,” he said.

Way Forward

The company’s ambitious roadmap includes expanding the model’s capabilities. “Our next step is moving closer to GPT-4o-like abilities for Indian languages, where the model can generate answers with a voice, enhancing the interactive experience,” Kamath revealed.

Additionally, smallest.ai is exploring the development of voice-to-voice models, aiming to offer custom solutions for specific business needs such as lead qualification and customer support.

The founders are committed to advancing AI’s understanding of multimodal data.

“We’ve been fascinated by AI’s potential to understand more than just text. Speech is one of those areas where AI can truly start to seem human, much like in the movie ‘Her’,” Kamath said, reflecting on the broader vision that drives their work.
