Niantic Inc. is using Meta Llama models to generate creative behaviour in real time as players interact with creatures in the company’s latest AR game, Peridot. The game uses Meta’s Llama 2 to enhance interactions with the lifelike virtual pets called Dots that exhibit smart, unpredictable behaviours akin to real animals.
By leveraging Llama 2, Dots can now react dynamically to their surroundings in real time, fostering deeper connections and a heightened sense of companionship for players. Players can have “conversations” with their Dots, and the creatures will respond with surprising and organic behaviours, such as expressing joy, curiosity, or mischief, bringing a sense of realism to the virtual pet experience.
“We are eager to see more models open-sourced to enable teams like ours to freely explore their capabilities without being caught in discussions around cost, privacy, and cloud dependencies,” said Niantic Global Product Marketing Lead Asim Ahmed.
Implementing Llama 2 presented challenges in creating prompts that strike the right balance between expressiveness, creativity, and response time. The team addressed this by defining an expected response format in JSON, instantly improving the quality of the AI’s responses.
Niantic plans to continue pushing the boundaries of generative AI in Peridot and explore new ways to elevate player interactions across devices.
“Peridot’s success with generative AI gives us a glimpse into what’s possible, and we plan to elevate the way players interact with Peridot across devices,” said Ahmed.
As they have done before with pioneering AR games like Pokémon GO, they are now innovating with virtual pets in their latest mobile AR adventure. Looking ahead, the company envisions a wide range of opportunities to leverage AI, such as Llama 2, to drive new areas of gameplay more procedurally. This opens up opportunities for the gaming industry to similarly leverage Meta Llama.
The post Niantic Uses Meta Llama Models To Generate Creative Behaviours in Latest Mobile Game Peridot appeared first on AIM.
Ola CEO Bhavish Aggarwal fulfilled his promise by moving all AI infrastructure workloads from Microsoft Azure to Krutrim AI Cloud.
“Will help others also exit and move to our own Indian stack. More than 2500 devs have signed up!! Will be working with everyone to get onto our cloud services over coming weeks,” he added.
Krutrim, which became India’s first AI unicorn, launched Krutrim AI Cloud earlier this month, offering AI computing infrastructure and access to both its foundational models and open-source models like Meta’s Llama 3 and Mistral, enabling developers to build and run LLMs cost-effectively. This move positioned Krutrim to compete with major cloud providers like Microsoft Azure, Google Cloud, and AWS.
However, many users pointed out that Ola’s workloads is primarily on AWS instead of Azure.
What a joke! All this big talk about moving away from @Microsoft @Azure within a week for this most bullshit-iest of reasons when Ola's workload is primarily on @awscloud . This thread will show document all the critical resources of @Olacabs that are hosted with @awscloud… https://t.co/GjAB7oXBzC pic.twitter.com/vlIUoDNonO
— mas.to / (@kingslyj) May 11, 2024
“Does he think people who deploy and run companies’ cloud or work in AI are as stupid as consumers who buy two wheelers?” wrote Kissan AI founder Pratik Desai.
Another user pointed out on X, “Why does it sound like a pre-planned marketing campaign for launching Krutrim cloud? You can’t have these things done overnight for sure. Good way to grab attention.”
But what made Aggarwal take this drastic move?
Recently, Microsoft’s LinkedIn took down his post, in which he called the networking platform’s usage of non-binary gender pronouns like they/them “pronoun illness” and hoped it would not reach India.
According to him, “pronoun illness” is being taught by “big city schools” and is increasingly appearing in CVs—which he clearly is not a fan of. He believes India “needs to know where to draw the line in following the West blindly!”
That is why he decided to move his workloads from the big tech’s cloud to its inhouse cloud. This is why he decided that India need to create its own tech ecosystem so that we do not get “governed by western Big Tech monopolies.”
The post Ola Has Nothing to Do with Microsoft Azure Anymore appeared first on AIM.
On Wednesday, Microsoft and G42 unveiled a major digital investment initiative in collaboration with Kenya’s Ministry of Information, Communications, and the Digital Economy. G42, in partnership with Microsoft and other stakeholders, will spearhead an initial investment of $1 billion to support various projects within this initiative.
G42, in collaboration with local partners, will design and build a cutting-edge data center campus in Olkaria, Kenya. This facility will be powered entirely by renewable geothermal energy and feature advanced water conservation technology. The data center will host Microsoft Azure services through a new East Africa Cloud Region, set to become operational within 24 months of the agreement signing.
East Africa Digital Expansion
The initiative also includes four additional pillars to be developed with local partners that includes development and research of AI models in local languages, establishment of an East Africa Innovation Lab and extensive AI digital skills training, investments in international and local connectivity, and collaboration with the Kenyan government to ensure safe and secure cloud services across East Africa.
This new cloud region will offer customers scalable, secure, high-speed cloud and AI services, accelerating cloud adoption and the digital transformation of businesses, customers, and partners across Kenya and East Africa.
Brad Smith, vice chair and president of Microsoft, said, “This represents the single largest and broadest digital investment in Kenya’s history and reflects our confidence in the country, the government, its people and the future of East Africa.”
G42 has also started training an open-source large language AI model in Swahili and English using its data infrastructure in the United States. To further advance research in Kenya, Microsoft and G42 will strengthen their collaboration with local universities
Today, in partnership with @G42ai, we are announcing a $1B investment in Kenya which represents the single largest and broadest private digital investment in the country’s history. It reflects our confidence in the country, the government, its people, and the future of East… pic.twitter.com/3e9dtwlMH4
— Brad Smith (@BradSmi) May 22, 2024
Microsoft’s Aggressive Global Expansion
In April, Microsoft invested $1.5 billion in UAE-based technology holding company G42, where the focus was to expand AI technologies and skilling initiatives not only in the UAE, but across the globe.
Microsoft also committed to investing in South East Asian countries in the last couple of months. The company announced $1.7 billion to advance Indonesia’s Cloud and AI infrastructure. The first data center in Thailand was opened up last month by Microsoft and committed to providing upskilling opportunities for over 100,000 people.
The post Microsoft enhances G42 partnership with $1 billion digital investments in Kenya appeared first on AIM.
Infosys co-founder Narayana Murthy recently said that Indians are good at applying ideas generated elsewhere for the betterment of the nation. He also added that it would take time for the country to invent new things.
“There are going to be APIs and people are going to use them. That’s the way things will get built. It’s not bad to be a wrapper, it’s just that you shouldn’t be a shallow wrapper. You have to think about the value you’re adding on top of the model,” said Google CEO Sundar Pichai.
India might have arrived late to the AI party, but the future of AI in India is not that bleak. With an increase in investments, more initiatives like AI4Bharat, and industry and academia partnerships to bolster research in the country, India can definitely up its AI game!
Here are 12 Indian startups that are leading the GenAI wave in India.
KOGO AI
Bengaluru-based deep tech startup KOGO AI has developed a platform that helps companies build AI agents that can converse in Indic languages. Using the platform, companies can build an AI agent from scratch within minutes.
Initially, these agents will be able to support conversations in Urdu, Hindi, and English, with plans to include another 73 languages, both Indian and global, soon.
For this, the Bengaluru-based startup has partnered with Bhashini, the Indian government’s initiative aimed at breaking language barriers in India, and Microsoft to make the agents multilingual.
Sarvam AI
Established in July 2023, Sarvam AI was co-founded by Vivek Raghavan and Pratyush Kumar to make generative AI accessible to everyone in India at scale.
“We think this is a foundational technology, and we don’t want India to become solely a prompt engineering nation,” said Raghavan in an exclusive interview with AIM.
The company has raised $41 million in its Series A funding round led by Lightspeed Ventures with participation from Peak XV Partners and Khosla Ventures.
Last year, Sarvam AI also open sourced OpenHathi, an Indic Hindi LLM built on top of Llama 2. On Hugging Face, the model has been downloaded more than 18,000 times last month.
It recently also open-sourced ‘Samvaad’, a curated dataset with 100,000 high-quality conversations in English, Hindi, and Hinglish, totalling over 700,000 turns.
Further, Sarvam AI is collaborating with Meta to develop vernacular LLMs and has partnered with Microsoft to create an Indic voice based LLM.
PAiGPT
PAiGPT, India’s first AI chatbot for UPSC aspirants, recently released its app for Android and iOS.
The app’s USP is its ability to fetch real-time information on various topics and current affairs, similar to Perplexity AI and Google Gemini. However, what sets it apart is its feature that provides trending topics and the option to create multiple-choice questions based on the available information.
PAIGPT, India’s first AI answering engine for students and researchers which is not a wrapper! The only one which has been able to successfully include seamless English to Hindi translation of images.https://t.co/CwJH6tqMd3 App- https://t.co/MKxM2BLn9K
— Deepanshu Singh (@deepanshuS27) May 15, 2024
Founded in September 2022 by Eshank Agarwal, Addya Rai, Siddharth Singh, and Deepanshu Singh, the app also allows aspirants to upload images of editorials from popular newspapers and then generate summaries.
Soket AI Labs
India now has a company building solutions to achieve AGI and beyond. Soket Labs is an AI research firm with a vision to further the advancement in AI towards ethical AGI.
Founded in 2019 by Abhishek Upperwal, the company is part of NVIDIA’s Inception Programme and AWS Activate for training compute access.
Soket AI Labs recently introduced Pragna-1B, India’s first open-source multilingual model designed to cater to the linguistic diversity of the country. Available in Hindi, Gujarati, Bangla, and English, the model comes with 1.25 billion parameters and a context length of 2048 tokens.
KissanAI
In a major step forward for AI in agriculture, agri-tech startup KissanAI recently launched Dhenu Vision LLMs for crop disease detection.
Last year, KissanAI also released Dhenu 1.0, an agricultural LLM tailored for Indian farmers. Recently, it released Dhenu Llama 3, fine-tuned on Llama3 8B.
Agri Vertical Dhenu1.0 model, fine tuned on Llama3 8B, available for anyone to tinker and provide feedback Feel free to host+share if you got a spare GPU Still using the 1.0 dataset, we will have instruct version with 5x larger data set in near futurehttps://t.co/tRxCZUH6pi pic.twitter.com/lK29hZOQq8
— Pratik Desai (@chheplo) April 19, 2024
The agriculture generative AI startup also teamed up with UNDP to develop the pioneering voice-based vernacular generative AI CoPilot for Climate Resilient Agriculture (CRA) practices. This initiative aims to deliver crucial advice to thousands of Indian farmers, especially smallholders who have been hit hard by climate change.
Subtl.ai
Subtl.ai, is addressing the challenges of generative AI in enterprise environments. It focuses on creating solutions that enable enterprises to handle sensitive data securely without exposing it to the internet.
Vishnu Ramesh, founder of Subtl.ai, calls it a ‘private Perplexity built on light models for enterprise’.
Subtl.ai has developed a proprietary product that leverages the Llama 3 8B model, allowing businesses like the State Bank of India to access and respond to inquiries quickly and securely, directly citing provided sources of information.
dot.agent
dot.agent is the world’s first AMS (AI/Agent Management System) that acts as a central hub that directs requests to the most suitable AI agent or model for the task. This “smart dispatcher” continuously learns from your data & adapts to your specific use case.
It allows Dot to outperform AI models like GPT-4 and Devin in real-world use cases, potentially reducing your AI costs by up to 60%! Dot for Code Generation is also purportedly 8x better than GPT-4.
Stition.ai
Stition.ai focuses on building security products for LLMs. Stition’s security product that can automatically find safety flaws without human intervention and patch vulnerabilities has been in public beta since December. A full release is expected soon.
Happy news y'all! We (@stitionai) got into @ycombinator YC S24! coming out of stealth soon, let us cook! @odinshell @veev3x pic.twitter.com/pFLCqei4KE
— mufeed vh (@mufeedvh) May 11, 2024
Mufeed VH, the founder of Stition.AI, recently released an open-source passion project called Devika. This Indian version of Devin can understand human instructions, break them down into tasks, conduct research, and autonomously write code to achieve set objectives.
CognitiveLab
Founded by Adithya S Kolavi, CognitiveLab recently released an Indic LLM leaderboard for the growing number of Indic language models entering the scene without a uniform evaluation framework.
The Indic LLM leaderboard offers support for seven Indic languages – Hindi, Kannada, Tamil, Telugu, Malayalam, Marathi, and Gujarati – providing a comprehensive assessment platform. Hosted on Hugging Face, it currently supports four Indic benchmarks, with plans for additional benchmarks in the future.
Click here to check it out.
MachineHack
MachineHack Generative AI, one of the few pure-play generative AI startups in India, has launched DataLyze, a generative AI data analysis tool, making data analytics accessible to everyone.
Launched in 2018, MachineHack is an all-in-one platform designed for data engineers, data scientists, machine learners, and developers at all levels. Users can enhance their skills, compare their expertise with peers, write articles, learn coding, apply for jobs, and build impressive portfolios.
TWO
TWO is a tech company that aims to redefine human-AI interactions through its proprietary multilingual and cost-efficient language models called SUTRA. These are ultrafast, multilingual, online generative AI models that can operate in 50+ languages with conversational, search, and visual capabilities.
SUTRA-Online are internet-connected and hallucination-free models that understand queries, browse the web and summarise information to provide current answers. It can answer queries like “Who won the game last night?” or “What’s the current stock price?” accurately.
Tensoic
Mumbai-based software development company Tensoic released a Kannada Llama aka Kan-LLaMA — a 7B Llama-2 model, LoRA PreTrained and FineTuned on Kannada tokens.
Just a few days after releasing Kan-Llama, the researchers also released a playground to test the model. Hooked to NVIDIA A100 GPUs, Tensoic released the playground in partnership with E2E Networks, one of the biggest providers of cloud GPUs in India.
The post 12 Indian GenAI Startups Building Insane Products You Should Know About appeared first on AIM.
TWO, a startup backed by Reliance Jio, recently launched a family of models called SUTRA. These cost-efficient, multilingual generative AI models excel in 50+ languages, offering speech, search, and visual processing capabilities.
The startup raised a $20M seed fund in February 2022 from Jio Platforms and South Korean internet conglomerate Naver. “Jio has been one of our key partners for a long time and has invested in us from the very beginning,” said Pranav Mistry, the founder of TWO, in an exclusive interaction with AIM.
He added that Reliance Jio Infocomm chairman Akash Ambani takes a keen interest in the growth of the startup. “I meet with them often. Jio’s vision is to bring the power of AI through its services. Being a Jio partner gives us access to this market,” he said.
Before founding TWO in 2021, Mistry served as Samsung Technology & Advanced Research Labs’ (STAR Labs) President and CEO.
In 2009, Mistry developed SixthSense, a wearable gestural interface that integrates digital information with the physical world, enabling users to interact with data using natural hand gestures. This technology was introduced during his TEDIndia talk in 2009 and has since garnered widespread attention.
TWO’s SUTRA Line of Products
As of now, TWO offers four models on the SUTRA playground: Sutra Light, Sutra Pro, Sutra Turbo, and Sutra Online. “Some of our partners in Korea and India have already started evaluating our models and conducting pilots in their own products,” said Mistry.
In terms of capabilities, Mistry said, “SUTRA models are 56 billion parameters,” adding that it is a very small model compared to larger models showcasing a trillion parameters like OpenAI’s GPT-4o.
“The power of small models is that they can run very efficiently and at a very low cost. In order to run this model, we require a single NVIDIA RTX A6000 GPU” added Mistry.
TWO is planning to launch ChatSUTRA this month, a platform where users can start using SUTRA’s multilingual models in 50+ languages for almost any task – to chat, question, learn, brainstorm, write, and more.
TWO also has an AI-powered social media app called Zappy, which is quite popular in South Korea. “One of our apps, Zappy, which uses millions of AI-to-user conversations, is powered by SUTRA. Right now, it’s available in Korea, and we are planning to bring Zappy to India very soon this summer,” said Mistry.
Another product from TWO is Geniya which can browse data from the internet using Google, rivalling Perplexity AI. Mistry said that Geniya is still in public beta and users can try it out, following the official launch expected sometime in June.
SUTRA’s Architecture
SUTRA is a model built from scratch, not fine-tuned or based on any other LLM. It combines the LLM with neural machine translation (NMT) to accurately handle idiomatic expressions and colloquial language. “Our specialised NMT models are significantly smaller in parameter size, requiring much less data for training”, Mistry said.
This ensures that SUTRA not only grasps the literal meaning of given inputs but also understands the cultural context, which is essential for effective communication.
Mistry also highlighted that they have a dataset advantage, as they have trained Sutra on the millions of user-to-AI conversations happening on Zappy.
“We can actually use the user to AI conversation data in order to improve the quality of SUTRA,” said Mistry, adding that they have Korean data from over 20 million conversations that SUTRA was originally trained on in Korea.
SUTRA’s Customers
SUTRA models are currently available as APIs as well. Mistry said that he thinks that the Asia Pacific market is a huge opportunity for non-English AI models.
“We have access to companies like Jio, as well as Naver and SK Telecom in Korea. We want to work with these telecom companies to bring the power of their cloud and edge networks to distribute the power of SUTRA,” said Mistry.
SUTRA is not Alone
The Indian AI startup ecosystem is currently booming. Sarvam AI launched the OpenHathi series last year and is currently working on Indic voice LLMs. Meanwhile, Tech Mahindra is working on ‘Project Indus’.
This month, the Hanooman model was jointly released by SML India and 3AI Holding, an Abu Dhabi-based investment firm. Bengaluru-based CoRover also introduced BharatGPT, earlier this year.
In the meantime, Ola Cabs chief Bhavish Aggarwal is building Krutrim AI. Additionally, the Nilekani Center at AI4Bharat in IIT Madras released Airavata, an open-source LLM for Indian languages.
“I am aware of Sarvam AI, Krutrim AI, as well as the work from Tech Mahindra and SML’s Hanooman,” said Mistry.
However, Mistry believes that it’s not so much about the competition. “It’s about more people working together towards the goal of bringing the power of AI to India and SUTRA wants to be a part of this journey,” he concluded.
The post This Akash Ambani-Backed Startup is Building Multilingual LLMs for India appeared first on AIM.
Graphics processing units (GPUs) may have become a coveted piece of hardware in the AI realm, yet their status as the most sought-after component may wane.
Unprecedented demand for GPUs has made NVIDIA a trillion-dollar company. However, even NVIDIA is starting to move away from what they originally created as a graphics chip, according to Keith Witek, chief operating officer at Tenstorrent.
“They’re even moving their architecture towards the heterogeneous compute, which looks a bit more like a tensor computer.
“So yes, I think it will trend in that direction. And even the guys in the graphics business of AI are realising the benefits of drifting their architecture in that direction,” Witek told AIM in an exclusive interaction.
He advocates for system-on-chip (SoC) architectures incorporating tensor units, graph units, and CPUs, asserting that heterogeneous computing utilising both CPUs and graph processors is the optimal approach for handling future workloads.
Recently, big tech companies like Microsoft and AWS, which are among NVIDIA’s biggest enterprise customers, have developed their own AI chips to reduce their dependency on NVIDIA’s GPUs and simultaneously reduce cost.
At the recently held Google I/O 2024, the tech giant announced Trillium TPUs, its six-generation silicon designed to handle AI workloads more efficiently.
Interestingly, chips designed by AWS, Microsoft and Google too have heterogeneous architectures. For example, Azure Maia AI Accelerator and Azure Cobalt CPU integrate different specialised compute engines and accelerators on the same chip.
Similarly, AWS Inferentia and Trainium also integrate different specialised compute engines and accelerators on the same chip.
However, these chips are meant primarily for their internal use. Tenstorrent, on the other hand, sells its chips to enterprise customers, putting it in direct competition with NVIDIA.
“The company’s goal was to build boxes and compute platforms for high-end applications, like data centres and high-performance compute,” Witek said.
How Tenstorrent’s AI chips are better than NVIDIA GPU
Given their capacity to construct hardware solutions and provide two distinct software stacks, each optimised for their platform’s unique capabilities, users benefit from the flexibility to craft software and models to suit their needs.
Leveraging their proficiency in chip design and intellectual property (IP), Tenstorrent regards this as a holistic system-level strategy.
One of Tenstorrent’s biggest advantages is that its components architecturally pack more compute density into the box without creating much power and heat.
“So we have a very power-efficient compute, where we can put 32 compute engines in a box, the same size as NVIDIA puts eight in a box. With our higher compute density and similar power envelope, we outperform NVIDIA by multiples in terms of performance, output per watt, and output per dollar. Additionally, our software is open source and accessible to the community,” Witek said.
Tenstorrent’s AI chips eliminate the need for expensive interconnectors
Tenstorrent’s AI chips are designed to minimise the need to access DRAM compared to GPUs, which constantly access DRAM. Due to this distinction, NVIDIA requires costly silicon interposers like HBM memory chips.
SK Hynix, which makes HBM memory chips for NVIDIA, announced that they have already been sold out for the year. Moreover, Samsung has reported increased revenue growth resulting from a high demand for HBM memory chips.
These days, most data centre AI chips come equipped with HBM memory; however, Tenstorrent believes they can operate effectively without relying on such chips.
“We achieve comparable or superior performance using more economical GDDR6 or GDDR7 memory and organic interposers. Consequently, our chips are more cost-effective without compromising performance. This is because our architecture incorporates local cache and routing within the chip, reducing the necessity to access DRAM for internal connections,” Witek said.
Furthermore, Witek highlights that NVIDIA relies on expensive interconnects like Mellanox and NVLink to link boxes, racks, and containers within a data centre. In contrast, Tenstorrent can accomplish the same connectivity using Ethernet, which is both affordable and high-performing.
“So, why are people using NV Link and Mellanox? Well, they’re using it because NVIDIA can make a tremendous amount of money if they’re forced to use it. But architecturally, it’s not required that people spend that kind of money in their interconnection scheme in the data centre and we’re bringing that to the light of day with the architecture that we’re putting into the market,” Witek pointed out.
CUDA is a monster but has become unwieldy
Despite being a hardware company with a monopolistic hold in the GPU space, NVIDIA’s real competitive moat has been CUDA.
CUDA, which is an acronym for Compute Unified Device Architecture, is a software layer that gives direct access to the GPU’s virtual instruction set and parallel computational elements for the execution of compute kernels.
For quite some time, Intel and AMD have been trying to challenge CUDA’s dominance with their own software stack.
“CUDA has bloated over the years into a pretty large monster that can do a lot of different things, but it’s become very unwieldy,” Witek said.
“All the math libraries… and everything is encrypted, in fact, and NVIDIA is moving their platform more and more proprietary every quarter. They’re not letting AMD and Intel look at that platform and copy it. Programmers are avoiding CUDA whenever possible and writing a lot of code in C++ and other languages,” he continued.
On the contrary, Tenstorrent has open-sourced its software stacks. They offer a platform named Metalium, similar to CUDA but less cumbersome and more user-friendly. On Metalium, users can write algorithms and programme models directly to the hardware, bypassing layers of abstraction.
However, proficiency in hardware architecture is essential, as errors may require manual correction by the user for proper functionality.
“It’s very much like CUDA, but we give you 100% open-source software and 100% open-source models. You can get on Hugging Face, download the model and run it; we make it very easy for you to do that,” Witek said.
Tenstorrent’s second software, called BUDA, represents the envisioned future utopia, according to Witek.
“Eventually, as compilers become more sophisticated and AI hardware stabilises, reaching a point where they can compile code with 90% efficiency, the need for hand-packing code in the AI domain diminishes.
“Although we haven’t reached that crossover point yet, many anticipate it within the next few years. Hence, we are continually enhancing BUDA’s efficiency to prepare for that eventual transition,” he concluded.
The post ‘AI Chips Will Eventually Replace GPUs and Even NVIDIA Knows This’ appeared first on AIM.
Following the announcements of Copilot+ enabled AI PCs at the Microsoft Build developer event on May 20, Microsoft released new developer tools, enhancements to Microsoft Azure AI and new enterprise options for Copilot. GitHub Copilot received a lengthy list of new capabilities enabled by first- and third-party services.
Meanwhile, reactions to the AI memory feature Recall include some backlash against its observation of all of the user’s activity. Recall, announced at Microsoft Build on May 20, makes any activity on a Microsoft AI PC searchable, allowing the user to ask natural language questions and receive answers from across all of their activity on the device.
Team Copilot, AI agents and Copilot Studio open up business opportunities
Microsoft, on May 21, offered three new ways to work with its AI Copilot assistant: Team Copilot, Copilot agents and Copilot Studio in Microsoft Power Platform.
“Driving the use cases for AI-accelerated PCs to the enterprise makes perfect sense for Microsoft,” said Olivier Blanchard, research director at The Futurum Group. “It’s a massive opportunity for Microsoft, PC OEMs, and silicon vendors, from Qualcomm to Intel and AMD as a full ecosystem play.”
“By taking the lead from the OS, support, software, and developer ecosystem angles, Microsoft positions itself as the primary orchestrator and enabler of AI PC adoption in the enterprise, which takes a lot of pressure off PC OEMs and silicon vendors,” said Blanchard.
Team Copilot
Team Copilot adds more initiative to the AI assistant, letting it work, as Microsoft said, as a sort of teammate or facilitator. Team Copilot can take notes, manage meeting agendas, flag important information or unresolved issues and manage projects. Customers with a Microsoft Copilot for Microsoft 365 license, which costs $20 per month, will be able to try Team Copilot in preview later in 2024.
Team Copilot can ‘sit alongside’ people in meetings, taking notes and offering suggestions. Image: Microsoft
Copilot agents
With copilot agents (which Microsoft writes in lowercase when referring to this specific offering), you can prompt AI to take on roles custom to your business needs. For example, a copilot agent might handle orders, automate processes, add context to processes and meetings or learn based on user feedback. Agents are available to customers in Microsoft’s Copilot Studio Early Access Program, with wider availability expected later this year.
Copilot Studio in Microsoft Power Platform
To create copilot agents, you can use Copilot Studio, the prompt-based, no-code platform, to design and test what actions the copilot can take. Developers can request their copilot perform specific tasks, then provide the AI with the text or other resources it might need to have context about that specific business process. Microsoft Copilot Studio is free with a Microsoft Copilot for Microsoft 365 license in limited private preview.
Copilot stack and Snapdragon Dev Kit for Windows open up Copilot+ PCs
For developers getting ready to work on the new Copilot+ PCs, Microsoft introduced the Copilot stack on Windows, additions to the OS that take advantage of and facilitate development of further AI models and capabilities. The Copilot stack includes:
Windows Copilot Runtime, with its library of APIs powered by on-device AI.
Windows Semantic Index, the OS capability behind Recall. Windows Semantic Index will in the future connect with the Vector Embeddings API to help developers create vector stores and run retrieval-augmented generation within their applications.
Phi Silica, a small language model custom made for the neural processing units in Copilot+ PCs.
Native support for PyTorch on Windows with DirectML.
Web Neural Network Developer Preview for Windows on DirectML.
New AI-based productivity features in Dev Home.
Qualcomm announced on May 21 the Snapdragon Dev Kit for Windows, which allows developers to access the NPU in Copilot+ PCs. The dev kit includes:
3.8 GHz 12 Core Oryon CPU with dual core boost up to 4.3GHz.
32 GB LPDDR5x memory.
512GB M2 storage.
80 Watt system architecture.
Support for up to three concurrent external displays.
AI comes to Microsoft Fabric
Microsoft Fabric now includes Real-Time Intelligence facilitated by AI, which can analyze and uncover data. The Real-time hub lets users see their organization’s real-time data and set up automatic alerts. Real-Time Intelligence is in general availability and public preview now.
More updates to Microsoft Fabric are detailed on the Microsoft site.
GitHub Copilot can talk to Docker and other services
GitHub Copilot, the generative AI coding assistant, has received upgrades to allow it to talk to first- and third-party partner services, which are bundled as Copilot Expansions. Of particular relevance to Microsoft Build is GitHub Copilot for Azure, which lets GitHub users deploy to Azure using prompts.
Copilot Expansions are only available to invited users for now. The complete list of first- and third-party services GitHub will be able to pull data from is split by availability date.
Invited users can currently access Copilot Extensions that let them pull data from the following in GitHub Marketplace into GitHub Copilot:
DataStax.
Docker.
Lambda Test.
LaunchDarkly.
McKinsey & Company.
Octopus Deploy.
Pangea.
Pinecone.
Product Science.
ReadMe.
Sentry.
Teams Toolkit.
The following extensions will be available to all users “in the coming weeks” through Visual Studio Marketplace for VSCode, according to GitHub:
Microsoft including Teams Toolkit and Microsoft 365.
Stripe.
MongoDB.
Azure AI Studio is now available, with added guardrails against hallucinations and cyberattacks
At Build 2024, Microsoft announced general availability of Azure AI Studio, the pro-code platform for generative AI development. Azure AI Studio can be found at ai.azure.com for free. An Azure account is required to build a copilot.
Microsoft expanded Azure AI Studio’s suite of Responsible AI tools and safety features, including filters for specific content, prompt shields to fight against prompt injection attacks and extra precautions against hallucinations.
Some of Azure’s new features will take advantage of NVIDIA’s Blackwell AI accelerator.
The UK casts doubts on Recall’s privacy
After Microsoft announced the Recall AI search function on Copilot+ laptops on May 20, some praised its novel approach to searching across an entire PC. However, the “snapshots” the AI takes of users’ activity have led the U.K.’s Information Commissioner’s Office to reach out to the Redmond giant for more information.
“We are making enquiries with Microsoft to understand the safeguards in place to protect user privacy,” the ICO said.
ICO believes companies must “rigorously assess and mitigate risks to peoples’ rights and freedoms,” the ICO told the BBC.
On May 20, Tesla CEO and former OpenAI backer Elon Musk compared Recall to the horror television show Black Mirror.
Microsoft said Recall snapshots are stored only on the local PC — a feature enabled by the on-device AI accelerator. Users can pause Recall, set Recall to ignore individual websites and apps or delete individual snapshots and ranges of time.
Blanchard said the ability to delete records or selectively choose what shows up in Recall means “users are in control of the experience and of their own privacy. Also, unlike Search Engines, that data will be stored on users’ PCs, adding an additional layer of privacy and data security to the feature,” he said.
However, Blanchard said, “What Microsoft will have to figure out though is what happens to that data when PC users replace or upgrade their Copilot+ PCs in a few years, but there’s time to figure that out.”
OpenAI has released its latest and most advanced language model yet – GPT-4o, also known as the “Omni” model. This revolutionary AI system represents a giant leap forward, with capabilities that blur the line between human and artificial intelligence.
At the heart of GPT-4o lies its native multimodal nature, allowing it to seamlessly process and generate content across text, audio, images, and video. This integration of multiple modalities into a single model is a first of its kind, promising to reshape how we interact with AI assistants.
But GPT-4o is much more than just a multimodal system. It boasts a staggering performance improvement over its predecessor, GPT-4, and leaves competing models like Gemini 1.5 Pro, Claude 3, and Llama 3-70B in the dust. Let's dive deeper into what makes this AI model truly groundbreaking.
Unparalleled Performance and Efficiency
One of the most impressive aspects of GPT-4o is its unprecedented performance capabilities. According to OpenAI's evaluations, the model has a remarkable 60 Elo point lead over the previous top performer, GPT-4 Turbo. This significant advantage places GPT-4o in a league of its own, outshining even the most advanced AI models currently available.
But raw performance isn't the only area where GPT-4o shines. The model also boasts impressive efficiency, operating at twice the speed of GPT-4 Turbo while costing only half as much to run. This combination of superior performance and cost-effectiveness makes GPT-4o an extremely attractive proposition for developers and businesses looking to integrate cutting-edge AI capabilities into their applications.
Multimodal Capabilities: Blending Text, Audio, and Vision
Perhaps the most groundbreaking aspect of GPT-4o is its native multimodal nature, which allows it to seamlessly process and generate content across multiple modalities, including text, audio, and vision. This integration of multiple modalities into a single model is a first of its kind, and it promises to revolutionize how we interact with AI assistants.
With GPT-4o, users can engage in natural, real-time conversations using speech, with the model instantly recognizing and responding to audio inputs. But the capabilities don't stop there – GPT-4o can also interpret and generate visual content, opening up a world of possibilities for applications ranging from image analysis and generation to video understanding and creation.
One of the most impressive demonstrations of GPT-4o's multimodal capabilities is its ability to analyze a scene or image in real-time, accurately describing and interpreting the visual elements it perceives. This feature has profound implications for applications such as assistive technologies for the visually impaired, as well as in fields like security, surveillance, and automation.
But GPT-4o's multimodal capabilities extend beyond just understanding and generating content across different modalities. The model can also seamlessly blend these modalities, creating truly immersive and engaging experiences. For example, during OpenAI's live demo, GPT-4o was able to generate a song based on input conditions, blending its understanding of language, music theory, and audio generation into a cohesive and impressive output.
Using GPT0 using Python
import openai # Replace with your actual API key OPENAI_API_KEY = "your_openai_api_key_here" # Function to extract the response content def get_response_content(response_dict, exclude_tokens=None): if exclude_tokens is None: exclude_tokens = [] if response_dict and response_dict.get("choices") and len(response_dict["choices"]) > 0: content = response_dict["choices"][0]["message"]["content"].strip() if content: for token in exclude_tokens: content = content.replace(token, '') return content raise ValueError(f"Unable to resolve response: {response_dict}") # Asynchronous function to send a request to the OpenAI chat API async def send_openai_chat_request(prompt, model_name, temperature=0.0): openai.api_key = OPENAI_API_KEY message = {"role": "user", "content": prompt} response = await openai.ChatCompletion.acreate( model=model_name, messages=[message], temperature=temperature, ) return get_response_content(response) # Example usage async def main(): prompt = "Hello!" model_name = "gpt-4o-2024-05-13" response = await send_openai_chat_request(prompt, model_name) print(response) if __name__ == "__main__": import asyncio asyncio.run(main())
I have:
Imported the openai module directly instead of using a custom class.
Renamed the openai_chat_resolve function to get_response_content and made some minor changes to its implementation.
Replaced the AsyncOpenAI class with the openai.ChatCompletion.acreate function, which is the official asynchronous method provided by the OpenAI Python library.
Added an example main function that demonstrates how to use the send_openai_chat_request function.
Please note that you need to replace “your_openai_api_key_here” with your actual OpenAI API key for the code to work correctly.
Emotional Intelligence and Natural Interaction
Another groundbreaking aspect of GPT-4o is its ability to interpret and generate emotional responses, a capability that has long eluded AI systems. During the live demo, OpenAI engineers showcased how GPT-4o could accurately detect and respond to the emotional state of the user, adjusting its tone and responses accordingly.
In one particularly striking example, an engineer pretended to hyperventilate, and GPT-4o immediately recognized the signs of distress in their voice and breathing patterns. The model then calmly guided the engineer through a series of breathing exercises, modulating its tone to a soothing and reassuring manner until the simulated distress had subsided.
This ability to interpret and respond to emotional cues is a significant step towards truly natural and human-like interactions with AI systems. By understanding the emotional context of a conversation, GPT-4o can tailor its responses in a way that feels more natural and empathetic, ultimately leading to a more engaging and satisfying user experience.
Accessibility
OpenAI has made the decision to offer GPT-4o's capabilities to all users, free of charge. This pricing model sets a new standard, where competitors typically charge substantial subscription fees for access to their models.
While OpenAI will still offer a paid “ChatGPT Plus” tier with benefits such as higher usage limits and priority access, the core capabilities of GPT-4o will be available to everyone at no cost.
Real-World Applications and Future Developments
The implications of GPT-4o's capabilities are vast and far-reaching, with potential applications spanning numerous industries and domains. In the realm of customer service and support, for instance, GPT-4o could revolutionize how businesses interact with their customers, providing natural, real-time assistance across multiple modalities, including voice, text, and visual aids.
In the field of education, GPT-4o could be leveraged to create immersive and personalized learning experiences, with the model adapting its teaching style and content delivery to suit each individual student's needs and preferences. Imagine a virtual tutor that can not only explain complex concepts through natural language but also generate visual aids and interactive simulations on the fly.
The entertainment industry is another area where GPT-4o's multimodal capabilities could shine. From generating dynamic and engaging narratives for video games and movies to composing original music and soundtracks, the possibilities are endless.
Looking ahead, OpenAI has ambitious plans to continue expanding the capabilities of its models, with a focus on enhancing reasoning abilities and further integrating personalized data. One tantalizing prospect is the integration of GPT-4o with large language models trained on specific domains, such as medical or legal knowledge bases. This could pave the way for highly specialized AI assistants capable of providing expert-level advice and support in their respective fields.
Another exciting avenue for future development is the integration of GPT-4o with other AI models and systems, enabling seamless collaboration and knowledge sharing across different domains and modalities. Imagine a scenario where GPT-4o could leverage the capabilities of cutting-edge computer vision models to analyze and interpret complex visual data, or collaborate with robotic systems to provide real-time guidance and support in physical tasks.
Ethical Considerations and Responsible AI
As with any powerful technology, the development and deployment of GPT-4o and similar AI models raise important ethical considerations. OpenAI has been vocal about its commitment to responsible AI development, implementing various safeguards and measures to mitigate potential risks and misuse.
One key concern is the potential for AI models like GPT-4o to perpetuate or amplify existing biases and harmful stereotypes present in the training data. To address this, OpenAI has implemented rigorous debiasing techniques and filters to minimize the propagation of such biases in the model's outputs.
Another critical issue is the potential misuse of GPT-4o's capabilities for malicious purposes, such as generating deepfakes, spreading misinformation, or engaging in other forms of digital manipulation. OpenAI has implemented robust content filtering and moderation systems to detect and prevent the misuse of its models for harmful or illegal activities.
Furthermore, the company has emphasized the importance of transparency and accountability in AI development, regularly publishing research papers and technical details about its models and methodologies. This commitment to openness and scrutiny from the broader scientific community is crucial in fostering trust and ensuring the responsible development and deployment of AI technologies like GPT-4o.
Conclusion
OpenAI's GPT-4o represents a true paradigm shift in the field of artificial intelligence, ushering in a new era of multimodal, emotionally intelligent, and natural human-machine interaction. With its unparalleled performance, seamless integration of text, audio, and vision, and disruptive pricing model, GPT-4o promises to democratize access to cutting-edge AI capabilities and transform how we interact with technology on a fundamental level.
While the implications and potential applications of this groundbreaking model are vast and exciting, it is crucial that its development and deployment are guided by a firm commitment to ethical principles and responsible AI practices.
When OpenAI CEO Sam Altman posted ‘her’ on X during the GPT-4o launch a few weeks ago, little did the world know that it would be cursed with Hollywood star Scarlett Johansson claiming to be one of the voices (Sky) in the demos.
Poor Altman had no idea that his post would snowball into a controversy of this scale. In a statement on Monday, Johansson chastised the business and its CEO for using her voice without permission.
Calling the voice “eerily similar”, the actress clarified that Altman had approached her in September and offered to engage her to speak in a ChatGPT voice – an offer she had declined.
“He told me that he felt that by voicing the system, I could bridge the gap between tech companies and creatives and help consumers feel comfortable with the seismic shift concerning humans and AI. He felt my voice would be comforting to people,” the Jojo Rabbit star stated.
After nine months, she said, everyone, including friends, family, and the general public, noticed how much the newest system, Sky, sounded like her.
Following the uproar, OpenAI declared that it would stop using the voice, but did not give an explicit reason for the same in its statement. In a blog post, the company said the voice match was just a coincidence.
According to OpenAI, Sky’s voice belongs to a “different professional actress”. AI voices “should not deliberately mimic a celebrity’s distinctive voice”, it added. However, owing to privacy concerns, the company was unable to disclose the identity of the voice expert.
Not Scarlett Johansson?
The response from Open AI, including a blog detailing how the voices were developed, seems to directly answer Johansson’s questions. OpenAI began providing voice capabilities to its users for conversations with ChatGPT in September 2023. According to them, the five voices – Breeze, Cove, Ember, Juniper, and Sky – are samples from voice actors with whom the business collaborated during creation.
Several characteristics were considered when selecting the vocals, such as having a “timeless” quality and “an approachable voice that inspires trust”. Over five months last year, OpenAI evaluated hundreds of voice submissions. Later, the selected performers flew to San Francisco for recording sessions, during which OpenAI used their voices to train its models.
‘Too Sissy’
Although most people haven’t had a chance to use these recently revealed features, the capabilities have prompted even more comparisons to Spike Jonze’s dystopian romance Her. The movie centres around an introverted man (Joaquin Phoenix) who develops feelings for an AI-operating system (Johansson), leading to several complications.
The model’s demos last week sparked reactions from many who noticed that certain exchanges had an oddly amorous tone. Coachvox.ai founder Nick Pierre commented, “Am I the only one that gets the ick from how flirty this is?”
Many joked that they had a new “girlfriend” or were being captivated by the AI voice. A few users described Sky as “flirty” and “provocative” in a flurry of posts on X.
For instance, in a video shared by OpenAI, a female-voiced ChatGPT thanks an employee for “rocking an OpenAI hoodie”. In another, the chatbot responds to praise by saying, “Oh stop it, you’re making me blush.”
This spurred discussion on the gendered approaches critics claim tech companies have long employed to create and interact with voice assistants. This practice predates the most recent wave of generative AI, which enhanced the capabilities of AI chatbots.
Johansson’s comment also rakes up the whole issue of deepfakes and the preservation of personal identity, likeness, and labour. The continuing discussion concerning the moral implications of AI technology is beginning to take a significant turn now.
The post Scarlett Johansson Thinks She Is Sam Altman’s ‘Her’ appeared first on AIM.
At the ongoing VivaTech, the annual technology conference for startups, in Paris, Meta AI chief Yann LeCun, advised students looking to work in the AI ecosystem, to not work on LLMs.
“If you are a student interested in building the next generation of AI systems, don’t work on LLMs. This is in the hands of large companies, there’s nothing you can bring to the table,” said LeCun at the conference.
He also said that people should develop next-generation AI systems that overcome the limitations of large language models.
Moving Away from LLMs
Interestingly, the discussion on alternatives of LLM-based models has been ongoing for a while now. Recently, Mufeed VH, the young creator of Devika, a Devin alternative, spoke about how people should move away from Transformer models and start building new architectures.
“Everyone’s doing the same thing, but if we focus on different architectures, like RMKV [a RNN architecture], it would be really good,” said Mufeed who goes on to explain the unlimited context window and inference for that particular architecture.
He also believes that with this approach, it is even possible to build something nearly as impressive as GPT-4.
Moving away from LLMs is something LeCun has been prominently advocating and believes in taking away the control from the hands of a few. Another reason why he pushes for open-source too.
“Eventually all our interactions with the digital world will be mediated by AI assistants,” he said and urged for platforms to not allow a small number of AI assistants to control the entire digital world.
“This will be extremely dangerous for diversity of thought, for democracy, for just about everything”, he said.
But, LLMs are Only Advancing..
While LeCun might be against LLMs, the transformer model training models are evolving. Dan Hou, an AI/ML advisor, spoke about GPT-4o and emphasised on its training model.
When text was believed to be the basis for all sophisticated models, GPT-4o was designed to understand video and audio natively. This impacts the volume of data that future versions can be trained on.
“How much smarter can AI get? With a natively multi-modal architecture, I suspect the answer is much, much better,” said Hou.
Furthermore, Sam Altman, in a recent interview also spoke about how data wouldn’t be a problem anymore, thereby addressing the concerns of training LLMs.