Meta Forces Developers to Cite ‘Llama 3’ in their AI Development

Meta has finally dropped Llama 3, with a twist!

Under the redistribution and use section of its Community License Agreement, the company states: “If you use the Llama Materials to create, train, fine-tune, or otherwise improve an AI model, which is distributed or made available, you shall also include ‘Llama 3’ at the beginning of any such AI model name.”
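Read literally, the clause is mechanical enough to express in code. Below is a minimal, hypothetical sketch of a helper that renames a derivative model to satisfy the naming requirement; the function name and the case-insensitive check are our own assumptions, and this is an illustration of the clause, not legal advice.

```python
def to_compliant_name(model_name: str) -> str:
    """Illustrative check of the naming clause: derivative model names
    must begin with "Llama 3". Hypothetical helper, not legal advice."""
    prefix = "Llama 3"
    if model_name.lower().startswith(prefix.lower()):
        return model_name  # already compliant
    return f"{prefix} {model_name}"
```

Under this reading, a fine-tune called “Dhenu” would need to ship as something like “Llama 3 Dhenu”.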

As expected, this strange clause has sparked conversations among the AI community, with many debating its need, justification, and impact.

With many startups riding the open-source AI wave, especially in India, which has been positioning itself as an open-source AI champion, it will be interesting to see how this clause gets implemented.

Would we see a name change for KissanAI’s agri LLM Dhenu, Sarvam AI’s OpenHathi, PAiGPT, and other AI models in Indic languages such as Ambari, Odia Llama, and Tamil Llama, all built on the Llama 2 model?

Some are just having fun online over this latest development.

Introducing Llama 3

Meta has released Llama 3, the latest generation of its LLM. The model, available in 8B and 70B parameter versions, has been trained on over 15 trillion tokens, a dataset seven times larger than Llama 2’s.

Llama 3 offers SOTA performance and enhanced reasoning and coding capabilities. Its training process is three times more efficient than its predecessor’s.

The 8B model outperforms Gemma 7B and Mistral 7B on the reported benchmarks, and the 70B model outperforms Gemini Pro 1.5 and Claude 3 Sonnet. Meta is also training a model with more than 400 billion parameters.

Given these advancements, Meta’s push to include ‘Llama 3’ at the beginning of the names of AI models built on it reads as a move towards attribution and responsible use.

The post Meta Forces Developers to Cite ‘Llama 3’ in their AI Development appeared first on Analytics India Magazine.

Generative AI Jobs in India can Fetch You up to Rs 1 Crore 

The demand for generative AI jobs in India is definitely on the rise. A recent report revealed that senior developers working in generative AI draw over INR 1 crore per annum, while an entrant’s salary could easily be around INR 18 lakh per annum.

Further, it stated that techies in India with additional AI, generative AI, and adjacent skills are driving up salaries, commanding a 30-50 percent premium over those without such expertise.

AIM Research noted that the median salaries of generative AI developers and engineers ranged between INR 11.1 lakh and 12.5 lakh per annum.

Upon further investigation, AIM found that entry-level AI engineers at companies like Accenture typically earned about INR 8.5 lakh per annum, compared to the INR 5-6 lakh per annum that regular software engineers took home. Additionally, AI engineers with generative AI skills saw a significant 50% increase in their salary.

Notably, Accenture recently secured $1.1 billion worth of generative AI deals.

AIM also noted that Accenture’s consulting roles in generative AI fetched significantly higher salaries than software engineering or tech roles. Experienced folks could easily make up to INR 35 lakh per annum with a base salary of INR 21 lakh per annum.

“GenAI has definitely created a new market, and companies and VCs are moving fast to build and capitalise on this. So, people in this field are getting paid above-average salaries,” said Mankaran Singh, founder of Flowdrive.

Further, he said that any engineer with expertise in deep learning can start contributing to generative AI development within 2-3 months of study.

An interesting story emerged when AIM spoke to Izam Mohammed, who never attended college and is now an AI/ML engineer earning lakhs. He created ragrank (ragrank.readthedocs.io), a Python library for evaluating RAG models, and used the credibility and skills gained through online resources to land the job at 18.

Moreover, as global capability centres (GCCs) establish their presence in India, young talent in the country can seek employment opportunities to leverage their skills in generative AI for lucrative careers. According to consulting firm ANSR, about 90 percent of GCCs in India plan to harness the potential of AI, ML, and cognitive computing in the next 2-3 years.

A report indicates that mid-career software professionals in the GCC segment, with about three to eight years of experience, typically earned salaries ranging from INR 15 lakh to INR 35 lakh per annum.

Recently, AMD announced that it will expand its GenAI team in India and will be hiring machine learning research scientists/engineers at multiple levels.

Please have a look at the job postings below:
1. https://t.co/TxNZM76nDi
2. https://t.co/Mgf2IACyVe
[These roles are based in India]#ai #machinelearning #generativeai #AMD #hiring #research #phd

— pkms🍕 (@PrakamyaMishra) April 10, 2024

What about AI Startups?

AIM contacted computer scientist Pratik Desai, the founder of KissanAI, to understand the impact generative AI can have on salary packages.

Desai said that those who can conduct research using models, fine-tuning, and tokenisers, and who have a deep understanding of neural networks, typically move to the US, where they can get “high salaries, not just ‘30-50%’” more.

Further, he said that traditional software development roles, which involve knowledge of libraries like Llama Index or LangChain and assist in AI project development, would eventually see an increase in salaries. “They will see an increase, maybe not immediately. The immediate increase would be with established tech startups,” he added.

CoRover AI’s Ankush Sabharwal told AIM that the average salary for GenAI tech developers in India can range from INR 8 lakh to INR 24 lakh per annum, depending on experience and skill set. He puts them into two categories: GenAI Tech Developers and Business Problem Solvers.

GenAI Tech Developers are tech wizards who specialise in machine learning, deep learning, coding languages, and generative models. Business Problem Solvers are equipped with strong problem-solving abilities, business acumen, and a deep understanding of data.

Sabharwal said that the latter act as detectives, identifying problems, grasping industry nuances, and maximising data’s potential to leverage GenAI effectively. Both the roles offer substantial earning potential, attracting skilled professionals to the dynamic and impactful realm of GenAI employment opportunities, he added.

“The average salary of an ML engineer is INR 30-40 lakh per annum,” said a GenAI engineer at Sarvam AI, adding that a good ML engineer’s salary will be above the equivalent SWE-level salary anyway, GenAI or not.

Indian IT Pays Poorly in GenAI

Indian IT is betting big on generative AI. TCS recently announced it has trained over 350,000 employees in AI skills and plans to hire 40,000 freshers in FY25.

Although TCS claimed to be training its employees in generative AI, the exact focus and content of this training remained ambiguous. Meanwhile, the entry-level salary for freshers at TCS, ranging from INR 3 lakh to 4 lakh per annum, remained lower than industry standards.

AIM contacted an employee who works with generative AI at TCS. He said that the pay scale specifically for GenAI resources at the company won’t change, and the hike is also based on the company’s financial performance and business unit budget allocation.

“I believe we may get some good hikes in the range of 25%-30% after switching to another company,” he added. Furthermore, he said that the necessary skills required to be a GenAI engineer include the basics of AI algorithms in machine learning, deep learning, neural networks, and NLP.

When asked about TCS’s training focus in generative AI, he explained, “TCS has been releasing various courses on GenAI on our internal platform to upskill its workforce so that resources are ready to be deployed in client projects.”

He added, “Along with the courses, TCS is providing free vouchers for Azure certification exams, access to Google Cloud, and Nvidia portal access for attending workshops, etc.”

Due to lower salaries, many employees choose to work for startups, GCCs, and major tech companies such as Microsoft, Google, and Meta. These companies provide higher compensation packages that can range from INR 15 lakh to over INR 40 lakh annually, excluding bonuses and stock options, and can go up to INR 1 crore for senior executives and leaders.

The post Generative AI Jobs in India can Fetch You up to Rs 1 Crore appeared first on Analytics India Magazine.

Happiest Minds Technologies Acquires Macmillan Learning India, Expands Edutech Reach

IT services and consulting giant Happiest Minds Technologies is set to acquire Macmillan Learning India, which will become a wholly-owned subsidiary of the company. This transaction, which is set to conclude by the end of this month, involves a cash consideration of Rs. 4.5 crores for 1,00,000 equity shares of face value Rs. 1 each.

Macmillan Learning India, established in September 2015, is an information technology entity focusing on software development services specifically for the Macmillan Group, a global leader in learning and education.

As of now, the company boasts a paid-up capital of Rs. 1 lakh, with a current turnover and net worth of Rs. 9.25 crores and Rs. 4.45 crores, respectively. The acquisition is not classified as a related party transaction, and no promoters or group companies hold any interest in Macmillan Learning India.

This strategic acquisition aligns with Happiest Minds’ enhancement of its Edutech vertical, reinforcing its partnership with the Macmillan group. Over the past three years, Macmillan Learning India’s turnover has shown consistent growth, reporting Rs. 6.9 crores in FY 2022-23, Rs. 5.5 crores in FY 2021-22, and Rs. 5.3 crores in FY 2020-21.

The company recently launched ‘hAPPI’, a generative AI-powered chatbot developed by its generative AI business services (GBS) unit for Happiest Health. The chatbot will engage with users in health and wellness knowledge conversations.

Read more: Data Science Hiring Process at Happiest Minds Tech

The post Happiest Minds Technologies Acquires Macmillan Learning India, Expands Edutech Reach appeared first on Analytics India Magazine.

‘AI Platforms will Control What Everybody Sees,’ Says Meta’s AI Chief Yann LeCun

Open Source AI Platforms

“Eventually all our interactions with the digital world will be mediated by AI assistants. This means that AI assistants will constitute a repository of all human knowledge and culture; they will constitute a shared infrastructure like the internet is today,” said Yann LeCun, one of the three godfathers of AI, in his talk at GenAI Winter School recently.

He urged platforms to be open-source and said that we cannot have a small number of AI assistants controlling the entire digital diet of every citizen across the world, taking a dig at OpenAI and a few other companies without naming them.

“This will be extremely dangerous for diversity of thought, for democracy, for just about everything”, he added.

There have been examples galore of things going wrong and biases taking centre stage when only a few companies have the power and control to manufacture the ‘cultural understanding’ of the entire world. They either tend to ignore different cultures or end up overcompensating to tick the ‘diversity’ checkbox.

Case in point: Google’s extra-‘woke’ chatbot Gemini that tried to forcefully inject diversity into pictures with a disregard for historical context. “It’s DEI gone mad,” exclaimed the notably agitated users.

We Need Open-Source Base Models

“So what we need is not one AI assistant, we need base models like Llama 2, Mistral, and Gemma that can be fine-tuned by anybody so that, for example, it speaks Arabic and understands the culture of Morocco and knows everything about Marrakech,” said LeCun.

He emphasised that those platforms must be open because we need a high diversity of AI assistants the same way we need a high diversity of the press so that we have no echo chambers and have multiple sources of information.

Currently, we are seeing a multitude of AI models flourish. From farming and healthcare to education and entertainment, AI is conquering every field. And it doesn’t stop at chat-based solutions. Now, with advancements like the empathetic voice-first interface of Hume AI, our interactions with these assistants are only getting better.

Soon, as LeCun said, this will give birth to a time when “we’re not going to be using search engines. Instead, when it comes to interacting with digital content, we’re basically going to be using our AI assistants. We’ll ask them questions, and they’ll provide the answers. They’ll assist us in our everyday lives”.

This further highlights the need to prevent monopoly in the production of these assistants. If it is through them that we are going to see and interact with the world, then there should be models as diverse as the world we live in. And, thanks to open-source base models, we are already seeing that happen.

Democratising AI Wholeheartedly

India is emerging as an open-source AI champion. From developing Devika, the open source alternative to Devin, and creating Ambari, a bilingual Kannada model built on top of Llama 2, to Telugu LLM Labs and Odia Llama, AI models in Indic languages are the biggest focus of the open source AI developers in India.

India’s vast diversity in languages, cultures, and populations means that a one-size-fits-all approach would not work here. Instead, open source allows for the creation of customised versions tailored to specific user groups, locations, regions, religions, etc., without the need to start from scratch for every individual use case.

Sarvam AI is building models such as OpenHathi on top of Llama. Another notable mention is the Indian agri-tech startup KissanAI, which unveiled Dhenu Vision LLMs for crop disease detection.

BharatGPT unveiled Hanooman, a new suite of Indic GenAI models. The makers said, “We don’t want it to be like ChatGPT, which suffers from the ‘I’m God and I know everything’ syndrome.” The primary focus areas are healthcare and education. Tech Mahindra’s foundational model Project Indus is an initiative to challenge OpenAI.

Recent developments in other parts of the world also paint a promising picture, such as South Korean AI company Kakao Brain’s projects KoGPT, a large-scale language model for Korean, and Karlo, an image generation model. The company aims to contribute to the AI community with open-source projects.

Tokyo-based Sakana AI, reported to be Japan’s first AI startup, is another such example.

All these developments from different regions of the world, involving different languages, cultures, etc., paint an optimistic outlook for LeCun’s suggestion that virtual assistants and AI platforms must be open-source, “Otherwise our culture will be controlled by a few companies on the West Coast of the US or in China.”

“What’s important now is that a lot of governments are thinking about the benefits and dangers of AI. Some of them are thinking that AI is too dangerous to put in the hands of everyone and they’re trying to regulate it and basically make open source AI illegal; regulate it out of existence. I think that’s extremely dangerous for the future of humanity,” LeCun said.

He emphasised that “it’s too dangerous to have AI controlled by a small number of people”.

Still a Lot to Improve

While moving towards such a future, as envisioned by LeCun, we should remember that LLMs and AI assistants can also become the harbinger of chaos and increase the amount of misinformation on the internet massively.

The GPT-4 paper reads, “Novel capabilities often emerge in more powerful models”, and highlights how the model can become “agentic”, meaning it can independently develop and pursue goals not originally programmed during its training.

“The model isn’t accurate in admitting its limitations,” the paper notes, a crucial point for every single user to keep in mind as well.

Talking about the scope of misinformation, Air Canada’s chatbot goof-up serves as a warning sign. In that incident, according to a passenger’s screenshot of a conversation with Air Canada’s chatbot, the passenger was told he could apply for the refund “within 90 days of the date your ticket was issued” by completing an online form.

However, when he applied for a refund, Air Canada said bereavement rates did not apply to completed travel and pointed to the bereavement section of the company’s website. Finally, the company was found liable for its chatbot’s misleading advice.

So, while envisioning a future in which most consumer access to the internet happens through agents acting on consumers’ behalf, doing tasks and fending off marketers and bots, and in which, as Vinod Khosla posted, tens of billions of agents on the internet will be normal, we should also ensure that these agents are intelligent and reliable: built by diverse companies, based on diverse data, and catering to the needs of a diverse population.

The post ‘AI Platforms will Control What Everybody Sees,’ Says Meta’s AI Chief Yann LeCun appeared first on Analytics India Magazine.

Future is Limitless

In a major breakthrough in the wearable AI space, a US-based company launched Limitless Pendant, an AI-powered device that captures insights from the wearer’s life. According to its creators, the Pendant safeguards user privacy and upholds confidentiality in digital interactions.

A revolutionary feature, Consent Mode, has been incorporated to enable selective voice recording by identifying and capturing only the voices of individuals who have granted consent.

In an introduction video of the product, Dan Siroker, co-founder and CEO at Limitless, said, “Limitless augments, does not replace human intelligence with artificial intelligence to overcome the brain’s limitations, specifically our memory and our focus. Our brains are bombarded with notifications and information. Multitasking is a myth. We can only really do one thing at a time”.

Balancing AI and Data Privacy

There are legitimate concerns about privacy and security, given the amount of personal data collected and transmitted by wearables. According to Siroker, ad-driven companies like Facebook have promoted the idea that convenience should come at the cost of privacy.

To address these data privacy concerns, Limitless has introduced Confidential Cloud. It delivers the same privacy guarantees as Rewind (the company’s earlier personalised AI product) but with the convenience of being in the cloud.

Unlike public cloud, this ensures data is safe from the employer, the software provider, and the government. Further, the Confidential Cloud offers unparalleled security measures, ensuring that user data remains encrypted and inaccessible without explicit user authorisation even under legal compulsion.

Will it Replace Smartphones?

Humane AI spent the last year making the case that the Ai Pin will replace smartphones, allowing people to interact more with the real world. Currently launched only in the US, it costs USD 700, plus a USD 24.99 monthly subscription.

But recently, it received a mixed reaction as one of the reviewers, Marques Brownlee wrote, “This thing is bad at almost everything it does, basically all the time.”

And now, with the Limitless Pendant, slated to begin shipping in August at USD 99, the question arises: will it deliver as promised?

The CEO has proudly announced exceeding 10,000 orders within a single day of the product’s launch, while Marc Andreessen, co-founder of Andreessen Horowitz, has called the innovation ‘the future’.

Post the announcement, many people are curious to try the new product, while a few are calling it the ‘first proper AI wearable device’.

There is also a discussion about how the Humane Ai Pin can improve based on this Pendant’s innovation.

By striking a balance between wearables, AI, and data privacy, the full potential of these technologies can be unlocked while prioritising responsible and ethical handling of personal data.

With major announcements concerning privacy, consent mode, and confidential cloud, the Limitless AI Pendant is poised to provide a personalised AI experience. The future can’t get any more exciting than this.

The post Future is Limitless appeared first on Analytics India Magazine.

Pega Introduces GenAI Coach to Enhance User Performance

Pegasystems has recently introduced Pega GenAI™ Coach, a generative AI-powered mentor for Pega solutions that proactively advises users to help them achieve optimal outcomes.

Pega GenAI Coach directly integrates into workflows and acts as an always-on mentor within Pega solutions. It analyses work and guides users with salient advice to overcome roadblocks. Managers can use Coach to quickly get up to speed on their team’s work and surface insights into their team’s performance.

Built on the Pega GenAI™ architecture, Coach handles complex workflows with auditability, security, and guardrails appropriate for the enterprise.

Its seamless integration with Pega AI capabilities allows enterprises to leverage a broad spectrum of AI. For example, Coach can use statistical predictions from Pega Process AI to suggest ways to avoid fines related to missed deadlines.

Coach will also access information synthesised by Pega GenAI™ Knowledge Buddy to bring relevant enterprise knowledge directly to users as they work on cases.

Leveraging an organisation’s own best practices for sales, service, and operations, Coach quickly analyses a user’s work and relevant data in context to intelligently guide them toward better and faster results.

Coaches can be easily configured to ensure each is tailored to an organization’s objectives and its employees’ specific needs. This helps users achieve objectives such as:

  • Optimise sales team performance: Coach analyzes existing opportunity, lead, contact, and interaction data within Pega Sales Automation™ and offers suggestions to help overcome barriers in moving deals forward. Sales leadership can easily input industry knowledge and their own best practices directly into Coach, helping ensure their teams are getting industry and business-specific advice to provide prospects and clients a better sales experience.
  • Improve back-office operations: Coach helps ensure back-office case workers can better complete complex work by providing personalised instructions that leverage an understanding of procedural information, regulatory requirements, and case data, keeping everything running smoothly.
  • Quickly resolve healthcare claims: Coach can help caseworkers by quickly analyzing and summarizing a customer’s claim, plan, and history to surface answers for customer inquiries, while also providing guidance on the best path to resolution if further steps are required.

The post Pega Introduces GenAI Coach to Enhance User Performance appeared first on Analytics India Magazine.

Indian Govt Deal with Nvidia Post-Elections Likely: Reports

A deal between the Indian government and Nvidia could be struck post elections to source GPUs, according to a report.

The deal is meant to help the government source GPUs in order to aid local startups, research centres and institutions at a subsidised rate. While exact details on the timeline of the deal are unclear, government officials said that this would likely be done post-elections.

The government is planning on opting for a rent-and-sublet model, wherein it will provide the GPUs to help aid in building AI compute infrastructure. This will be done under the recently greenlit Rs 10,000-crore IndiaAI Mission.

One official stated that the reason for striking the deal is to get ahead in terms of setting up the infrastructure needed to power AI within the country. This follows news of both China and the US acquiring chips for the same purpose.

According to the official, the government decided to approach Nvidia due to the US-based company’s current stronghold over the GPU market.

Apart from this, officials stated that companies are free to strike up their own renting and subletting deals with suppliers to source GPUs. If companies choose to opt for the marketplace model, one government official said, the government will provide incentives under a production-linked incentive (PLI) scheme.

“For the second option, some kind of marketplace will have to be thought of where performance can be objectively measured and thus the incentives be distributed. Or we will need a PLI-sort of compensation formula where the company which has worked out a GPU deal can show credible results to obtain the necessary benefits,” the official said.

However, considering the rising costs of GPUs, with Nvidia’s Blackwell GPUs costing around $40,000, the decision was made to forge a public partnership with the company.

This news comes on the heels of India’s Yotta Data Services acquiring 4,000 H100 GPUs from Nvidia last month. Following this, Deloitte India also announced a partnership with Yotta for access to their infrastructure in order to aid their clients.

The post Indian Govt Deal with Nvidia Post-Elections Likely: Reports appeared first on Analytics India Magazine.

Hugging Face Unveils Idefics2, an 8B Vision-Language Model

Hugging Face has released Idefics2, an open multimodal model that accepts arbitrary sequences of image and text inputs and produces text outputs. The model can answer questions about images, describe visual content, create stories grounded on multiple images, or simply behave as a pure language model without visual inputs.

Idefics2 surges past its forerunner Idefics1 while boasting only 8 billion parameters, the flexibility granted by its open Apache 2.0 license, and significantly augmented Optical Character Recognition (OCR) capabilities.

Introducing Idefics 2 🤯
An 8B Vision-Language Model – literally punching above its weight.
> Apache 2.0 licensed! 🔥
> Competitive with 30B models like MM1-Chat
> 12 point increase in VQAv2, 30 point increase in TextVQA (compared to Idefics 1)
> 10x fewer parameters than… pic.twitter.com/uhIvbpypwJ

— Vaibhav (VB) Srivastav (@reach_vb) April 15, 2024

In a remarkable feat, the new Idefics2 model outperformed larger rivals in visual tasks: it not only achieved exceptional performance on visual question answering benchmarks, but also outperformed significantly larger models like LLaVA-Next-34B and MM1-30B-chat.

Developed by the Hugging Face M4 team, the model is trained on a wide range of openly available datasets, including web documents, image-caption pairs, and OCR data. Additionally, the model was fine-tuned on a novel dataset called ‘The Cauldron,’ which amalgamated 50 carefully curated datasets for multifaceted conversational training.

A significant architectural advancement in Idefics2 is the simplification of integrating visual features into the language backbone. The adoption of a Learned Perceiver Pooling and MLP modality projection has enhanced the model’s overall efficacy, marking a shift from its predecessor’s architecture.
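To make the pooling idea concrete, here is a minimal NumPy sketch of cross-attention pooling in the spirit of the Perceiver: a small set of latent query vectors attends over a variable-length sequence of visual features and compresses it into a fixed-size output for the language backbone. This is a single-head toy with random vectors standing in for learned weights; Idefics2’s actual module differs in its details.

```python
import numpy as np

def perceiver_pool(visual_feats: np.ndarray, latents: np.ndarray) -> np.ndarray:
    """Cross-attention pooling: a small, fixed set of latent queries attends
    over a variable number of visual patch features, compressing them into a
    fixed-size sequence regardless of how many patches came in."""
    d = visual_feats.shape[-1]
    scores = latents @ visual_feats.T / np.sqrt(d)   # (n_latents, n_patches)
    scores -= scores.max(axis=-1, keepdims=True)     # for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over patches
    return weights @ visual_feats                    # (n_latents, d)

rng = np.random.default_rng(0)
patches = rng.normal(size=(257, 64))  # one image's patch features (toy sizes)
latents = rng.normal(size=(8, 64))    # 8 "learned" queries (random stand-ins)
pooled = perceiver_pool(patches, latents)  # fixed shape (8, 64)
```

However many patches the image encoder emits, the language model always receives the same small number of visual tokens, which is what keeps the integration cheap.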

Idefics2 exhibits a refined approach to image manipulation, maintaining native resolutions and aspect ratios, deviating from the conventional resizing norms in computer vision.

The post Hugging Face Unveils Idefics2, an 8B Vision-Language Model appeared first on Analytics India Magazine.

KaleidEO Achieves Milestone in Earth Observation Payload Development

KaleidEO Space Systems, a Bengaluru-based startup and subsidiary of SatSure, has successfully conducted aerial testing of its high-performance optical and multi-spectral earth observation payload. This achievement makes KaleidEO one of the first private Indian companies to design and develop a high-resolution optical EO payload.

The prototype payload was tested aboard an aircraft over the UK and Austria in collaboration with the Global Assistant & Logistic Group. The test aimed to assess the payload’s functionality and stability in a real-world environment. The payload captured images in five bands (red, blue, green, near-infrared, and red edge) at a spatial resolution of 16 cm.

KaleidEO’s satellite version of this payload is expected to capture images globally at a 1-meter resolution and 65 km swath, which could be disruptive due to the scale and quality of data collection for various applications, including agriculture, urban planning, critical infrastructure asset monitoring, and the government’s strategic planning needs.

The payload employs motion compensation and pixel shift methodology for super-resolving captured images, similar to the technique popularised by smartphones, but applied to imaging from a platform moving at 7 km/s in orbit.
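The sampling-grid idea behind pixel-shift super-resolution can be sketched in a few lines. In this idealised NumPy toy, four frames captured at exact half-pixel offsets are interleaved into a grid of twice the resolution; a real payload must additionally estimate and compensate sub-pixel motion before fusing, so treat this as an illustration of the principle only.

```python
import numpy as np

def pixel_shift_2x(frames: dict) -> np.ndarray:
    """Interleave four half-pixel-shifted low-res frames into a 2x grid.
    `frames` maps a (dy, dx) shift in {0, 1}^2 to an (h, w) array."""
    h, w = frames[(0, 0)].shape
    hi = np.zeros((2 * h, 2 * w))
    for (dy, dx), f in frames.items():
        hi[dy::2, dx::2] = f  # each frame fills one sub-grid of the output
    return hi

# Sanity check: sample a known scene at the four offsets and rebuild it.
scene = np.arange(64, dtype=float).reshape(8, 8)
frames = {(dy, dx): scene[dy::2, dx::2] for dy in (0, 1) for dx in (0, 1)}
restored = pixel_shift_2x(frames)
```

With perfect half-pixel shifts the reconstruction is exact; in practice, registration error from platform motion is what the motion-compensation step has to remove first.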

Akash Yalgach, Co-founder and Chief Technology Officer at KaleidEO, expressed his excitement about the successful test. He stated that it has given the team confidence for their planned 4-satellite fleet mission in 2026. He hopes their efforts will bolster Earth-imaging capabilities and open doors for the era of space-based innovation from India.

KaleidEO’s founding team comprises young ex-ISRO scientists, and the company was set up by deep tech firm SatSure, which closed a Series A round of $15 million in equity capital last year. SatSure counts HDFC Bank, ICICI Bank, ADB Ventures, and TransUnion among its strategic investors and would be an anchor-tenant customer for the high-resolution imagery from KaleidEO’s satellite fleet.

Prateep Basu, Founder & CEO of SatSure & KaleidEO, commented on the feat, stating that high-quality, affordable satellite imagery is still a myth, but KaleidEO aims to break barriers by democratising access to such data for users in India and other developing countries through cutting-edge hardware innovation.

The post KaleidEO Achieves Milestone in Earth Observation Payload Development appeared first on Analytics India Magazine.

Hugging Face releases a benchmark for testing generative AI on health tasks

By Kyle Wiggers

As I wrote recently, generative AI models are increasingly being brought to healthcare settings — in some cases prematurely, perhaps. Early adopters believe that they’ll unlock increased efficiency while revealing insights that’d otherwise be missed. Critics, meanwhile, point out that these models have flaws and biases that could contribute to worse health outcomes.

But is there a quantitative way to know how helpful — or harmful — a model might be when tasked with things like summarizing patient records or answering health-related questions?

Hugging Face, the AI startup, proposes a solution in a newly released benchmark test called Open Medical-LLM. Created in partnership with researchers at the nonprofit Open Life Science AI and the University of Edinburgh’s Natural Language Processing Group, Open Medical-LLM aims to standardize evaluating the performance of generative AI models on a range of medical-related tasks.

New: Open Medical LLM Leaderboard! 🩺

In basic chatbots, errors are annoyances.
In medical LLMs, errors can have life-threatening consequences 🩸

It's therefore vital to benchmark/follow advances in medical LLMs before thinking about deployment.

Blog: https://t.co/pddLtkmhsz

— Clémentine Fourrier 🍊 (@clefourrier) April 18, 2024

Open Medical-LLM isn’t a from-scratch benchmark per se, but rather a stitching-together of existing test sets — MedQA, PubMedQA, MedMCQA and so on — designed to probe models for general medical knowledge and related fields, such as anatomy, pharmacology, genetics and clinical practice. The benchmark contains multiple choice and open-ended questions that require medical reasoning and understanding, drawing from material including U.S. and Indian medical licensing exams and college biology test question banks.
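An aggregate score over a stitched-together benchmark of this kind is typically just the accuracy averaged across the constituent test sets. The sketch below shows that arithmetic; the subset names are from the benchmark, but the counts are placeholders for illustration, not actual leaderboard results.

```python
def leaderboard_average(results: dict) -> float:
    """Average accuracy across benchmark subsets, the way an aggregate
    leaderboard score is commonly computed. `results` maps a subset
    name to a (num_correct, num_total) pair."""
    accuracies = [correct / total for correct, total in results.values()]
    return sum(accuracies) / len(accuracies)

# Hypothetical counts, purely for illustration:
score = leaderboard_average({
    "MedQA":    (920, 1273),
    "PubMedQA": (380, 500),
    "MedMCQA":  (2500, 4183),
})
```

Note that an unweighted average like this treats a small subset the same as a large one, which is one reason a single leaderboard number compresses away a lot of per-task detail.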

“[Open Medical-LLM] enables researchers and practitioners to identify the strengths and weaknesses of different approaches, drive further advancements in the field and ultimately contribute to better patient care and outcome,” Hugging Face writes in a blog post.

Image Credits: Hugging Face

Hugging Face is positioning the benchmark as a “robust assessment” of healthcare-bound generative AI models. But some medical experts on social media cautioned against putting too much stock into Open Medical-LLM, lest it lead to ill-informed deployments.

On X, Liam McCoy, a resident physician in neurology at the University of Alberta, pointed out that the gap between the “contrived environment” of medical question-answering and actual clinical practice can be quite large.

It is great progress to see these comparisons head-to-head, but important for us to also remember how big the gap is between the contrived environment of medical question answering and actual clinical practice! Not to mention the idiosyncratic risks these metrics can't capture.

— Liam McCoy, MD MSc (@LiamGMcCoy) April 18, 2024

Hugging Face research scientist Clémentine Fourrier — who co-authored the blog post — agreed.

“These leaderboards should only be used as a first approximation of which [generative AI model] to explore for a given use case, but then a deeper phase of testing is always needed to examine the model’s limits and relevance in real conditions,” Fourrier said in a post on X. “Medical [models] should absolutely not be used on their own by patients, but instead should be trained to become support tools for MDs.”

It brings to mind Google’s experience several years ago attempting to bring an AI screening tool for diabetic retinopathy to healthcare systems in Thailand.

As Devin reported in 2020, Google created a deep learning system that scanned images of the eye, looking for evidence of retinopathy — a leading cause of vision loss. But despite high theoretical accuracy, the tool proved impractical in real-world testing, frustrating both patients and nurses with inconsistent results and a general lack of harmony with on-the-ground practices.

It’s telling that, of the 139 AI-related medical devices the U.S. Food and Drug Administration has approved to date, none use generative AI. It’s exceptionally difficult to test how a generative AI tool’s performance in the lab will translate to hospitals and outpatient clinics, and — perhaps more importantly — how the outcomes might trend over time.

That’s not to suggest Open Medical-LLM isn’t useful or informative. The results leaderboard, if nothing else, serves as a reminder of just how poorly models answer basic health questions. But neither Open Medical-LLM nor any other benchmark is a substitute for carefully thought-out real-world testing.