AWS Pledges $10 Million to Accelerate Pediatric Health Care Research

Amazon Web Services (AWS) announced a $10 million commitment to support pediatric health care research and the understanding of rare diseases. The announcement was made at the AWS Summit in Washington, D.C.

The funding aims to provide critical support to nonprofit institutions harnessing AWS cloud technology to advance children’s health care worldwide.

The initiative includes a $3 million philanthropic commitment distributed among three organizations: Children’s National Hospital in Washington, D.C.; Nationwide Children’s Hospital in Columbus, Ohio; and the Children’s Brain Tumor Network at the Children’s Hospital of Philadelphia (CHOP). Each organization will receive $1 million to support their mission-driven work.

The remaining $7 million will be available through the AWS IMAGINE Grant: Children’s Health Innovation Award. This grant will fund projects that accelerate pediatric research, advance maternal and child health, and empower the pediatric workforce and caregivers.

Nicole Giroux, founder of the Lilabean Foundation for Pediatric Brain Cancer Research, highlighted the importance of such funding. Her daughter Lila was diagnosed with inoperable brain cancer at 15 months old, and it took five years to get a precise diagnosis due to a lack of data.

AWS’s initiative aims to support a consortium of hospitals and institutions using cloud computing and artificial intelligence to accelerate research and discoveries. By managing data in the cloud, researchers can better understand the genetic makeup of diseases, leading to quicker and more accurate diagnoses and personalized treatments.

Adam Resnick, director of the Center for Data Driven Discovery in Biomedicine at CHOP, emphasised the importance of this initiative. “Despite being a rare disease, pediatric cancers provide a unique proving ground for new technology due to their dependency on real-time discovery and collaborative networks,” Resnick said.

Researchers at Nationwide Children’s Hospital use AWS cloud tools to compute on genomic data and share diagnostic results for pediatric cancer patients across the U.S. The anonymized data is shared with the NCI Childhood Cancer Database, allowing broader access for researchers.

Elaine Mardis, co-executive director of the Steve and Cindy Rasmussen Institute for Genomic Medicine at Nationwide Children’s Hospital, explained the impact of sharing data in cloud-based databases. This allows for better identification of genomic aspects of rare cancers, attracting more scientific attention.

AI-powered applications are also being used to screen babies for rare genetic conditions and to provide low-cost, portable ultrasound imaging for rheumatic heart disease in resource-limited settings. In Uganda, for instance, 200,000 children are expected to be screened for rheumatic heart disease in the coming years.

Marius George Linguraru, Connor Family Professor and Chair of Research and Innovation at Children’s National Hospital, discussed the application of AI in personalizing cancer treatment plans for children with brain tumors.

AWS’s commitment aims to improve the outlook for families dealing with pediatric diseases, providing more options and better treatment plans. “For every family that’s diagnosed, I want their child to have options,” Giroux said.

AWS’s philanthropic efforts will empower nonprofit institutions worldwide to leverage the power of cloud computing and AI to advance pediatric health care.

The post AWS Pledges $10 Million to Accelerate Pediatric Health Care Research appeared first on Analytics India Magazine.

This Week in AI: The fate of generative AI is in the courts’ hands

Hiya, folks, and welcome to TechCrunch’s regular AI newsletter.

This week in AI, music labels accused two startups developing AI-powered song generators, Udio and Suno, of copyright infringement.

The RIAA, the trade organization representing the music recording industry in the U.S., announced lawsuits against the companies on Monday, brought by Sony Music Entertainment, Universal Music Group, Warner Records and others. The suits claim that Udio and Suno trained the generative AI models underpinning their platforms on labels’ music without compensating those labels — and request $150,000 in compensation per allegedly infringed work.

“Synthetic musical outputs could saturate the market with machine-generated content that will directly compete with, cheapen and ultimately drown out the genuine sound recordings on which the service is built,” the labels say in their complaints.

The suits add to the growing body of litigation against generative AI vendors, including against big guns like OpenAI, arguing much the same thing: that companies training on copyrighted works must pay rightsholders or at least credit them — and allow them to opt out of training if they wish. Vendors have long claimed fair use protections, asserting that the copyrighted data they train on is public and that their models create transformative, not plagiaristic, works.

So how will the courts rule? That, dear reader, is the billion-dollar question — and one that’ll take ages to sort out.

You’d think it’d be a slam dunk for copyright holders, what with the mounting evidence that generative AI models can regurgitate nearly (emphasis on nearly) verbatim the copyrighted art, books, songs and so on they’re trained on. But there’s an outcome in which generative AI vendors get off scot-free — and owe Google their good fortune for setting the consequential precedent.

Over a decade ago, Google began scanning millions of books to build an archive for Google Books, a sort of search engine for literary content. Authors and publishers sued Google over the practice, claiming that reproducing their IP online amounted to infringement. But they lost. On appeal, a court held that Google Books’ copying had a “highly convincing transformative purpose.”

The courts might decide that generative AI has a “highly convincing transformative purpose,” too, if the plaintiffs fail to show that vendors’ models do indeed plagiarize at scale. Or, as The Atlantic’s Alex Reisner proposes, there may not be a single ruling on whether generative AI tech as a whole infringes. Judges could well determine winners model by model, case by case — taking each generated output into account.

My colleague Devin Coldewey put it succinctly in a piece this week: “Not every AI company leaves its fingerprints around the crime scene quite so liberally.” As the litigation plays out, we can be sure that AI vendors whose business models depend on the outcomes are taking detailed notes.

News

Advanced Voice Mode delayed: OpenAI has delayed advanced Voice Mode, the eerily realistic, nearly real-time conversational experience for its AI-powered chatbot platform ChatGPT. But there aren’t any idle hands at OpenAI, which also this week acqui-hired remote collaboration startup Multi and released a macOS client for all ChatGPT users.

Stability lands a lifeline: On the financial precipice, Stability AI, the maker of open image-generating model Stable Diffusion, was saved by a group of investors that included Napster founder Sean Parker and ex-Google CEO Eric Schmidt. Its debts forgiven, the company also appointed a new CEO, former Weta Digital head Prem Akkaraju, as part of a wide-ranging effort to regain its footing in the ultra-competitive AI landscape.

Gemini comes to Gmail: Google is rolling out a new Gemini-powered AI side panel in Gmail that can help you write emails and summarize threads. The same side panel is making its way to the rest of the search giant’s productivity apps suite: Docs, Sheets, Slides and Drive.

Smashing good curator: Goodreads’ co-founder Otis Chandler has launched Smashing, an AI- and community-powered content recommendation app with the goal of helping connect users to their interests by surfacing the internet’s hidden gems. Smashing offers summaries of news, key excerpts and interesting pull quotes, automatically identifying topics and threads of interest to individual users and encouraging users to like, save and comment on articles.

Apple says no to Meta’s AI: Days after The Wall Street Journal reported that Apple and Meta were in talks to integrate the latter’s AI models, Bloomberg’s Mark Gurman said that the iPhone maker wasn’t planning any such move. Apple shelved the idea of putting Meta’s AI on iPhones over privacy concerns, Bloomberg said — and the optics of partnering with a social network whose privacy policies it’s often criticized.

Research paper of the week

Beware the Russian-influenced chatbots. They could be right under your nose.

Earlier this month, Axios highlighted a study from NewsGuard, the misinformation-countering organization, that found that the leading AI chatbots are regurgitating snippets from Russian propaganda campaigns.

NewsGuard entered into 10 leading chatbots — including OpenAI’s ChatGPT, Anthropic’s Claude and Google’s Gemini — several dozen prompts asking about narratives known to have been created by Russian propagandists, specifically American fugitive John Mark Dougan. According to the company, the chatbots responded with disinformation 32% of the time, presenting as fact false Russian-written reports.

The study illustrates the increased scrutiny on AI vendors as election season in the U.S. nears. Microsoft, OpenAI, Google and a number of other leading AI companies agreed at the Munich Security Conference in February to take action to curb the spread of deepfakes and election-related misinformation. But platform abuse remains rampant.

“This report really demonstrates in specifics why the industry has to give special attention to news and information,” NewsGuard co-CEO Steven Brill told Axios. “For now, don’t trust answers provided by most of these chatbots to issues related to news, especially controversial issues.”

Model of the week

Researchers at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) claim to have developed a model, DenseAV, that can learn language by predicting what it sees from what it hears — and vice versa.

The researchers, led by Mark Hamilton, an MIT PhD student in electrical engineering and computer science, were inspired to create DenseAV by the nonverbal ways animals communicate. “We thought, maybe we need to use audio and video to learn language,” he told MIT CSAIL’s press office. “Is there a way we could let an algorithm watch TV all day and from this figure out what we’re talking about?”

DenseAV processes only two types of data (audio and visual) and does so separately, “learning” by comparing pairs of audio and visual signals to find which signals match and which don’t. Trained on a dataset of 2 million YouTube videos, DenseAV can identify objects from their names and sounds by searching for, then aggregating, all the possible matches between an audio clip and an image’s pixels.
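
The "search, then aggregate" idea can be illustrated with a toy sketch. This is not DenseAV's code: the feature vectors are made up, the function names are hypothetical, and the aggregation rule (best-matching pixel per audio frame, averaged over time) is one plausible choice assumed here for illustration.

```python
# Toy sketch only: hand-made feature vectors, not DenseAV's real model.

def dot(u, v):
    """Dot product of two equal-length feature vectors."""
    return sum(a * b for a, b in zip(u, v))

def clip_image_similarity(audio_frames, pixel_features):
    """Score how well an audio clip matches an image.

    audio_frames: one feature vector per audio time step.
    pixel_features: one feature vector per image location.
    Assumed aggregation: for each audio frame keep its best-matching
    pixel (max), then average those best scores over time (mean).
    """
    best_per_frame = [
        max(dot(frame, pixel) for pixel in pixel_features)
        for frame in audio_frames
    ]
    return sum(best_per_frame) / len(best_per_frame)

def best_match(audio_frames, candidate_images):
    """Return the index of the candidate image that best matches the clip."""
    scores = [clip_image_similarity(audio_frames, img) for img in candidate_images]
    return max(range(len(scores)), key=scores.__getitem__)

# Hypothetical data: a "bark" clip and two candidate images.
bark_audio = [[1.0, 0.0], [0.9, 0.1]]
dog_image = [[0.0, 0.0], [1.0, 0.0]]
cat_image = [[0.0, 1.0], [0.0, 0.0]]
winner = best_match(bark_audio, [dog_image, cat_image])
```

In a training setup of this kind, matching audio-image pairs would be pushed toward higher scores than mismatched pairs; the sketch covers only the scoring step.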

When DenseAV listens to a dog barking, for example, one part of the model homes in on language, like the word “dog,” while another part focuses on the barking sounds. The researchers say this shows DenseAV can not only learn the meaning of words and the locations of sounds but also learn to distinguish between these “cross-modal” connections.

Looking ahead, the team aims to create systems that can learn from massive amounts of video- or audio-only data — and scale up their work with larger models, possibly integrated with knowledge from language-understanding models to improve performance.

Grab bag

No one can accuse OpenAI CTO Mira Murati of not being consistently candid.

Speaking during a fireside at Dartmouth’s School of Engineering, Murati admitted that, yes, generative AI will eliminate some creative jobs — but suggested that those jobs “maybe shouldn’t have been there in the first place.”

“I certainly anticipate that a lot of jobs will change, some jobs will be lost, some jobs will be gained,” she continued. “The truth is that we don’t really understand the impact that AI is going to have on jobs yet.”

Creatives didn’t take kindly to Murati’s remarks — and no wonder. Setting aside the apathetic phrasing, OpenAI, like the aforementioned Udio and Suno, faces litigation, critics and regulators alleging that it’s profiting from the works of artists without compensating them.

OpenAI recently promised to release tools to allow creators greater control over how their works are used in its products, and it continues to ink licensing deals with copyright holders and publishers. But the company isn’t exactly lobbying for universal basic income — or spearheading any meaningful effort to reskill or upskill the workforces its tech is impacting.

A recent piece in The Wall Street Journal found that contract jobs requiring basic writing, coding and translation are disappearing. And a study published last November shows that, following the launch of OpenAI’s ChatGPT, freelancers got fewer jobs and earned much less.

OpenAI’s stated mission, at least until it becomes a for-profit company, is to “ensure that artificial general intelligence (AGI) — AI systems that are generally smarter than humans — benefits all of humanity.” It hasn’t achieved AGI. But wouldn’t it be laudable if OpenAI, true to the “benefiting all of humanity” part, set aside even a small fraction of its revenue ($3.4 billion+) for payments to creators so they aren’t dragged down in the generative AI flood?

I can dream, can’t I?

Berkeley Lab Researchers Harness Power of AI for Plant Root Analysis

Roots play a pivotal role in the plant’s life cycle, including its water uptake, nutrient absorption, soil interaction, stability, and adaptation to changing environmental conditions. Despite significant progress in plant science, researchers have struggled to fully understand the intricate structures, growth dynamics, and responses to environmental stresses of the hidden half of living plants.

In an investigation to boost agricultural yields and develop plants resistant to climate change, Berkeley Lab researchers have introduced RhizoNet, a computational tool that harnesses the power of AI to transform plant root analysis and empower scientists to uncover new insights about root behavior under various environmental conditions.

The research was conducted by Lawrence Berkeley National Laboratory’s (Berkeley Lab’s) Applied Mathematics and Computational Research (AMCR) and Environmental Genomics and Systems Biology (EGSB) divisions.

The AI-powered tool, detailed in a paper published in Scientific Reports, is designed to automate the root analysis process to deliver unprecedented accuracy. It uses an advanced deep learning approach based on a convolutional neural network, allowing it to segment plant roots for a comprehensive biomass and growth assessment.
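
To make the link between segmentation and growth assessment concrete, here is a minimal sketch of what can be done once a segmentation model has produced a binary root mask. It is not RhizoNet's code: the function names, the tiny masks, and the `mm_per_pixel` scale are hypothetical, and projected root area is used only as a rough stand-in for biomass.

```python
# Illustrative only: masks would normally come from a segmentation model.

def root_area_mm2(mask, mm_per_pixel):
    """Projected root area: count root pixels (truthy cells) and scale to mm^2."""
    root_pixels = sum(1 for row in mask for value in row if value)
    return root_pixels * mm_per_pixel ** 2

def growth_series(masks, mm_per_pixel):
    """Root area for each time point, in the order the masks are given."""
    return [root_area_mm2(mask, mm_per_pixel) for mask in masks]

# Hypothetical 3x3 masks for two imaging sessions (1 = root, 0 = background).
week_1 = [[0, 1, 0],
          [0, 1, 0],
          [0, 0, 0]]
week_2 = [[0, 1, 0],
          [0, 1, 0],
          [1, 1, 0]]
areas = growth_series([week_1, week_2], mm_per_pixel=0.5)
```

Comparing the entries of `areas` across sessions gives a simple growth curve; a real pipeline would also need to handle the noisy backgrounds and pixel-thin structures that make segmentation difficult.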

Traditional methods for plant root analysis, such as those that rely on flatbed scanners and manual segmentation methods, are labor-intensive and prone to human error. They also limit the ability of scientists to capture the fine details of root growth and behavior, especially for complex root systems. Now with RhizoNet, researchers have a tool to track root growth and biomass with greater precision.

“The capability of RhizoNet to standardize root segmentation and phenotyping represents a substantial advancement in the systematic and accelerated analysis of thousands of images. This innovation is instrumental in our ongoing efforts to enhance the precision in capturing root growth dynamics under diverse plant conditions,” said Daniela Ushizima, lead investigator of the AI-driven software, Berkeley Lab.

The key challenges in plant root analysis are the intricate nature of root structure and the presence of “noisy backgrounds” such as bubbles, droplets, reflections, and shadows that can complicate root image segmentation. In some cases, the fine structures are only as wide as a pixel, making segmentation extremely challenging for even the best human annotators.

To address this issue, RhizoNet is reinforced by the latest version of EcoFAB, a unique hydroponic device that facilitates in-situ plant imaging. This device was developed by the EGSB, the DOE Joint Genome Institute (JGI), and the Climate & Ecosystem Sciences division at Berkeley Lab. EcoFAB can provide detailed imaging of root systems, eliminating the complexities of manual annotation and traditional imaging methods.

The Scientific Reports paper illustrates how the Berkeley Lab researchers used RhizoNet and EcoFAB in the analysis of root scans of Brachypodium distachyon plants under different conditions over five weeks. The high-throughput nature of EcoBOT, the new image acquisition system for EcoFABs, enabled the researchers to perform systematic experimental monitoring.

“We’ve made a lot of progress in reducing the manual work involved in plant cultivation experiments with the EcoBOT, and now RhizoNet is reducing the manual work involved in analyzing the data generated,” noted Peter Andeer, a research scientist in EGSB and a lead developer of EcoBOT, who collaborated with Ushizima on this work. “This increases our throughput and moves us toward the goal of self-driving labs.”

The accuracy and efficiency of RhizoNet are poised to drive research efforts toward more efficient and insightful plant studies. However, the new technology is not without challenges.

There are concerns about the standardization of data interpretation generated by AI algorithms. Researchers will need to ensure the reproducibility and accuracy of the AI model across different setups and plant species. In addition, they will have to address the need for continuous validation and optimization to maintain the tool’s efficacy over time.

While there are challenges in the use of RhizoNet in various settings, it still represents a paradigm shift in plant root analysis. Researchers are now equipped with a powerful tool to explore the hidden dimensions of root biology. It could lead to solutions for improved crop productivity, climate resilience, sustainable agriculture, and other benefits.

“Our next steps involve refining RhizoNet’s capabilities to further improve the detection and branching patterns of plant roots,” said Ushizima. “We also see potential in adapting and applying these deep-learning algorithms for roots in soil as well as new materials science investigations.”

You can now download the ChatGPT app for MacOS for free. Here’s how

If you regularly use ChatGPT, you likely keep ChatGPT open in a tab in your browser for easier access. Now, the ChatGPT application for macOS offers a much more efficient way to access ChatGPT and optimize your workflow.

In May 2024, OpenAI unveiled its ChatGPT application for macOS users, but only made it available for ChatGPT Plus subscribers. On Tuesday, the startup announced that it was making the app available to all users, regardless of whether they pay for a subscription.

One of the biggest wins of downloading the application is quicker ChatGPT access. Simply press the Option and Space keys on your keyboard, and ChatGPT will open from any screen on your desktop.

Additionally, the app can take quick screenshots of your screen to assist with whatever you are working on. For example, if you are coding, you can ask ChatGPT to take a screenshot and ask a question corresponding to the code on the screen. You can do the same with anything you are working on, such as documents, websites, emails, etc.

Uploading files is also easier. When the bar pops up, you can drag and drop files or click on the paperclip icon to upload a file from your computer, again speeding up getting assistance with whatever you need. Another perk is starting a conversation with ChatGPT simply by clicking the headphone icon.

After testing the ChatGPT macOS application, ZDNET's Maria Diaz found it much more helpful than initially expected.

"The MacOS app features a fast, user-friendly interface that sets itself apart from others with its simplicity. Here's what makes it even more helpful: You can access it anytime with a keyboard shortcut," Diaz said.

So, how do you get started? To download the application, you need a Mac with Apple Silicon (M1 or higher) and macOS 14+ installed. Then, all you need to do is visit this OpenAI webpage, click download, and follow the steps when prompted.

If you already have the ChatGPT interface open, you can click on your profile picture in the upper right-hand corner and hit "Download the macOS app," which will take you through the same installation process. If you are a Windows user, don't worry. OpenAI has said that a ChatGPT desktop application will be released for Windows later in 2024.

Hugging Face Launches Open LLM Leaderboard, Chinese Models Dominate

Hugging Face, the AI community platform, has unveiled a brand new open large language model (LLM) leaderboard, with Chinese models taking the top spots, according to an announcement today from co-founder and CEO Clem Delangue.

The leaderboard ranks open-source LLMs based on extensive new evaluations, including the MMLU-Pro benchmark, which tests models on high school and college-level problems.

Hugging Face utilised 300 NVIDIA H100 GPUs to re-evaluate all major open LLMs for the updated rankings.

Alibaba’s Qwen-72B model emerged as the top performer overall, outpacing other open-source models, highlighting the rapid progress of Chinese AI companies in the LLM space.

The dominance of Chinese models on the leaderboard underscores the increasingly competitive and global nature of the open-source AI ecosystem.

Delangue noted that previous LLM benchmarks have become “too easy” for the latest models, comparing it to “grading high school students on middle school problems.” This suggests the need for more rigorous and challenging evaluations as open-source language models grow more sophisticated.

However, Delangue also cautioned that some AI developers may be overly focused on optimising for specific benchmarks at the expense of well-rounded model performance. The leaderboard results also indicated that simply increasing model size does not always translate to superior performance.

The launch of Hugging Face’s new open LLM leaderboard marks an important step in the field’s maturation in terms of transparent and comprehensive evaluation. With Chinese models leading the pack, the leaderboard will likely spur further innovation and investment in open-source AI technologies worldwide.

Hugging Face, founded in 2016, has become a central hub for open-source machine learning, hosting over 250,000 models and datasets used by a community of 200,000 developers.

The post Hugging Face Launches Open LLM Leaderboard, Chinese Models Dominate appeared first on Analytics India Magazine.

AWS Announces $50 Million Generative AI Initiative for Public Sector

At the AWS Washington DC event, the company unveiled the AWS Public Sector Generative Artificial Intelligence (AI) Impact Initiative, a two-year, $50 million investment aimed at accelerating AI innovation in public sector organisations.

The initiative, which will run from June 26, 2024, through June 30, 2026, seeks to support critical missions in government, nonprofit, education, healthcare, and aerospace sectors through AWS generative AI services and infrastructure, including Amazon Bedrock, Amazon Q, Amazon SageMaker, AWS HealthScribe, AWS Trainium, and AWS Inferentia.

The initiative will provide up to $50 million in AWS Promotional Credits, training, and technical expertise. Determination of credit issuance will consider factors such as the customer’s experience with technology solutions, project maturity, evidence of future adoption, and generative AI skills.

This program is open to both new and existing AWS Worldwide Public Sector customers and partners globally, who are building generative AI solutions to address pressing societal challenges.

Public sector leaders are increasingly looking to generative AI to enhance efficiency and agility amidst challenges like resource optimisation, evolving needs, patient care improvement, personalised education, and strengthened security.

AWS is committed to helping these organisations leverage generative AI and cloud technologies to make a positive societal impact.

“This initiative builds on our ongoing commitment to the safe, secure, and responsible development of AI technology,” said Dave Levy, Vice President of AWS Public Sector. “We are contributing to programs like the National Artificial Intelligence Research Resource and the U.S. Artificial Intelligence Safety Institute Consortium to ensure AI’s safe and ethical development.”

The initiative will offer tailored training, expertise from the Generative AI Innovation Center, technical support, networking opportunities, and global thought leadership platforms. These resources aim to help public sector entities ideate, identify, and implement secure generative AI solutions.

The post AWS Announces $50 Million Generative AI Initiative for Public Sector appeared first on Analytics India Magazine.

Formation Bio raises $372M to boost drug development with AI

Formation Bio, a startup focused on applying AI to drug discovery with backing from OpenAI CEO Sam Altman, has raised over a quarter-billion dollars to support its ambitious product roadmap.

Formation announced Wednesday that it raised $372 million in a Series D funding round led by Andreessen Horowitz with participation from drugmaker Sanofi, Sequoia, Thrive, Emerson Collective, Lachy Groom, SV Angel Growth and FPV Ventures. The new tranche brings Formation’s total raised to more than $600 million (according to Pitchbook), which the company says is being put mainly toward partnership acquisition efforts and R&D.

Formation declined to reveal its new valuation. But a spokesperson told TechCrunch that it’s a “material step up” from $1 billion, Formation’s Series C valuation.

The company, which previously went by the brand TrialSpark, was co-founded by Benjamine Liu and Linhao Zhang in 2016. Liu has a background in computational biology, having conducted neuroscience research at Oxford and UPenn. Zhang is a software developer by trade and worked at Salesforce before joining Oscar Insurance as a product engineer.

Formation builds tech-forward solutions for clinical trials and drug development. The company licenses drug IP from and co-develops drugs with biotech and pharma companies, and develops these drugs past clinical proof-of-concept.

Drug development is a notoriously expensive and challenging endeavor. It takes 10 to 15 years on average to take a drug from initial discovery through regulatory approval, with the cost per drug reaching up to $5.5 billion. And an estimated 90% of drugs fail to reach the market.

Formation claims it’s able to run clinical trials more efficiently by streamlining processes such as study startup, participant recruitment and data management. For example, the company is currently deploying AI to generate patient recruitment materials and reports for “adverse events.” It’s also fine-tuning AI models to provide drug development teams recommendations for R&D decisions and better predict drug toxicity, tolerability and efficacy.

Last month, Formation announced a collaboration with OpenAI and Sanofi to jointly design and develop customized AI solutions for drug development. OpenAI said it would contribute access to AI capabilities and expertise, and Sanofi said it would bring proprietary data for developing AI models.

OpenAI’s involvement gives the appearance of conflict of interest, given that Altman was involved in Formation’s Series C fundraising; we’ve reached out to OpenAI for more information and will update this post if we hear back.

“At Sanofi, we’re all in on AI,” Sanofi CEO Paul Hudson said in a press release. “And we are proud to partner with and invest in Formation Bio, whose AI-driven drug development vision and capabilities will help lead our industry forward in the shared ambition to accelerate and improve how we bring more new medicines to patients.”

Formation has three drug candidates in its clinical pipeline, including treatments for chronic hand eczema, sensory neuropathy and knee osteoarthritis. The furthest along is the eczema treatment, which recently reached phase 3 — the last stage of testing before a drug is submitted to regulatory authorities.

A number of startups are attempting to pioneer AI-powered tech for drug discovery, including EvolutionaryScale, which emerged from stealth this week with investments from Amazon and Nvidia. Market research firm Markets and Markets anticipates that the market for AI in drug discovery will be worth $4.9 billion by 2028. Major players in the space include Xaira (which launched with $1 billion), DeepMind spin-off Isomorphic, Insilico, Jeff Dean-backed Profluent, Enveda and Causaly.

Databricks’ Compound AI Systems Could Crush OpenAI & Anthropic

With the recent release of Anthropic’s Claude 3.5 Sonnet, the company has gained a temporary lead in a long-standing AI race. However, while companies like Anthropic and OpenAI stumble over themselves to better their capabilities, it seems that Databricks is taking a different approach in focusing on compound AI systems.

At the recent Data+AI Summit, Databricks announced several updates to its Mosaic AI platform, allowing its customers to build their own compound AI systems.

Speaking to AIM, Databricks’ vice president of field engineering APJ, Nick Eayrs, said that compound AI systems offer huge value compared to building a single large model.

“How do we make these models better at integrating with other systems, both upstream and downstream? How do we build tools around the models so that they can operate in these compound systems to provide better insights and capabilities for customers and citizens?” he asked.

Where Do Compound AI Systems Fit into the Ecosystem?

While there are comparisons to multimodal systems, like GPT-4o, “compound AI systems” is a much broader term, encompassing multimodal systems as well as other capabilities, like using multiple AI models and techniques for better and more complex reasoning.
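
As a rough illustration of what "compound" means here, the sketch below chains three simple components: a retriever, a generator, and a checker. It is a toy under stated assumptions, not Databricks' Mosaic AI API: every function name is hypothetical, the retriever is plain word overlap, and `generate` is a stub standing in for a real language model call.

```python
# Toy compound system: retriever + generator stub + guardrail check.

def retrieve(query, documents, top_k=1):
    """Rank documents by the number of words they share with the query."""
    words = set(query.lower().split())
    ranked = sorted(
        documents,
        key=lambda doc: len(words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:top_k]

def generate(query, context):
    """Stand-in for a language model call; it simply cites the context."""
    return f"Q: {query} | Based on: {context[0]}"

def check(answer, context):
    """Simple guardrail: the answer must quote the retrieved context."""
    return context[0] in answer

def compound_answer(query, documents):
    """Run the components in sequence and fall back if the check fails."""
    context = retrieve(query, documents)
    answer = generate(query, context)
    return answer if check(answer, context) else "No grounded answer available."

# Hypothetical document store and query.
docs = ["Unity Catalog governs data assets", "Roots absorb water"]
result = compound_answer("what governs data", docs)
```

The point of the design is that each component can be swapped or measured on its own, which is harder to do when a single large model is asked to handle every step.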

“We believe that compound AI systems will be the best way to maximise the quality, reliability, and measurement of AI applications going forward, and may be one of the most important trends in AI in 2024,” said Databricks co-founder and CTO Matei Zaharia.

This is unsurprising, with many AI companies slowly pivoting towards offering enterprise AI services, including Microsoft, which recently killed its GPT Builder in Copilot for consumer purposes to focus on enterprise and commercial sectors.

Databricks’ Data Intelligence platform means that companies can make the best use of their data. As Databricks CEO Ali Ghodsi said, this essentially democratises their data, giving anyone within the company access to it while also making sure that the data is not at the mercy of outside vendors.

Data Governance is Key

Obviously, for all of this to work, data is key – not just data but also how this data is formatted and labelled.

Using financial data and software company FactSet, which is a Databricks client, as an example, Databricks co-founder and VP of engineering Patrick Wendell pointed out that the company has a huge amount of data for existing queries “with labelled English examples so they could tune a model that understands their data extremely well”.

Another major announcement from the summit came in the form of Databricks officially open-sourcing Unity Catalog. This lets companies standardise their data, enabling more accurate training and information retrieval.

Will Databricks’ Focus on Democratising AI Prevail?

However, with Databricks effectively coming out with an entire ecosystem for companies to use in the name of democratising AI systems, this could change slowly.

With combined factors like industries pivoting towards AI use, the need to effectively use data and concerns on commercial data privacy, Databricks has managed to corner an untapped market.

But Zaharia is correct in saying that this has slowly become a massive trend in 2024. While Databricks has focused on ensuring that everything remains democratised and largely accessible, this doesn’t stop other AI companies from leveraging their technology and pushing their own formats for enterprises to use.

However, unlike their Tabular acquisition, the company may not be able to enforce their democratic AI ideal as big tech companies venture into the domain.

This may not necessarily be a bad thing. With Databricks ensuring democratic AI first and foremost, and their early tapping of a market, the company has essentially set a standard for how things need to be done when it comes to leveraging compound AI systems.

Even if larger companies try to commercialise these systems in their own formats, Databricks’ early cornering of the market could help them remain in the lead for years to come.

The post Databricks’ Compound AI Systems Could Crush OpenAI & Anthropic appeared first on Analytics India Magazine.

LLM Portfolio Projects Ideas to Wow Employers

Most people who lack technical knowledge may think that working with AI or LLMs (large language models) is challenging and reserved for experts and engineers. However, what if I told you that all you need is proficiency in Python to build a wide range of LLM projects, from Q&A systems to YouTube summarizers? You can even create your own GPT-4o application using multiple open-source models and components.

In this blog, we will explore interesting and easily achievable LLM project ideas that you can build using free or affordable resources. Furthermore, each project idea is accompanied by a sample project link that you can examine to better understand how it works.

1. Fine-Tuning Llama 3 and Using It Locally

Fine-Tuning Llama 3 and Using It Locally is a complete project with multiple steps and files. The goal is to fine-tune the model on a dataset of patient-doctor conversations using free resources provided by Kaggle. Once the model is successfully fine-tuned, it can answer medical-related questions in a highly professional manner.

In order to use the model offline on your laptop, you can follow these steps:

  1. Merge the adapter layer into the base model.
  2. Convert the model into the Llama.cpp format, known as GGUF.
  3. Reduce the size of the model using the quantization method.
  4. Finally, use the model on your laptop using the Windows Jan application.
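
To give a feel for what the quantization step does, here is a minimal sketch in plain Python. It is an illustration only and makes several assumptions: it uses a single shared scale and 8-bit integers, whereas the GGUF tooling applies its own block-wise schemes, and the function names `quantize_int8` and `dequantize` are hypothetical rather than part of any library.

```python
def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Map floats to integers in [-127, 127] using one shared scale.

    Simplified absmax quantization, shown for illustration; real GGUF
    quantization works on blocks of weights with per-block scales.
    """
    max_abs = max((abs(w) for w in weights), default=0.0)
    # Avoid dividing by zero when every weight is zero (or the list is empty).
    scale = max_abs / 127 if max_abs else 1.0
    return [round(w / scale) for w in weights], scale


def dequantize(quantized: list[int], scale: float) -> list[float]:
    """Recover approximate float weights from the stored integers."""
    return [value * scale for value in quantized]


if __name__ == "__main__":
    weights = [0.5, -1.27, 0.0, 1.0]
    quantized, scale = quantize_int8(weights)
    print(quantized, scale)
    print(dequantize(quantized, scale))
```

Each weight is stored as a small integer plus one shared float, which is where the size reduction comes from; the price is a rounding error of at most half the scale per weight.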

Medical conversations between doctors and patients should stay private, which is why it is necessary to run the model locally.

2. Q&A Retrieval System

If you prefer not to fine-tune the model, you can still create context-aware AI applications locally using tools like LangChain, Chroma DB, and Ollama. This application will utilize your dataset as context before generating the response.

To build the RAG (Retrieval-augmented generation) application, you can follow these steps:

  1. Load PDF Files: Begin by loading all the PDF files from the designated folder.
  2. Split Text: Split the text into smaller chunks for efficient processing.
  3. Convert to Embeddings: Convert the text into embeddings and store them in the vector database.
  4. Build Retrieval Chain: Construct a retrieval chain using LangChain.
  5. Develop Python Application: Create a proper Python application to ensure a seamless chat experience.
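
The steps above can be sketched without any external dependencies. This is a minimal illustration, not the tutorial's code. It assumes a bag-of-words vector as a stand-in for real embeddings and a plain Python list in place of Chroma DB, and the helper names (`split_text`, `embed`, `build_index`, `retrieve`, `build_prompt`) are hypothetical. In the actual project, LangChain's loaders, an embedding model, and a model served by Ollama would replace these pieces.

```python
import math
import re
from collections import Counter


def split_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping character chunks (step 2)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts. A real app would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(chunks: list[str]) -> list[tuple[str, Counter]]:
    """Store each chunk next to its vector (step 3, standing in for a vector database)."""
    return [(chunk, embed(chunk)) for chunk in chunks]


def retrieve(index: list[tuple[str, Counter]], question: str, k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question (step 4)."""
    query = embed(question)
    ranked = sorted(index, key=lambda item: cosine(query, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Combine retrieved context and the question into one prompt for the LLM."""
    context = "\n---\n".join(context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    docs = [
        "Llama 3 is an open model released by Meta.",
        "Chroma DB stores embeddings for retrieval.",
    ]
    index = build_index(docs)
    question = "Where are embeddings stored?"
    print(build_prompt(question, retrieve(index, question, k=1)))
```

The prompt produced at the end is what would be sent to the local model; swapping the toy `embed` function for a real embedding model changes the retrieval quality but not the overall flow.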

LangChain simplifies the process by providing a high-level API and easy-to-use commands. By following the tutorial on "How to Run Llama 3 Locally," you can build a context-aware smart LLM application.

3. Serving an LLM application as an API endpoint using FastAPI in Python

In this project, you will build an English-to-French translator API using the OpenAI API and FastAPI. The project is divided into two main parts: understanding how to use the OpenAI API so that the generated output is always in French, and building a REST API with FastAPI that takes in text and returns the output through a simple curl command.

If you are familiar with FastAPI, you can build an even better LLM application that can serve as an API endpoint in 30 minutes.

For reference, follow the guide "Serving an LLM application as an API endpoint using FastAPI in Python" and try serving your LLM. It can also be a model that you are running locally.

4. Vacation Planning Assistant

Planning a vacation without a travel agent can be difficult. There are so many moving parts, and sometimes people don't even know what to do. So, why not create your own travel itinerary application that will show your itinerary on a map and provide detailed plans and various attractions?

In this project, you will build a web application that takes user instructions about their travel plans and provides itinerary suggestions, showing them on a map.

The project requires you to learn a few basics about Gradio, LangChain, and the Google Maps API before diving in. You can start by following the "Building a Smart Travel Itinerary Suggester with LangChain, Google Maps API, and Gradio (Part 1)" tutorial, but to build the full Vacation Planning Assistant, you might have to add more components to your application and make it more robust.

5. YouTube Summarizer

YouTube Summarizer is a beginner-friendly project, perfect for students and newcomers to APIs and natural language processing. The project involves using the YouTube API to extract transcripts from videos and the OpenAI API to summarize these transcripts. Given that some videos can be lengthy, and the context window of models like ChatGPT might be limited, the project requires splitting the transcript into manageable parts. Each part is summarized individually, and the summarized sections are then combined to produce a coherent summary of the entire video.
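
The split, summarize, and combine flow described above can be sketched as follows. This is an illustration under assumptions, not the tutorial's code: `summarize` is a hypothetical callable standing in for an OpenAI API call, chunk size is measured in words rather than model tokens, and combining the partial summaries with a second summarization pass is one possible choice among several.

```python
from typing import Callable


def split_transcript(transcript: str, max_words: int = 500) -> list[str]:
    """Split a transcript into parts of at most max_words words each."""
    words = transcript.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def summarize_video(
    transcript: str,
    summarize: Callable[[str], str],
    max_words: int = 500,
) -> str:
    """Summarize each part, then summarize the joined partial summaries."""
    parts = split_transcript(transcript, max_words)
    if not parts:
        return ""
    partial_summaries = [summarize(part) for part in parts]
    if len(partial_summaries) == 1:
        # Short video: the single partial summary is already the final one.
        return partial_summaries[0]
    return summarize("\n".join(partial_summaries))


if __name__ == "__main__":
    # Stand-in summarizer that keeps the first five words of its input.
    def fake_summarize(text: str) -> str:
        return " ".join(text.split()[:5])

    print(summarize_video("word " * 1200, fake_summarize))
```

Passing the summarizer in as a parameter keeps the chunking logic testable without network access and makes it easy to switch between a hosted API and a local model.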

You can follow the "Create Your Own YouTube Video Summarizer App in Just 3 Easy Steps" tutorial and experience the awesomeness yourself.

Note: The tutorial uses an older version of the OpenAI API. You can check the OpenAI API documentation to update the code to the current structure.

6. Web scraping with LLMs

Web scraping can be a lucrative business, with individuals earning up to $200 per day by running a simple script. It is considered lucrative because it can be challenging to bypass certain website structures. In such cases, building an LLM-powered web scraper using Scrapy and Ollama can help automate or enhance web parsing.

By following the guide "Web Scraping with LLMs," you can learn to use an LLM on each webpage to extract specific attributes such as product name and price. The LLM eliminates the need for manual coding to extract these attributes from the webpage; all you need to change is the prompt.
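
The prompt-driven extraction described above can be sketched as below. This is a hedged illustration rather than the guide's code: `ask_llm` is a hypothetical callable standing in for a request to a local model (for example, one served by Ollama), the prompt wording and the JSON reply format are assumptions, and the Scrapy spider that would fetch the page text is not shown.

```python
import json
from typing import Callable, Optional


def build_prompt(page_text: str, fields: list[str]) -> str:
    """Ask the model for the requested attributes as a JSON object."""
    wanted = ", ".join(fields)
    return (
        f"Extract the following attributes from the web page text: {wanted}.\n"
        "Reply with a single JSON object using exactly those keys; "
        "use null for anything that is missing.\n\n"
        f"Page text:\n{page_text}"
    )


def extract_attributes(
    page_text: str,
    fields: list[str],
    ask_llm: Callable[[str], str],
) -> dict[str, Optional[object]]:
    """Send the prompt to the model and parse the JSON object in its reply."""
    reply = ask_llm(build_prompt(page_text, fields))
    # Models often wrap JSON in extra prose, so keep only the outermost braces.
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model reply")
    data = json.loads(reply[start:end + 1])
    # Return only the requested fields; missing ones become None.
    return {field: data.get(field) for field in fields}


if __name__ == "__main__":
    # Canned reply used in place of a real model call.
    def fake_llm(prompt: str) -> str:
        return 'Here you go: {"product_name": "A Light in the Attic", "price": "51.77"}'

    print(extract_attributes("<html>...</html>", ["product_name", "price"], fake_llm))
```

To extract a different attribute, only the `fields` list (and therefore the prompt) changes; the parsing code stays the same.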

7. Build GPT4o using Open Models

Building an all-in-one AI application typically requires millions of dollars and years of research. What if I told you that you can build your own GPT-4o model using an open-source model at no cost and in just one day?

In this project, we will create a comprehensive Open GPT-4o application that can understand audio, image, and text data. It will include a live voice chat feature and video chat capabilities. Additionally, you can use it to generate images and videos. In short, it will be your AGI (Artificial General Intelligence) application.

Please note that the project does not come with a guide or tutorial, so you will have to learn everything by understanding the source code: app.py · KingNish/OpenGPT-4o at main (huggingface.co)

Before you build your LLM application, I highly recommend that you test out OpenGPT 4o. Learn about various features and what type of model it is using. Learn how efficient and fast it is.

Final Thoughts

Building an LLM portfolio project can significantly boost your career prospects. If you are a student seeking employment, these 7 projects will help you secure a job faster than others. Recruiters and HR managers are particularly impressed by projects that incorporate the latest technologies, such as AI.

To begin, bookmark this page and start building the simple projects. As you progress to more complex projects, make sure to consistently showcase your work on LinkedIn. This way, you will soon catch the eye of recruiters.

Abid Ali Awan (@1abidaliawan) is a certified data scientist professional who loves building machine learning models. Currently, he is focusing on content creation and writing technical blogs on machine learning and data science technologies. Abid holds a Master's degree in technology management and a bachelor's degree in telecommunication engineering. His vision is to build an AI product using a graph neural network for students struggling with mental illness.

Lamini AI Partners with Meta to Enhance LLaMA’s SQL Performance

Less than two weeks after the launch of Lamini Memory Tuning, Lamini AI has officially partnered with Meta.

Excited to share that @LaminiAI is partnering with @Meta 🤝 to bring you code recipes for finetuning factual LLMs & agentic workflows 🎉
Recipe #1: How to Memory Tune Llama-3 into a SQL agent for your precise schema: 30% -> 95% accuracy. https://t.co/nAwJX6vWHG
Code 👇🏻

— Sharon Zhou (@realSharonZhou) June 25, 2024

Lamini AI announced Lamini Memory Tuning on June 13, showcasing the tool’s ability to improve factual accuracy while reducing hallucinations by as much as 95%.

“Lamini Memory Tuning is a research breakthrough that overcomes a seeming paradox in the AI world: achieving precise factual accuracy (i.e. no hallucinations) while upholding the generalisation capabilities that make LLMs valuable in the first place,” the startup said.

The tuning method was used on open-source models like LLaMa 3 and Mistral 3. Now, however, the company has partnered with Meta to improve LLaMa 3’s baseline performance by improving the quality of SQL queries.

As part of this, Meta published a repository of Llama 3 Lamini recipes to help tune Llama models, specifically for enterprises.

“Lamini Memory Tuning is a new tool you can use to embed facts into LLMs that improve factual accuracy and reduce hallucinations. Inspired by information retrieval, this method has set a new standard of accuracy for LLMs with less developer effort,” the repository stated.

According to Lamini, the memory-tuning tool can also cut response times by 50%, reduce workloads for data teams, and make queries more reliable, which in turn raises accuracy rates.

This is not the first time a tool has attempted to improve the efficiency of SQL queries. Researchers at Nanyang Technological University, Singapore University of Technology and Design, and Alibaba’s DAMO Academy previously introduced LLM-R2, a query rewrite system that significantly boosted SQL query efficiency.

The post Lamini AI Partners with Meta to Enhance LLaMA’s SQL Performance appeared first on Analytics India Magazine.