COBOL’s Costly Legacy Continues to Drain Company Resources

COBOL, a programming language still essential to financial legacy systems and government portals, is facing a crisis that is piling up major technical debt. Designed in 1959, it was adopted by IBM as a primary development language.

GitHub’s recently updated Innovation Graph shows that developers are turning to annual challenges like ‘Advent of Code’ to learn old and obscure programming languages such as COBOL. These events are among the few occasions when such languages attract interest.

Today, COBOL still underpins roughly 43% of banking systems, which together handle more than $3 trillion in daily transactions and 95% of ATM swipes; the language quietly keeps that infrastructure standing.

However, COBOL is rapidly approaching its expiration date. As experienced COBOL programmers retire, companies struggle to find qualified replacement talent despite generous six-figure salaries.

“The developers who actually know how to maintain legacy code are rapidly ageing out of the workforce,” said Michael Abbott, senior managing director and global banking lead at Accenture. “We are in a race against the clock to modernise COBOL before the talent pool contracts.”

Neglecting the ongoing maintenance of COBOL systems could be a disaster, given their central role in banking, insurance, healthcare, and government.

“It’s incredibly difficult to even find workers who know COBOL. The language is old and some of the people still fluent in it are even older,” said Brandon Edenfield, managing director of application modernisation with Modern Systems. “This has become a recipe for disaster in states that still operate under COBOL.”

AI to the Rescue

AI-powered tools are emerging as a potential solution to the COBOL crisis and the broader issue of technical debt in legacy systems. “By leveraging machine learning algorithms, tools like SonarQube have been developed to manage different types of technical debt,” explained the researchers in a recent paper.

Last summer, IBM unveiled watsonx Code Assistant for Z, a generative AI capability that translates COBOL into Java.

“If you free up that time spent servicing technical debt, testing requirements, debugging, and just basically keeping the lights on, IT can spend more time architecting the product solution that you want while the technology does the programming, testing, and delivery,” said Skyla Loomis, VP of IBM Z Software.

Goldman Sachs has also started piloting an AI assistant that, in some cases, generates as much as 40% of its code. Accenture, meanwhile, leveraged GPT-4 and a vector database to develop a tool that reverse-engineers legacy COBOL code for modernisation.

However, relying on AI to modernise mission-critical financial systems comes with its own set of challenges. Ensuring the reliability and accuracy of AI-generated code, navigating regulatory and security considerations, and managing the transition from legacy architectures are key hurdles to overcome.

Arun Chandrasekaran, a distinguished vice president at Gartner and generative AI analyst, said, “AI generation is an early-stage technology that takes time to perfect. I’m sure they have checks and balances in place to address this situation, but I prefer to take the ‘wait and see if it works’ approach.”

Yet the potential benefits are significant. AI tools can analyse, document, and refactor legacy code, enabling the transition to modern, scalable, cloud-native approaches.

Thierry Bonfante, chief product officer at Unqork, said, “With a GenAI tool, you can transform the business logic of legacy code into JSON, but that process requires humans to ensure that everything is in place. It is a technology-assisted process, not a magic bullet.”

Thomas Dohmke, CEO of GitHub, said, “It is clear that Wall Street’s next crisis could be digital. However, with the help of AI, we stand a chance of avoiding another financial crisis that has been long in the making.”

What’s the Solution?

Addressing the COBOL crisis requires a multi-pronged approach. One solution is to employ AI-assisted tools to identify and manage technical debt in legacy systems. “Static Analysis is another AI technique which can be used to detect technical debts, more specifically code debts,” the researchers explained in the paper.

This technique scans COBOL codebases to surface the most critical issues first, and AI-assisted tools such as IBM watsonx are speeding up the process.
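To make the idea concrete, here is a minimal Python sketch of rule-based static analysis over COBOL source. The debt indicators, sample code, and output format are hypothetical illustrations, not the rules of SonarQube, watsonx, or any other named tool:

```python
import re

# Hypothetical debt indicators for COBOL source; real analysers use far richer rules.
DEBT_PATTERNS = {
    "go-to statement": re.compile(r"\bGO\s+TO\b", re.IGNORECASE),
    "alter statement": re.compile(r"\bALTER\b", re.IGNORECASE),
    "todo marker": re.compile(r"TODO|FIXME"),
}

def scan_cobol(source):
    """Return a list of (line_number, indicator) pairs flagging likely code debt."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for name, pattern in DEBT_PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings

sample = """\
       PERFORM CALC-INTEREST.
       GO TO EXIT-PARA.
      * TODO: remove this fall-through
"""
print(scan_cobol(sample))  # [(2, 'go-to statement'), (3, 'todo marker')]
```

A real pipeline would rank the findings by severity so that the most critical issues are addressed first.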

Martin Prescher, CTO of Autonomy, explained, “What is different is that over the last decade the concepts behind AI and ML have been bolstered with unbelievably sophisticated toolsets that make it possible to integrate with existing, and not very sophisticated/old school IT ecosystems.”

Another approach is to invest in modernisation services offered by companies like IBM, Luxoft, and Unisys.

Jeff DeVerter, chief technology evangelist at Rackspace Technology, emphasises its importance: “It could be that the prospect of losing out on AI will motivate organisations to finally get off the sidelines when it comes to modernisation of core systems.”

By leveraging the expertise of these modernisation service providers and the power of AI-assisted tools, organisations can gradually transform their COBOL-based systems into modern, maintainable solutions, enabling them to take full advantage of the latest AI technologies.

The post COBOL’s Costly Legacy Continues to Drain Company Resources appeared first on Analytics India Magazine.

DSC Weekly 16 April 2024

Announcements

  • In today’s constantly evolving digital landscape, networks are the backbone of modern enterprises. The need to prepare for potential network failures by instilling resilience and redundancy is more pressing than ever. Designing a stable, flexible and secure network infrastructure, with real-time visibility across assets and users is critical to maintaining reliability. Tune into the upcoming Strategies for a Resilient Network summit and discover strategies to design an agile, data-driven network that optimizes visibility, enhances DNS management and minimizes disruptions.
  • Properly managing data is more essential than ever. Organizations now operate within a complicated web of business applications, ushering in extensive amounts of analytical information that can quickly become unwieldy. Without the right oversight, companies miss out on the chance to use this data to drive smarter decision making and strategic planning, and even to reduce costs. The growing use of generative AI, machine learning and other emerging technologies is poised to transform data management, but how can businesses best leverage these platforms to glean the most business value while managing risk? In the upcoming Data Management: Navigating Opportunities for Success summit, leading experts in the field will discuss the latest data management strategies as well as what’s next in data analytics and architecture.

Top Stories

  • Retrieval augmented fine-tuning and data integrations
    April 16, 2024
    by Dan Wilson
    In the latest episode of the “AI Think Tank Podcast,” I had the pleasure of hosting a deep dive into the world of AI advancements, specifically focusing on “RAFT” (Retrieval Augmented Fine Tuning). Joining me were the esteemed guests Suman Aluru and Caleb Stevens, who both have much to do with AI infrastructure and application. Our conversation revolved around how RAFT bridges the critical gaps between fine-tuning and retrieval-augmented generation (RAG), and the significant impact this has on AI-driven applications.
  • How is machine learning changing the landscape of FinTech?
    April 11, 2024
    by Pritesh Patel
    Machine learning in FinTech is a critical enabler in tech-driven banking, where efficiency and innovation are key to staying ahead of the competition. It transforms obstacles into lucrative possibilities by revolutionizing crucial areas such as risk management, fraud detection, algorithmic trading, and compliance.
  • Get ready for future innovations with large language models
    April 12, 2024
    by Prasanna Chitanand
    Nowadays, almost all businesses use generative AI and large language models after realizing their ability to boost accuracy in various tasks. These AI models have become the topic of social media discussions nowadays. This blog explores more on the business and commercial uses of LLMs and genAI along with the differences between them.

In-Depth

  • Using window functions for advanced data analysis
    April 15, 2024
    by Erika Balla
    Window functions are an advanced feature of SQL that provides powerful tools for detailed data analysis and manipulation without grouping data into single output rows, which is common in aggregate functions. These functions operate on a set of rows and return a value for each row based on the calculation against the set.
  • 5 mistakes to avoid in CMMC compliance
    April 12, 2024
    by Erika Balla
    Think of a battlefield — not filled with soldiers but cyber warriors. The Defense Industrial Base (DIB) stands as the front line. This digital battleground faces nonstop cyberattacks, each one getting trickier. Here, the Department of Defense uses the Cybersecurity Maturity Model Certification (CMMC) 2.0 program to protect sensitive, unclassified information.
  • Building reliable and efficient ETL pipelines: Best practices for data wranglers
    April 11, 2024
    by Ovais Naseem
    Data is crucial for your business: it helps with decisions and growth. But sometimes it’s stuck in different places and hard to use. The good news is that an ETL pipeline can turn that frustration into clarity.
  • The new era of data handling: Tools that transform business strategies
    April 10, 2024
    by Ovais Naseem
    Data Automation Tools play a crucial role in transforming how businesses handle data. They offer advanced functionalities that streamline data management processes, enabling organizations to enhance efficiency and accuracy. By automating tasks such as data entry, validation, and analysis, these tools reduce manual intervention and minimize the risk of errors.
  • DSC Weekly 9 April 2024
    April 9, 2024
    by Scott Thompson
    Read more of the top articles from the Data Science Central community.
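The window-functions item above can be demonstrated directly with Python’s built-in sqlite3 module (SQLite supports window functions from version 3.25; the table and data below are invented for illustration). A running total is computed per customer without collapsing rows, which a GROUP BY aggregate would do:

```python
import sqlite3

# Minimal window-function example: per-customer running total over an orders table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
con.executemany("INSERT INTO orders VALUES (?, ?)",
                [("a", 10), ("a", 20), ("b", 5), ("a", 30)])
rows = con.execute("""
    SELECT customer, amount,
           SUM(amount) OVER (PARTITION BY customer ORDER BY rowid) AS running_total
    FROM orders
    ORDER BY rowid
""").fetchall()
print(rows)  # [('a', 10, 10), ('a', 20, 30), ('b', 5, 5), ('a', 30, 60)]
```

Unlike `GROUP BY`, every input row survives in the output, each annotated with a value computed over its window.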

Google Introduces TransformerFAM, For Fixing Amnesia in LLMs


A team of researchers from Google has introduced Feedback Attention Memory (FAM), a novel Transformer architecture that leverages a feedback loop to enable the network to attend to its own latent representations, fostering the emergence of working memory within the Transformer and allowing it to process indefinitely long sequences.


“In the film ‘Memento’ (2000), the protagonist struggles with anterograde amnesia, which means he cannot remember anything that happened in the last 10 minutes, but his long-term memory is intact. He has to tattoo important information on his body to remember it. This is similar to the current state of large language models (LLMs),” reads the paper.

Similarly, current state-of-the-art large language models (LLMs) rely on attention mechanisms to extract meaningful representations from homogeneous data, but the quadratic complexity of attention with respect to context length limits the capability of modelling long contexts.

To address these limitations, researchers have explored techniques such as sliding window attention, sparse attention, and linear approximated attention, though these methods have shown effectiveness below the 1B scale.

The introduction of Feedback Attention Memory offers a new approach by adding feedback activations that feed contextual representation back into each block of sliding window attention. This enables integrated attention, block-wise updates, information compression, and global contextual storage.
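As a rough, unofficial sketch of that mechanism (not the paper’s exact formulation), the following NumPy toy models a single attention head with no learned projections: each block’s queries attend over the block plus a small feedback memory, and the memory is then updated from the same keys and values before the next block:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def fam_block(block, fam, d=16):
    """One block of sliding-window attention whose queries also see the feedback memory."""
    kv = np.concatenate([fam, block], axis=0)   # keys/values: memory + current block
    out = softmax(block @ kv.T / np.sqrt(d)) @ kv
    # Update the feedback memory: memory queries attend over [old memory, current block].
    new_fam = softmax(fam @ kv.T / np.sqrt(d)) @ kv
    return out, new_fam

rng = np.random.default_rng(0)
d, block_len, mem_len = 16, 8, 4
fam = rng.normal(size=(mem_len, d))             # initial feedback memory
long_seq = rng.normal(size=(32, d))             # a "long" input, processed block by block

outputs = []
for start in range(0, len(long_seq), block_len):
    out, fam = fam_block(long_seq[start:start + block_len], fam, d)
    outputs.append(out)

result = np.concatenate(outputs)
print(result.shape)  # (32, 16)
```

Because each step only attends over one block plus a fixed-size memory, cost grows linearly with sequence length rather than quadratically, while the memory carries compressed context forward indefinitely.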

This innovative approach incorporates a feedback loop, which fosters the development of working memory within the Transformer architecture, allowing it to handle sequences of indefinite length. Notably, TransformerFAM can be seamlessly integrated with pre-trained models and does not require additional weights.

The architecture has been tested across various model sizes, including 1B, 8B, and 24B, and has demonstrated significant improvements in long-context tasks such as NarrativeQA, Scrolls-Qasper, Scrolls-Quality, and XLSum. By effectively compressing and retaining important contextual information within extremely long contexts, TransformerFAM has shown enhanced performance compared to other configurations.

The researchers emphasise the potential of TransformerFAM to empower LLMs to process sequences of unlimited length, which could revolutionise the way they handle long-context tasks and dependencies.

The paper highlights that although traditional Recurrent Neural Networks (RNNs) rely on causal relationships between input sequences, Transformers can efficiently exploit the parallelism of machine learning accelerators.

TransformerFAM’s feedback mechanism, which is limited to the relationship between blocks, does not compromise training efficiency and maintains performance levels similar to other architectures.

Recently, Google researchers also introduced a method for scaling Transformer-based large language models (LLMs) to handle infinitely long inputs with bounded memory and computation.


Top 10 Text-to-Music AI Platforms

The Sora moment for music is finally here, with new-age apps mushrooming in the space. However, a recent study shows that over 71% of musicians fear AI. A few days ago, more than 200 musicians, including Katy Perry, Billie Eilish, and Jon Bon Jovi, along with the estates of Bob Marley and Frank Sinatra, urged tech companies to stop using AI to create music.

While there exists fear, many artists are embracing AI in their concerts and production. Pop queen Madonna recently used an AI text-to-video tool to create visuals for the giant screens behind her while performing.

Recently, an AI-generated, Punjabi-themed, Bollywood-inspired song was overlaid onto the ‘Kala Chashma’ video.

Let’s look into the Top 10 music generation and text-to-music platforms.

Suno.ai

Suno.ai is a generative artificial intelligence music creation program, engineered to create authentic songs, blending vocals and instrumentation seamlessly. It has been readily accessible since December 20, 2023, following the rollout of a web application and a collaboration with Microsoft. This collaboration led to Suno being integrated as a plugin in Microsoft Copilot.

The founders, Michael Shulman, Georg Kucsko, Martin Camacho, and Keenan Freyberg, all worked for Kensho, an AI startup, before starting their own company in Cambridge, Massachusetts. This innovative platform revolutionises music creation by transforming ideas or prompts into captivating songs.

prompt: a bass 808 cinematic r&b song with vocal chops drowned in too much reverb about stuck in an AI loop #SunoAI pic.twitter.com/NZvG7A3clL

— Allan D Clive (@allan_d_clive) April 10, 2024

Udio

Udio, the groundbreaking AI music generator, hit the public with its free beta launch last week. Conceived in December 2023 by a team of ex-Google DeepMind researchers led by CEO David Ding, Udio quickly garnered attention for its remarkable vocal synthesis capabilities.

Udio’s knack for producing top-notch music from straightforward text prompts has earned it the nickname ‘ChatGPT for music’.

Introducing Udio, an app for music creation and sharing that allows you to generate amazing music in your favorite styles with intuitive and powerful text-prompting.
1/11 pic.twitter.com/al5uYAsU5k

— udio (@udiomusic) April 10, 2024

Udio operates by crafting songs based on textual prompts, spanning a diverse array of genres including barbershop quartet, country, classical, hip hop, German pop, and hard rock, among others.

Boomy

Boomy was introduced by Alex Mitchell and Matthew Cohen Santorelli in 2019. It’s an AI music technology platform that empowers anyone to create and share music, letting users generate original songs with minimal effort.

It offers a range of features that make it easy to create music. Boomy’s AI algorithms are trained to generate music based on user input, allowing them to customise their songs and create something entirely unique.

The platform was introduced to democratise access to powerful AI algorithms and provide an accessible entry point into the world of music production.

MusicLM

Google’s MusicLM, an AI-based text-to-music model, was announced earlier this year. The accompanying paper describes it as a model that generates high-fidelity music from text descriptions such as “a calming violin melody backed by a distorted guitar riff”.

MusicLM casts the process of conditional music generation as a hierarchical sequence-to-sequence modelling task, and generates music at 24 kHz that remains consistent over several minutes.

Further, MusicLM can be conditioned on both text and melody. It can also transform whistled and hummed melodies according to the style described in the text caption.

ChatGPT for music is here.
Google's MusicLM creates mind-bending music.
10 best music created by MusicLM (and to create your own too): pic.twitter.com/2MAaQVuefT

— Shushant Lakhyani (@shushant_l) May 29, 2023

Soundful

Soundful, an AI music platform introduced by CEO Diaa El All, launched the ‘Soundful Collabs’ campaign, blending AI-generated music with the expertise of renowned artists and producers.

The campaign aims to encapsulate the unique musical essence, or “Sonic DNA”, of top-tier figures in the industry, including 3LAU, Autograf, CB Mix, DJ White Shadow, Kaskade, and Starrah. This collaborative effort allows users to create customised tracks resembling those of their favourite artists.

Mubert

Founder and CEO Alexey Kochetkov, armed with degrees in computer science and music education, birthed Mubert from his dual expertise. With this innovative platform, Kochetkov aims to revolutionise the music industry, providing cost-effective solutions for video game streamers, voice assistant developers, and various other sectors.

Mubert AI heralds a groundbreaking era in music creation, harnessing the prowess of AI. Through its AI-driven engine, Mubert crafts royalty-free music tailored for content creators, developers, and artists alike.

Riffusion

Riffusion, a neural network designed by Seth Forsgren and Hayk Martiros, brings a unique approach to music generation. Building upon the foundation of Stable Diffusion, an existing open-source model known for generating images from textual prompts, Riffusion refines this concept by operating on spectrograms.

This innovative adaptation enables the model to interpret text prompts and generate corresponding image files. These images can then undergo an inverse Fourier transform, ultimately producing audio files, described as “de otro mundo” (otherworldly). The music generated by Riffusion captivates listeners with its ethereal quality.
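The final spectrogram-to-audio step can be sketched in a few lines of NumPy. This toy assumes zero phase (real systems recover phase iteratively, for example with the Griffin-Lim algorithm) and invented frame sizes, but it shows the inverse-FFT-plus-overlap-add idea:

```python
import numpy as np

def spectrogram_to_audio(mag, hop=128):
    """Turn a magnitude spectrogram (frames x bins) into audio, assuming zero phase."""
    n_frames, n_bins = mag.shape
    frame_len = 2 * (n_bins - 1)                # real FFT: bins = frame_len/2 + 1
    audio = np.zeros(hop * (n_frames - 1) + frame_len)
    for i, frame in enumerate(mag):
        samples = np.fft.irfft(frame)           # inverse FFT of one spectral frame
        audio[i * hop : i * hop + frame_len] += samples * np.hanning(frame_len)
    return audio

# A fake 10-frame, 129-bin spectrogram standing in for a generated image.
mag = np.abs(np.random.default_rng(0).normal(size=(10, 129)))
audio = spectrogram_to_audio(mag)
print(audio.shape)  # (1408,)
```

Riffusion’s pipeline is considerably more sophisticated, but the core inversion from spectral frames back to overlapping waveform windows is the same.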

Voicemod

Brothers Jaime, Fernando, and Juan Bosch launched Voicemod in 2014; the company later introduced a groundbreaking browser-based text-to-song AI generator. Among Voicemod’s innovative features is VoiceLab, which lets users craft custom voices using a blend of audio controllers and effects such as pauses, reverb, vocoder, filter, and pitch shifters.

Voicemod seamlessly integrates with all gaming and communication applications. The platform expanded its AI technology into the realm of music with Voicemod Text to Song. This new feature simplifies the music creation process. Just select your desired track, pick your AI singer, write your text, and you’re done!

MusicGen

Meta’s MusicGen, a powerful single language model, redefines the boundaries of conditional music generation with the ability to create high-quality music by taking cues from text descriptions or melodies.

You can try MusicGen on Hugging Face, or create your own Space to run it.

Do you want to test the excellent MusicGen model by @_nateraw?
Now you can here; Thanks @huggingface for the Zero GPU and @_nateraw for the excellent model <3https://t.co/ZTGl1qU5Ln

— 𝑨𝒓𝒕𝒊𝒇𝒊𝒄𝒊𝒂𝒍 𝑮𝒖𝒚 (@artificialguybr) April 14, 2024

AudioCipher

AudioCipher is a unique MIDI plugin that allows users to generate melodies and chord progressions from text input. Launched in January 2021, it creates music based on user-specified parameters. The plugin outputs MIDI data directly into the user’s DAW for easy integration with instrument tracks.

Instead of purely randomised sequences, AudioCipher uses a substitution cipher to map text input to a core MIDI sequence. This taps into the inherent connections between language and music, providing a unique way to spark musical ideas and overcome creative blocks.
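AudioCipher’s actual cipher is not public, so the mapping below is purely hypothetical, but a letter-to-note substitution can be as simple as cycling the alphabet through a scale of MIDI note numbers:

```python
# Hypothetical letter-to-note substitution cipher in the spirit of AudioCipher.
C_MAJOR = [60, 62, 64, 65, 67, 69, 71]  # MIDI: C4 D4 E4 F4 G4 A4 B4

def text_to_midi_notes(text, scale=C_MAJOR):
    """Map each letter to a scale degree, shifting up an octave per full cycle."""
    notes = []
    for ch in text.lower():
        if ch.isalpha():
            idx = ord(ch) - ord("a")              # a=0 ... z=25
            octave_shift = 12 * (idx // len(scale))
            notes.append(scale[idx % len(scale)] + octave_shift)
    return notes

print(text_to_midi_notes("cab"))  # [64, 60, 62]
```

The resulting note list could then be written out as MIDI data for a DAW instrument track, which is where the plugin hands control back to the musician.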


GovDash aims to help businesses use AI to land government contracts

By Kyle Wiggers

Tim Goltser and Curtis Mason have been building things together since high school, when the two were the co-captains of their school’s robotics team. In college, Goltser and Mason teamed up to create an app — Hang, for scheduling hangouts with friends — with Sean Doherty, who Mason had met while an undergrad at Boston University.

Fast forward to 2022, and Goltser and Mason — along with Doherty — felt the entrepreneurial itch strike again. After considering a few ideas, they decided to go after what they saw as a largely unaddressed market: Tools to help small businesses secure U.S. government contracts.

“The federal contracting community has seen a shrinking of the small business industrial base for much of the past decade,” Doherty told TechCrunch. “It’s hard for these companies to compete against giants like Lockheed Martin or Northrop Grumman. It’s also expensive for them to bid on contracts — if they don’t win, they may run out of cash.”

As a result of labyrinthine systems and mountains of paperwork, finding and bidding for U.S. federal contracts is a laborious process. It takes weeks at a minimum to complete, according to Doherty — and often the best-resourced companies are the most successful.

In a 2023 survey from Setscale, a purchase order financing startup, small business owners cited insufficient cash flow and working capital — and a lack of time and resources — as their top roadblocks to securing government contracts.

To attempt to give these small businesses a boost, Goltser, Mason and Doherty founded GovDash, a platform that provides workflows to support government contract capture, proposal development, and management processes. GovDash was accepted to Y Combinator in 2022; Goltser dropped out of college to help spearhead it.

GovDash is essentially a contract proposal generator. The platform automatically finds contracts potentially relevant to a business, reads through the requests for proposals and, leveraging generative AI, writes proposals.

GovDash can trawl through solicitation documents to identify requirements, requested formats, evaluation factors and submission schedules for contracts, Doherty says. It can also identify contracts a business might be qualified for based on their past performance, sending alerts to the inbox of a customer’s choosing, according to Doherty.

“When a contractor wants to respond to a government solicitation, they can run that through GovDash to produce a proposal in a fraction of the time,” Doherty said.

Now, generative AI makes mistakes. It’s a well-established fact. So why should businesses expect GovDash to be any different?

Two reasons, argues Doherty.

One, GovDash built a system that cross-checks a business’s info to see just how relevant the business is to a given federal contract. If the relevancy — as judged by the system — isn’t obvious, GovDash prompts the business to template out sections of the contract proposal with more information.
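GovDash has not published how this cross-check works; as a purely hypothetical illustration, a crude relevancy check could score keyword overlap between a business’s capability statement and a solicitation, prompting for more information below some threshold:

```python
import re

def jaccard_relevancy(business_profile, solicitation):
    """Crude relevancy score: Jaccard overlap of the two texts' word sets."""
    tokenize = lambda text: set(re.findall(r"[a-z]+", text.lower()))
    a, b = tokenize(business_profile), tokenize(solicitation)
    return len(a & b) / len(a | b)

profile = "small business providing satellite communications hardware and integration"
rfp = "seeking satellite communications hardware integration services for field units"
score = jaccard_relevancy(profile, rfp)
print(round(score, 2))

# A pipeline might ask the business for more detail when the score is low.
needs_more_info = score < 0.5
```

A production system would use embeddings and past-performance data rather than raw word overlap, but the gating logic — score the match, then ask for more input when confidence is low — is the same shape.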


GovDash’s platform tries to automate many of the more tedious aspects of going after — and securing — U.S. federal contracts.

Two, GovDash involves heavy human review. At each stage of the proposal-generating process, the platform checks in with a human reviewer to get their seal of approval.

These steps — cross-checking and human review — aren’t infallible, Doherty admits. But he claims they’re better than what a lot of the competition’s doing.

“Companies now have one place where their business development data flows seamlessly, with an AI agent at its core to automate tedious workflows,” Doherty said. “This is a huge win for the C-suite as they can get out more proposals, at a higher quality level, in a fraction of the time, and put all the associated workflows on autopilot.”

GovDash’s competition is growing — and quickly.

GovDash competes with Govly, whose platform lets companies assess, search and analyze government contracting requirements across disparate sources. A more recent rival, Hazel, aims to use AI to automate government contracting discovery, drafting and compliance. Both — like GovDash — are Y Combinator-backed, interestingly.

But Doherty claims that GovDash is positioned well for expansion.

Having raised $12 million from investors including Northzone and Y Combinator, inclusive of a $10 million Series A funding tranche this month, GovDash plans to grow its engineering team, hire additional federal proposal managers to guide its product efforts and add new capabilities to its existing platform.

New York-based, six-employee GovDash currently works with around 30 federal contractors across the U.S., Doherty said, and is “nearly” cash-flow positive.

“We’re building for the long term for our customer base,” Doherty said. “[We’re] well-capitalized for eventual market tailwinds.”

Retrieval augmented fine-tuning and data integrations

Presentation and discussion with Suman Aluru and Caleb Stevens


In the latest episode of the “AI Think Tank Podcast,” I had the pleasure of hosting a deep dive into the world of AI advancements, specifically focusing on “RAFT” (Retrieval Augmented Fine Tuning). Joining me were the esteemed guests Suman Aluru and Caleb Stevens, who both have much to do with AI infrastructure and application. Our conversation revolved around how RAFT bridges the critical gaps between fine-tuning and retrieval-augmented generation (RAG), and the significant impact this has on AI-driven applications.

We opened the episode by discussing the importance of RAFT in the current AI landscape, where Suman eloquently described its role in enhancing the accuracy of AI responses and reducing the common errors known as “hallucinations.” Caleb complemented this by highlighting the practical deployment of RAFT in IT infrastructures, particularly its effectiveness in managing semantic data where traditional databases might struggle.

A pivotal moment of our discussion was Suman’s live demonstration, which involved querying a model fine-tuned with data from the AI Think Tank podcast’s website. This not only showcased RAFT’s real-world applicability but also demonstrated its power in maintaining the relevancy of AI systems with updated data, eliminating the need for extensive retraining.


Figure 1: Cited from https://arxiv.org/abs/2403.10131.

We also delved into the challenges associated with updating AI models post-training. Here, RAFT was discussed as a dynamic solution capable of integrating fresh data seamlessly, thus enabling AI systems to process complex queries with enhanced contextual understanding. The discussion on vector databases and embedding techniques provided a clear insight into the technological strategies that make RAFT a standout choice.
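Following the RAFT recipe from the cited paper (arxiv.org/abs/2403.10131), a training example pairs a question with a context of retrieved documents, mixing the answer-bearing “golden” document with distractors so the model learns to ignore irrelevant passages. A minimal sketch, with field names and probabilities chosen for illustration rather than taken from the paper:

```python
import random

def make_raft_example(question, golden_doc, distractor_pool, k_distractors=3,
                      p_include_golden=0.8, rng=random.Random(0)):
    """Assemble one RAFT-style training record: a question plus a shuffled context
    of documents. The golden document is included only with some probability, so
    the model also learns to cope with retrieval misses."""
    docs = rng.sample(distractor_pool, k_distractors)
    if rng.random() < p_include_golden:
        docs.append(golden_doc)
    rng.shuffle(docs)
    context = "\n\n".join(f"[doc {i}] {d}" for i, d in enumerate(docs))
    return {"question": question, "context": context}

pool = [f"unrelated passage {i}" for i in range(10)]
ex = make_raft_example("What does RAFT stand for?",
                       "RAFT stands for Retrieval Augmented Fine Tuning.", pool)
print(ex["question"])
```

Fine-tuning on records like these is what lets the model answer from retrieved context at inference time while staying robust to noisy retrieval.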


Figure 2: Cited from https://arxiv.org/abs/2403.10131.

The episode wrapped up with an engaging Q&A session where our listeners had the opportunity to probe deeper into the applications of RAFT, its advantages over traditional AI training techniques, and its potential transformative impact across various sectors.

Overall, this episode offered a thorough exploration of how advanced techniques like RAFT can significantly bolster the functionality and reliability of AI systems, ensuring they perform domain-specific tasks more effectively and with greater accuracy. The feedback from our community was immensely positive, highlighting the importance and interest in such cutting-edge technologies in the AI space.

As usual, I gained much insight from Suman’s presentations and Caleb’s keen understanding at the code level. We expect to continue this exploration of RAG and RAFT as things develop.

Subscribe to the AI Think Tank Podcast on YouTube.
Would you like to join the show as a live attendee and interact with guests? Contact Us

Tesla Partners with Tata Electronics for Semiconductor Chips

In a significant development, US-based electric vehicle giant Tesla has reportedly signed a strategic agreement with Tata Electronics to procure semiconductor chips for its global operations.

The deal, executed discreetly a few months ago, positions Tata Electronics as a reliable supplier for top-tier global clients looking to establish a crucial segment of their semiconductor value chain in India.

The agreement, whose value and specific terms remain undisclosed, highlights Tesla’s growing interest in India beyond local revenue generation. Tesla, the world’s largest automotive company by market value, is eyeing entry into India, the world’s fastest-growing automotive market.

Tata Electronics’ Expansion and Expertise

Tata Electronics has bolstered its workforce by hiring 50-60 top-level expatriates in recent months, aiming to leverage their expertise in semiconductor technology, strategic planning, and design to enhance its business operations. The company has established semiconductor manufacturing facilities in Hosur (Tamil Nadu), Dholera (Gujarat), and Assam, with plans for further expansion to create a well-integrated supply chain in India. Tata Electronics has invested $14 billion in its business operations thus far.

According to sources familiar with the matter, Tata Electronics has made substantial investments in indigenous technology development across these platforms and has assembled a team with over 1,000 years of combined global domain expertise to lead the project.

Elon Musk’s Upcoming Visit to India

Tesla CEO Elon Musk is scheduled to meet with Prime Minister Narendra Modi during his visit to India this month. Musk is expected to announce potential investments in India, including plans for EV manufacturing facilities. Tesla is anticipated to invest approximately $2-3 billion in India to manufacture electric cars, reflecting the growing interest in EVs within the country’s personal mobility market.

Recent policy adjustments have enabled automakers to import EVs priced above $35,000 at a reduced import duty rate of 15%, subject to committing to invest $500 million within three years to establish manufacturing facilities in India. Tesla is likely to prioritise premium electric models initially, while also considering local manufacturing of entry-level electric vehicles.

Diversifying Supply Chain Post-COVID

Following the COVID-19 pandemic, Tesla has diversified its component sourcing beyond China for critical electronic, electrical, and mechanical parts. Although Tesla keeps its supplier information confidential, it manufactures certain electric components internally, such as electric motors, battery packs, and chargers, while procuring sub-assemblies and other parts from global suppliers.

Ashok Chandak, the president of the India Electronics and Semiconductor Association (IESA), noted that Tesla’s efforts to establish a local supplier ecosystem for electronics and subsystems indicate its commitment to diversifying its supply chain. However, he highlighted the need for improvement in the local semiconductor sourcing ecosystem to support industries like automotive, which require higher value addition in their supply chains.

The Tesla-Tata Electronics deal not only underscores India’s growing importance in the global semiconductor supply chain but also establishes Tata Electronics as a significant player in the Indian semiconductor manufacturing ecosystem.

The post Tesla Partners with Tata Electronics for Semiconductor Chips appeared first on Analytics India Magazine.

AI Con USA: Navigate the Future of AI 2024

Sponsored Content


AI Con USA, happening June 2–7, 2024 in Las Vegas and online, is not to be missed.

The conference week includes in-depth pre-conference training, deep-dive tutorials, visionary keynotes, concurrent sessions on the latest topics, a leadership summit, an Expo packed with solutions, myriad networking opportunities, and more!

A Preview of Keynotes at AI Con USA:

  • I Got 99 Problems, but AI Ain’t One—Dona Sarkar, Microsoft
  • AI/ML Adoption Strategies for Enterprises—Hien Luu, DoorDash
  • The Unseen Engine of AI: How 5 Innovation-Minded Companies Optimized for Operational Efficiency—Nevra Ledwon, DecisionBrain
  • Operationalizing Disruptive Technologies: A Strategic Framework for Harnessing the Power of GenAI—Mary Thorn, S&P Global Ratings
  • AI: A Moderated Panel Discussion—Dionny Santiago, Indeed
  • Humanizing AI—Tariq King, Test IO
  • Realizing the Potential of AI Tools for Software Development—Matthew Gunter, GitHub
  • Embrace AI Holistically and Unlock Your Growth Potential—Tania Katan and Rob Nicoletti, HALO Strategies

Join us for a year’s worth of education packed into one amazing week.

Can't join us in-person? A curated, free virtual conference option is also available.


Snowflake Open Sources Arctic, Family of Embedding Models for RAG


Snowflake today announced the launch of the Snowflake Arctic embed family of models under an Apache 2.0 licence. These models, ranging in size and context window, are designed for text embedding tasks and offer SOTA performance for retrieval applications.

The largest model in the family, with 330 million parameters, leads the Massive Text Embedding Benchmark (MTEB) Retrieval Leaderboard with an average retrieval score above 55.9.

Click here to check out the model on Hugging Face.

Sridhar Ramaswamy, CEO of Snowflake, credited the expertise of the Neeva team and the company's commitment to AI for the models. Snowflake acquired Neeva in May last year.

@SnowflakeDB is open sourcing the best embedding models in the world! 🚀🚀
They are now available open source in @huggingface We are releasing it under the Apache 2 license so that it is easy for the OSS community to experiment with them freely.🎁🎁
These impressive models… pic.twitter.com/ZXFv72rPhz

— sridhar (@RamaswmySridhar) April 16, 2024

The Snowflake Arctic embed models, available on Hugging Face and soon in Snowflake Cortex embed function, provide organisations with advanced retrieval capabilities when integrating proprietary datasets with LLMs for Retrieval Augmented Generation (RAG) or semantic search services.

The success of these models lies in the application of effective web searching techniques to training text embedding models. Improved sampling strategies and competence-aware hard-negative mining have significantly boosted the quality of the models.
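Hard-negative mining boils down to ranking candidate negatives by their similarity to the query and training against the most similar ones, since those are the examples a contrastive loss learns the most from. A minimal sketch of the ranking step, with toy vectors standing in for real embeddings (an illustration, not Snowflake's training code):

```python
import numpy as np

def cosine_sim(query_vec, candidate_vecs):
    # Cosine similarity between a query vector and each row of a matrix.
    q = query_vec / np.linalg.norm(query_vec)
    c = candidate_vecs / np.linalg.norm(candidate_vecs, axis=1, keepdims=True)
    return c @ q

def mine_hard_negatives(query_vec, negative_vecs, k=2):
    """Return indices of the k negatives most similar to the query --
    the 'hard' negatives a contrastive loss learns the most from."""
    sims = cosine_sim(query_vec, negative_vecs)
    return np.argsort(sims)[::-1][:k]

# Toy 3-dimensional "embeddings"
query = np.array([1.0, 0.0, 0.0])
negatives = np.array([
    [0.9, 0.1, 0.0],   # very similar   -> hard negative
    [0.0, 1.0, 0.0],   # orthogonal     -> easy negative
    [0.7, 0.3, 0.1],   # fairly similar -> hard negative
])

print(mine_hard_negatives(query, negatives))  # hardest first: [0 2]
```

In a real training pipeline the negatives come from a large corpus and the similarity is computed with the embedding model being trained; the "competence-aware" refinement Snowflake describes adapts the selection to the model's current ability.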

Snowflake Arctic embed models come in five sizes, from x-small to large, catering to different organisational needs regarding latency, cost, and retrieval performance.

Snowflake claims that Arctic-embed-l stands out as the leading open-source model suitable for production due to its excellent performance-to-size ratio. Although models like SFR-Embedding-Mistral surpass Arctic-embed-l, they come with four times the vector dimensionality (4096 vs. 1024) and over 20 times more parameters (7.1 billion vs. 335 million).

“With the Apache 2 licensed Snowflake Arctic embed family of models, organisations now have one more open alternative to black-box API providers such as Cohere, OpenAI, or Google,” reads Snowflake’s blog.

These enhancements, combined with Snowflake’s data processing power, were achieved without the need for a massive expansion of computing resources, utilising just eight H100 GPUs.

Snowflake plans to continue expanding its range of models and targeted workloads to maintain its commitment to providing customers with top-quality models for enterprise use cases such as RAG and search.

The post Snowflake Open Sources Arctic, Family of Embedding Models for RAG appeared first on Analytics India Magazine.

Utilizing Pandas AI for Data Analysis

Do you work in the data field with Python? If so, I bet you use Pandas for data manipulation.

If you don’t know, Pandas is an open-source Python package specifically developed for data analysis and manipulation. It’s one of the most-used packages and one you usually learn when starting a data science journey in Python.

So, what is Pandas AI? I guess you are reading this article because you want to know about it.

Well, as you know, we are in a time when Generative AI is everywhere. Imagine if you could perform data analysis on your data using Generative AI; things would be much easier.

This is what Pandas AI brings. With simple prompts, we can quickly analyze and manipulate our dataset without sending our data somewhere.

This article will explore how to utilize Pandas AI for Data Analysis tasks. In the article, we will learn the following:

  • Pandas AI Setup
  • Data Exploration with Pandas AI
  • Data Visualization with Pandas AI
  • Pandas AI Advanced usage

If you are ready to learn, let’s get into it!

Pandas AI Setup

Pandas AI is a Python package that adds Large Language Model (LLM) capabilities to the Pandas API. We can use the standard Pandas API with a Generative AI enhancement that turns Pandas into a conversational tool.

We mainly want to use Pandas AI because of the simple workflow the package provides: it can automatically analyze data from a simple prompt, without requiring complex code.

Enough introduction. Let’s get into the hands-on.

First, we need to install the package before anything else.

pip install pandasai

Next, we must set up the LLM we want to use for Pandas AI. There are several options, such as OpenAI GPT and HuggingFace. However, we will use the OpenAI GPT for this tutorial.

Setting the OpenAI model into Pandas AI is straightforward, but you will need an OpenAI API key. If you don't have one, you can get one on their website.

If everything is ready, let’s set up the Pandas AI LLM using the code below.

from pandasai.llm import OpenAI

llm = OpenAI(api_token="Your OpenAI API Key")

You are now ready to do Data Analysis with Pandas AI.

Data Exploration with Pandas AI

Let’s start with a sample dataset and try the data exploration with Pandas AI. I would use the Titanic data from the Seaborn package in this example.

import seaborn as sns
from pandasai import SmartDataframe

data = sns.load_dataset('titanic')
df = SmartDataframe(data, config={'llm': llm})

We pass the DataFrame into the Pandas AI SmartDataframe object to initiate Pandas AI. After that, we can perform conversational activity on our DataFrame.

Let’s try a simple question.

response = df.chat("""Return the survived class in percentage""")
response

The percentage of passengers who survived is: 38.38%

From the prompt, Pandas AI could come up with the solution and answer our questions.
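Because the answer comes back as free text, it's good practice to cross-check Pandas AI against plain Pandas. The same survival percentage can be computed directly; a sketch with a small toy DataFrame standing in for the Titanic data:

```python
import pandas as pd

# Toy stand-in for the Titanic data (0 = died, 1 = survived)
data = pd.DataFrame({"survived": [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]})

# The same question answered with plain Pandas: share of survivors in percent
survived_pct = data["survived"].mean() * 100
print(f"{survived_pct:.2f}%")  # 30.00%
```

If the two numbers disagree, trust the direct computation and refine the prompt.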

We can also ask Pandas AI questions whose answers come back as DataFrame objects. For example, here are several prompts for analyzing the data.

# Data Summary
summary = df.chat("""Can you get me the statistical summary of the dataset""")

# Class percentage
surv_pclass_perc = df.chat("""Return the survived in percentage breakdown by pclass""")

# Missing Data
missing_data_perc = df.chat("""Return the missing data percentage for the columns""")

# Outlier Data
outlier_fare_data = df.chat("""Please provide me the data rows that
contain outlier data based on the fare column""")

Image by Author

You can see from the image above that Pandas AI can return information as DataFrame objects, even when the prompt is quite complex.

However, Pandas AI can't handle calculations that are too complex, as the package is limited by the LLM we pass to the SmartDataframe object. In the future, I am sure Pandas AI will handle much more detailed analysis as LLM capabilities evolve.

Data Visualization with Pandas AI

Pandas AI is useful for data exploration and can perform data visualization. As long as we specify the prompt, Pandas AI will give the visualization output.

Let’s try a simple example.

response = df.chat('Please provide me the fare data distribution visualization')
response

Image by Author

In the example above, we ask Pandas AI to visualize the distribution of the Fare column. The output is a bar chart of the distribution from the dataset.

Just like Data Exploration, you can perform any kind of data visualization. However, Pandas AI still can’t handle more complex visualization processes.

Here are some other examples of Data Visualization with Pandas AI.

kde_plot = df.chat("""Please plot the kde distribution of age column and separate them with survived column""")

box_plot = df.chat("""Return me the box plot visualization of the age column separated by sex""")

heat_map = df.chat("""Give me heat map plot to visualize the numerical columns correlation""")

count_plot = df.chat("""Visualize the categorical column sex and survived""")

Image by Author

The plot looks nice and neat. You can keep asking the Pandas AI for more details if necessary.

Pandas AI Advanced Usage

We can use several built-in APIs from Pandas AI to improve the experience.

Cache clearing

By default, all prompts and results from the Pandas AI object are stored in a local directory to reduce processing time and avoid repeated calls to the model.

However, this cache can sometimes make Pandas AI results irrelevant, as it returns past results. That's why it's good practice to clear the cache. You can clear it with the following code.

import pandasai as pai

pai.clear_cache()

You can also turn off the cache at the beginning.

df = SmartDataframe(data, config={"enable_cache": False, "llm": llm})

In this way, no prompt or result is stored from the beginning.
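Conceptually, the cache is a lookup table keyed on the prompt, which is exactly why stale results can appear when the underlying data changes. A toy sketch of the idea (an illustration, not Pandas AI's actual implementation):

```python
# Toy prompt-keyed cache illustrating why stale answers can appear.
cache = {}

def chat(prompt, data):
    if prompt in cache:            # cache hit: the "LLM" is never called
        return cache[prompt]
    answer = f"rows: {len(data)}"  # stand-in for a real LLM call
    cache[prompt] = answer
    return answer

print(chat("How many rows?", [1, 2, 3]))        # rows: 3
print(chat("How many rows?", [1, 2, 3, 4, 5]))  # rows: 3  (stale!)

cache.clear()                                    # like pai.clear_cache()
print(chat("How many rows?", [1, 2, 3, 4, 5]))  # rows: 5
```

The second call returns the old answer because only the prompt, not the data, keys the cache; clearing it forces a fresh call.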

Custom Head

It’s possible to pass a sample head DataFrame to Pandas AI. It’s helpful if you don’t want to share some private data with the LLM or just want to provide an example to Pandas AI.

To do that, you can use the following code.

from pandasai import SmartDataframe
import pandas as pd

# head df
head_df = data.sample(5)

df = SmartDataframe(data, config={
    "custom_head": head_df,
    "llm": llm
})

Pandas AI Skills and Agents

Pandas AI allows users to pass an example function (a skill) and let an agent decide whether to execute it. For example, the code below combines two different DataFrames, and we pass a sample plot function for the Pandas AI agent to execute.

import pandas as pd
from pandasai import Agent
from pandasai.skills import skill

employees_data = {
    "EmployeeID": [1, 2, 3, 4, 5],
    "Name": ["John", "Emma", "Liam", "Olivia", "William"],
    "Department": ["HR", "Sales", "IT", "Marketing", "Finance"],
}

salaries_data = {
    "EmployeeID": [1, 2, 3, 4, 5],
    "Salary": [5000, 6000, 4500, 7000, 5500],
}

employees_df = pd.DataFrame(employees_data)
salaries_df = pd.DataFrame(salaries_data)

# The function docstring gives the model more context for using this skill
@skill
def plot_salaries(names: list[str], salaries: list[int]):
    """
    Displays a bar chart with names on the x-axis and salaries on the y-axis.

    Args:
        names (list[str]): Employees' names
        salaries (list[int]): Salaries
    """
    import matplotlib.pyplot as plt

    plt.bar(names, salaries)
    plt.xlabel("Employee Name")
    plt.ylabel("Salary")
    plt.title("Employee Salaries")
    plt.xticks(rotation=45)

    # Add the salary value above each bar
    for i, salary in enumerate(salaries):
        plt.text(i, salary + 1000, str(salary), ha='center', va='bottom')
    plt.show()

agent = Agent([employees_df, salaries_df], config={'llm': llm})
agent.add_skills(plot_salaries)

response = agent.chat("Plot the employee salaries against names")

The agent decides whether to use the function we registered with Pandas AI.

Combining skills and agents gives you more controllable results for your DataFrame analysis.

Conclusion

We have learned how easy it is to use Pandas AI to help with our data analysis work. Using the power of LLMs, we can limit the coding portion of data analysis work and instead focus on the critical parts.

In this article, we have learned how to set up Pandas AI, perform data exploration and visualization with it, and use its advanced features. You can do much more with the package, so visit their documentation to learn more.

Cornellius Yudha Wijaya is a data science assistant manager and data writer. While working full-time at Allianz Indonesia, he loves to share Python and data tips via social media and writing media. Cornellius writes on a variety of AI and machine learning topics.
