If AI is the ‘gas guzzler’ of data, how do we get better mileage?

Can we tame the glut of inadequate or questionable data moving through artificial intelligence systems? AI is hampered by hallucinations, bias, polluted training data, and — ultimately — organizational uncertainty. Industry leaders and thinkers have some ideas for getting data in order.

If data is the new oil, then AI, "which needs lots and lots of it, is the 'gas guzzler' of data," Andy Thurai, principal analyst with Constellation Research, told ZDNET. However, consuming large volumes of data risks a loss of quality in the process — creating trust issues with AI.

A Salesforce survey of 6,000 employees finds that three-quarters don't trust the data that trains the AI they work with. A separate recent Fivetran survey of 550 executives at large organizations estimates that organizations lose on average 6% of their annual revenues, or $406 million, to underperforming AI models built on inaccurate or low-quality data, which lead to misinformed business decisions. Organizations leveraging large language models (LLMs) report data inaccuracies and hallucinations 50% of the time.

Fixing these deficiencies also requires data curation and quality assurance, which consume time that people should be spending on business problems. "Most data scientists spend time curating or wrangling data vs. creating and testing actual models," Thurai added.

Yet a lot of data is still needed to fuel the AI engine. The challenge is that "when you feed AI and ML models with partial data, you only get a partial view of the enterprise," Thurai explained. "Though enterprises are producing more than enough data, it's still very fragmented between business units, domains, platforms, and implementations such as cloud versus private data centers."

The problem is that organizations are charging headfirst into AI. "Many businesses are overly eager to throw technologies at the loudest problem that exists without putting in the hard work, such as addressing underlying data quality issues," Michael Heath, lead technical solutions engineer at SHI International, told ZDNET. "This demands accurate, consistent, and complete data. Without robust data governance and data management practices, organizations risk amplifying errors and generating unreliable insights."
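
To make "consistent and complete" a little more concrete, here is a minimal Python sketch of two checks a team might run before feeding records to a model. It is an illustration only, not a method described by anyone quoted here: the field names, the sample records, and the helper functions `completeness` and `duplicate_ids` are all hypothetical.

```python
from collections import Counter

# Hypothetical schema: the fields a model is assumed to need.
REQUIRED_FIELDS = ("customer_id", "region", "revenue")


def completeness(records, fields=REQUIRED_FIELDS):
    """Return the share of required field slots holding a non-empty value."""
    total = len(records) * len(fields)
    if total == 0:
        return 0.0
    filled = sum(
        1 for r in records for f in fields if r.get(f) not in (None, "")
    )
    return filled / total


def duplicate_ids(records, key="customer_id"):
    """Return key values that occur more than once, a basic consistency signal."""
    counts = Counter(r.get(key) for r in records if r.get(key) is not None)
    return sorted(k for k, n in counts.items() if n > 1)


# Made-up sample data with one blank region, one missing revenue,
# and one repeated customer_id.
records = [
    {"customer_id": 1, "region": "EMEA", "revenue": 1200.0},
    {"customer_id": 2, "region": "", "revenue": 950.0},
    {"customer_id": 2, "region": "APAC", "revenue": None},
]

print(f"completeness: {completeness(records):.2f}")
print("duplicate ids:", duplicate_ids(records))
```

Checks like these only surface symptoms. Deciding which fields are required, and which copy of a duplicated record is the right one, is the governance work the article goes on to describe.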

Data governance calls for an all-hands-on-deck effort to ensure that the right data is going to the right people and applications, and that data is timely, relevant, secure, and has value.

While data quality has been top of mind for years, identifying data that is essential for AI and training models is another challenge. This "quintessential data" — as defined by Neda Nia, chief product officer for Stibo Systems — consists of data "that is well governed and truly represents what delivers the most optimal result to train machine learning models," she told ZDNET.

Quality matters — and concerted governance is needed at both the data and AI levels. This creates "the transformative force reshaping data management and delivery in the GenAI era," said Junaid Saiyed, chief technology officer of Alation. "The rapid pace, vast scale, and intricate complexity of data processing in GenAI demands robust AI governance frameworks. Organizations can overcome the garbage in, garbage out dilemma with effective AI governance."

Of course, high-quality data doesn't appear out of nowhere. "The main challenge in maintaining high-quality data lies in the unpredictable nature of requirements," said Nia. "Questions include 'What constitutes AI-ready data?' 'Which future models will need specific data?' and 'How far back should data be retained for optimal processing in models?'"

People working with AI need to consider "the established requirements set by compliance and regulation, while also anticipating future data science needs, including those yet to be defined," Nia elaborated. "This poses a significant challenge. How can we anticipate future requirements in a constantly changing environment?"

Well-governed, quality data needs to be ready and available for all scenarios, she continued. "Invest and focus on such data. While data volume is important, quality outweighs volume in the modern world."

AI and data governance "ensures that AI models operate on clean, relevant, and reliable data," said Saiyed. "This enhances the accuracy and fairness of AI decisions, promotes effective collaboration through metadata management, and ensures compliance with increasing regulatory demands."

Data governance also helps "establish a culture of data integrity, so organizations can drive innovation, operational efficiency, and growth," Saiyed said.
