AI Was Meant to Make Data Decisions Easier — Not Harder

AI was expected to take data-driven organizations to the next level. However, experts and organizations are discovering that integrating artificial intelligence into data frameworks is trickier than originally thought.

AI not only generates and moves vast amounts of data; untraceable biases in training datasets can also produce biased AI models, affecting performance and compliance. Additionally, data experts looking to improve data quality and secure its integrity are sailing into uncharted waters thanks to AI.

Techopedia explores when to combine AI with data — and when to leave well alone.

Key Takeaways

  • While AI promises powerful insights, its complex nature strains traditional data-tracking tools. Untraceable biases in training data can also lead to biased AI outputs.
  • Companies must prioritize data lineage, understanding where data comes from and how it flows through the AI supply chain (edge, cloud, devices). Traditional data quality tools may not suffice.
  • Experts from Intel, Next DLP, and Tessell explain to Techopedia the challenges of mixing AI and data.
  • Organizations must prioritize data quality and security throughout the AI workflow. They should leverage new tools and best practices to navigate this new chapter in the data journey.

Everyone Wants AI, But Many Are Not Sure They Can Handle It

SolarWinds’ recently released “2024 IT Trends Report, AI: Friend or Foe?” found that while the majority (88%) of organizations are adopting or plan to adopt AI, few say they have confidence in their organization’s readiness to integrate the tech. The main challenges? Data, infrastructure, and security concerns.

In April, Gartner reported that 69% of organizations are “forced to evolve or rethink their data and analytics (D&A) models” due to the disruptive impact of AI. About 38% of organizations say a full D&A architecture overhaul will happen at their company within the next 18 months.

AI Data Tracking: From Origin to End of Life

Data-driven companies often treat the tracking of data flows as an afterthought, somewhere between the middle and end of their data priorities. Add AI to this environment and you get a real problem.

Bakul Banthia, Co-Founder of Tessell, a cloud database-as-a-service (DBaaS) provider, spoke to Techopedia about modern data practices. “It all starts at the root level,” Banthia said.

“Think of data flow similar to energy or water flowing from a central hub to intermediate stations to individual homes.

“As long as there is a mechanism to control who has access to what kind of data and where is the data accessible, then data flow can be tackled.”

But he warned that the task is not easy, describing it as a “surmountable problem” that requires strict discipline.

Responding to the proliferation of data, Banthia said data leaders are the guardians and policymakers of their organization’s data assets.

“Breaches happen on secondary environments which are sometimes overlooked from security and policy point of view,” Banthia explained.

“It is the duty of data leaders to define a unified policy for every human and non-human user (applications) that establishes secure and controlled data access.”
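
To picture the kind of unified policy Banthia describes, here is a minimal sketch in Python. The roles, datasets, and rules are illustrative assumptions, not Tessell’s implementation; the point is that a single policy table answers access questions for human and non-human users alike.

```python
# Illustrative sketch only: roles, datasets, and rules are assumptions.
from dataclasses import dataclass

@dataclass
class Principal:
    name: str
    kind: str   # "human" or "application" (non-human user)
    roles: set

# One policy table covering both human and non-human users.
POLICY = {
    ("analyst", "customer_orders"): {"read"},
    ("etl_service", "customer_orders"): {"read", "write"},
    ("analyst", "payment_records"): set(),   # no access by default
}

def is_allowed(principal: Principal, dataset: str, action: str) -> bool:
    """Check every role the principal holds against the single policy table."""
    return any(action in POLICY.get((role, dataset), set()) for role in principal.roles)

# The same check applies whether the caller is a person or an application.
etl_bot = Principal(name="nightly-etl", kind="application", roles={"etl_service"})
print(is_allowed(etl_bot, "customer_orders", "write"))   # True
print(is_allowed(etl_bot, "payment_records", "read"))    # False
```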

John Stringer, Head of Product at Next DLP, a data protection company, also spoke to Techopedia about data lifecycles.

“Understanding the origin, flow, and end-of-life of data is crucial for data leaders committed to safeguarding sensitive information and intellectual property.”

The Nature of AI Shatters Traditional Data Tracking Tech

Because of the way AI models operate, data flows have been significantly transformed. Traditional data quality monitoring tools, designed for simpler, linear data pipelines, are not equipped to keep up with the complex, non-linear nature of AI workflows. This creates blind spots where data quality issues can easily slip through.

AI models often involve intricate data transformations, combining and manipulating data from various sources, often in a ‘black box’ manner. This makes it difficult for traditional tools to track the flow and identify quality issues, and it introduces many risks.

For example, if data quality issues go undetected, they can be amplified by the AI model, leading to biased or inaccurate results.
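
As a rough illustration of catching such issues before a model amplifies them, the sketch below runs a few basic checks on a training table. The column names, toy data, and choice of checks are assumptions made for the example, not a standard workflow.

```python
# Illustrative pre-training checks; schema and thresholds are assumptions.
import pandas as pd

def basic_quality_report(df: pd.DataFrame, label_col: str, group_col: str) -> dict:
    """Surface issues an AI model could silently amplify downstream."""
    return {
        # Missing values that imputation or the model may quietly paper over.
        "null_rate_per_column": df.isna().mean().to_dict(),
        # Label imbalance that skews what the model learns.
        "label_distribution": df[label_col].value_counts(normalize=True).to_dict(),
        # Outcome rates per group: a crude early signal of skewed training data.
        "positive_rate_by_group": df.groupby(group_col)[label_col].mean().to_dict(),
    }

# Toy data with an assumed schema.
df = pd.DataFrame({
    "income": [40, 85, None, 62, 58],
    "region": ["north", "north", "south", "south", "south"],
    "approved": [1, 1, 0, 0, 1],
})
print(basic_quality_report(df, label_col="approved", group_col="region"))
```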

Whether their AI systems are customer-facing or back-end, companies that operate under a data-driven framework are challenged because the AI supply chain is complex. Between black-box AI models, hard-to-manage storage locations, and the mix of cloud, edge, and device computing, the AI supply chain is an intricate interplay of machines and infrastructure.

Another issue compounding the data-tracking challenge is the sheer volume of data that AI generates, analyzes, and pushes through these models — not to mention the speed at which this happens. Techopedia asked Stringer from Next DLP how data leaders can cope.

“Traditional data loss prevention (DLP) solutions are often limited in handling the complexities introduced by generative AI and new AI models, which produce vast amounts of new data types.”

Stringer explained that new and innovative platforms are now enhancing these processes by offering detailed visibility and control from data inception to deletion.

Many existing solutions focus on data egress via traditional content inspection and lack visibility into the creation of unstructured data from business applications and GenAI tools. More advanced solutions incorporate data origin and track data through its entire lifecycle.

“In addition, data tracking and comprehensive visualization are essential for managing modern data’s complexities, including those generated by AI technologies,” Stringer said.

Real-World Use Case: Integrating Data and AI Hands-On

Bob Rogers, former Chief Data Scientist at Intel, author of the book Demystifying AI for the Enterprise, and CEO of Oii.ai, a data science company specializing in supply chain modeling, spoke to Techopedia about his experience integrating data and AI.

Rogers once led a team of data scientists at UCSF that helped build the world’s first FDA-cleared AI on an X-ray device. The technology became the GE Critical Care Suite and is now deployed worldwide.

“The data processing pipelines for organizing and transforming the data used in these models were complex,” Rogers told Techopedia. “In fact, the process of collecting and annotating the data to identify the clinical findings of interest was also very complex.”

The team carefully tracked how all the steps of the data lifecycle connected to each other, with the relevant workflows, and the transformation software linked to each step.

“During model training, when we discovered data challenges, we were able to go back and refine the data collection, annotation, and transformation steps multiple times without having to reinvent the wheel on each iteration.”

“This saved an enormous amount of cost and time and ensured a consistent pipeline that could be repeated for the FDA in the future to support the regulatory submission of the product,” Rogers said.
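
A minimal sketch of that kind of bookkeeping, assuming a simple append-only log, might look like the following. The step names, file paths, and parameters are illustrative, not the actual pipeline Rogers’ team built; the idea is that each lifecycle step records its inputs, outputs, and code version so a single stage can be revisited without rebuilding everything.

```python
# Illustrative step log; names, paths, and parameters are assumptions.
import json
from datetime import datetime, timezone

PIPELINE_LOG = "pipeline_steps.jsonl"

def record_step(name: str, inputs: list, outputs: list, code_version: str, params: dict):
    """Append one lifecycle step (collection, annotation, transformation, ...) to a log."""
    entry = {
        "step": name,
        "inputs": inputs,
        "outputs": outputs,
        "code_version": code_version,
        "params": params,
        "run_at": datetime.now(timezone.utc).isoformat(),
    }
    with open(PIPELINE_LOG, "a") as f:
        f.write(json.dumps(entry) + "\n")

# When a data issue surfaces during training, the log shows which collection,
# annotation, or transformation step to refine and with what parameters.
record_step(
    name="annotate_findings",
    inputs=["raw_images_v2/"],
    outputs=["annotations_v2.csv"],
    code_version="annotator@1.3.0",
    params={"reviewers": 2},
)
```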

We asked Rogers why data leaders should care about the origin, flow, and end-of-life of data.

“One of the most critical examples illustrating why data leaders need to care is because of how data is being used in AI.”

When companies integrate AI into their business, they are usually looking to model certain things. Rogers said that if, for example, Company A was seeking a way to manage a supply chain disruption, it would have to enter data such as the location of its inventory, how many units of product it is moving, and who its suppliers are.

Where the data flows is important, as AI usually leverages cloud computing, which is not immune to cyberattacks. Rogers spoke about the consequences.

“Data leaks have been observed in healthcare settings, where patient Medical Record Numbers and other sensitive data has been exposed.”

Open-source tools exist that can reasonably track critical aspects of the data lifecycle. Despite limitations, these tools can be a good way to start tracking data for an important project or outcome that can demonstrate a return on investment (ROI).
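
As a starting point, the kind of record such tools keep can be sketched in a few lines: a dataset’s history from origin, through each move, to end of life. The fields and event names below are assumptions made for illustration, not any specific tool’s schema.

```python
# Illustrative dataset history; field and event names are assumptions.
from datetime import datetime, timezone

class DatasetRecord:
    """A tiny origin-to-end-of-life history for a single dataset."""

    def __init__(self, name: str, origin: str):
        self.name = name
        self.events = [("created", origin, self._now())]

    @staticmethod
    def _now() -> str:
        return datetime.now(timezone.utc).isoformat()

    def moved_to(self, location: str):
        self.events.append(("moved", location, self._now()))

    def retired(self, reason: str):
        self.events.append(("end_of_life", reason, self._now()))

# A simple, queryable history of where the data has been.
rec = DatasetRecord("inventory_snapshot", origin="erp_export")
rec.moved_to("cloud_warehouse")
rec.retired("superseded by a newer snapshot")
for event in rec.events:
    print(event)
```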

The Bottom Line

The journey to becoming data-driven was already an uphill battle. Companies invested heavily in collecting, cleaning, and analyzing data – the lifeblood of data-driven decisions. But now, the integration of AI throws a wrench into those carefully constructed frameworks.

AI introduces a new set of challenges. Its complex workflows and vast data consumption expose the limitations of traditional data tracking tools. Untraceable data biases in training datasets can lead to biased AI models, potentially compromising performance and compliance.

The good news is that solutions are emerging. New data tracking platforms offer detailed visibility and control over the entire data lifecycle, from origin to deletion. Additionally, industry experts emphasize the importance of data lineage and security measures throughout the AI supply chain, from edge devices to the cloud.

Ray Fernandez
Senior Technology Journalist

Ray is an independent journalist with 15 years of experience, focusing on the intersection of technology with various aspects of life and society. He joined Techopedia in 2023 after publishing in numerous media, including Microsoft, TechRepublic, Moonlock, Hackermoon, VentureBeat, Entrepreneur, and ServerWatch. He holds a degree in Journalism from Oxford Distance Learning, and two specializations from FUNIBER in Environmental Science and Oceanography. When Ray is not working, you can find him making music, playing sports, and traveling with his wife and three kids.