In AI We Trust?

July 11, 2024
Read Time: 6 minutes
AI & Machine Learning

[An update to content originally published on December 11, 2023]

How gen AI increases the importance of data quality and governance

Generative AI is a class of artificial intelligence models and algorithms engineered to generate text, images, and other media that resemble their training data yet have unique qualities of their own. The tools learn the patterns and structure of the data they’re fed in training and then generate new data with similar but distinct characteristics.

Gen AI arrived with one of the most popular technology products in history when ChatGPT launched in November 2022. The tool’s self-service features and accessibility made it an immediate success, and estimates of productivity gains as measured by GDP are staggering. Rough calculations from 2023 suggested gen AI boosts employee productivity by 40-60%¹,². Research also projected that global productivity could increase by 7%³, or US$3-4 trillion⁴.

While some remain skeptical of the true productivity gains given the high cost of large language models (LLMs), the industry is already trending toward more specialized, less costly models, which eases doubts about whether the technology adds value or is just a flash in the pan. BlackRock’s $3 billion acquisition of Preqin highlights the value of data and comes amid increased demand for unique data sources across industries.

Still, trust remains a concern as customers grapple with privacy, intellectual property, and even falsehoods, known as hallucinations. Users of third-party tools must weigh what happens when their data is fed into systems that then use the information as part of the product. As firms evaluate whether to build or buy their own gen AI tools, they must think through the costs, risks, and complexities of training models on data in a world where personally identifiable information and individual rights to privacy are critical to consider.

In a 2023 survey from Canva, most of the 4,000+ respondents said they have a baseline level of trust in gen AI. Yet only one-third agreed that they completely trust the technology. Their top three concerns? Customer, company, and personal data privacy.⁵ When using third-party gen AI, it’s critical to recognize that data one firm inputs into a model may surface in output for other firms. When building LLMs, firms need to recognize that if training data is later determined to be private, confidential, or otherwise ineligible for use, the model must be retrained. That’s an expensive undertaking.

So if many users struggle to trust AI tools, how can firms building LLMs and AI models improve their processes?


The foundation

Data quality refers to data accuracy, completeness, consistency, and reliability. Think of it as the foundation for all data-driven applications and AI systems. When the foundation is weak, the entire structure is at risk. Gen AI models need high-quality data to produce coherent and useful output. Data that contains errors or inconsistencies will negatively impact the tool’s output – potentially leading to incorrect or even harmful information.
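To make these dimensions concrete, here is a minimal sketch of how completeness, validity, and consistency might be scored before data reaches a model. The sample records, field names, and rules (a positive price, an uppercase currency code, unique ids) are illustrative assumptions, not a prescribed standard.

```python
# Hypothetical sample records; field names and values are illustrative only.
records = [
    {"id": 1, "ticker": "ABC", "price": 101.5, "currency": "USD"},
    {"id": 2, "ticker": "XYZ", "price": None, "currency": "USD"},
    {"id": 3, "ticker": "GHI", "price": -4.0, "currency": "usd"},
    {"id": 3, "ticker": "DEF", "price": 55.0, "currency": "EUR"},
]

REQUIRED_FIELDS = ("id", "ticker", "price", "currency")


def quality_report(rows, required=REQUIRED_FIELDS):
    """Score a batch of records on a few simple, assumed quality rules."""
    total = len(rows)

    # Completeness: every required field is present and not None.
    complete = sum(all(r.get(f) is not None for f in required) for r in rows)

    # Validity (a rough proxy for accuracy): price is a positive number.
    valid_price = sum(
        isinstance(r.get("price"), (int, float)) and r["price"] > 0 for r in rows
    )

    # Consistency: currency codes follow one convention (uppercase).
    consistent = sum(
        isinstance(r.get("currency"), str) and r["currency"].isupper() for r in rows
    )

    # Reliability: ids that appear more than once suggest duplicate loads.
    ids = [r.get("id") for r in rows]
    duplicate_ids = sorted({i for i in ids if ids.count(i) > 1})

    return {
        "completeness": complete / total,
        "valid_price": valid_price / total,
        "consistent_currency": consistent / total,
        "duplicate_ids": duplicate_ids,
    }


report = quality_report(records)
print(report)
```

In practice a firm would choose its own rules and thresholds. The point is that each dimension of quality can be measured and tracked rather than assumed.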

Data quality is also vital in applications that require predictive modeling that relies on historical data. If a machine uses low-quality data, its predictions and recommendations may be inaccurate and lead to subpar decisions.

Another element of data quality is freshness. Knowing when source and training data were last updated helps ensure models reflect the latest information and can set expectations about model behavior. For example, many LLMs do not continuously learn from the latest data. For users of the paid ChatGPT subscription, the GPT-4 Turbo model has a knowledge cutoff of December 2023.⁶ A lot has happened in the world over the past seven months. The free version of ChatGPT is trained on even older data, dating to January 2022.
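A freshness check can be as simple as comparing a last-updated date against a tolerance. The sketch below uses the December 2023 cutoff mentioned above and this article's publication date. The specific day of the month and the 180-day threshold are assumptions chosen for illustration.

```python
from datetime import date


def data_age_days(last_updated: date, as_of: date) -> int:
    """Number of days between the last update and the evaluation date."""
    return (as_of - last_updated).days


def is_stale(last_updated: date, as_of: date, max_age_days: int) -> bool:
    """True when the data is older than the allowed tolerance."""
    return data_age_days(last_updated, as_of) > max_age_days


# Assumption: the cutoff is treated as December 1, 2023; only the month is known.
cutoff = date(2023, 12, 1)
as_of = date(2024, 7, 11)  # publication date of this article

age = data_age_days(cutoff, as_of)
print(age, is_stale(cutoff, as_of, max_age_days=180))
```

Recording a last-updated date alongside every source makes this kind of check possible and helps set user expectations about what a model can and cannot know.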

The framework

If we consider data quality to be the foundation of AI, data governance is the framework that supports the technology. Proper governance ensures AI uses data securely and in compliance with regulations.

Unique data is essential to creating outputs that set you apart from your competitors. However, AI and ML tooling can miss critical components. Data governance and data discoverability are critical to building AI models, and a “good enough” model is not sufficient. To succeed, enterprises must integrate data governance, cataloging, and lineage into the lifecycle of their gen AI efforts.

Until now, the science behind AI has often been ad hoc and without oversight. Only recently did the Executive Office of the President of the United States announce its long-awaited AI safeguards in an executive order. Focused on safety and security mandates, equity and civil rights guidance, and research into AI’s impact on the labor market, the order will address privacy, security, and transparency.

Governance ensures only the right people and programs access the data, and lineage provides credible, easy ways to trace a model’s sources so that unusable data can be identified and mitigated.
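As a rough illustration of how governance and lineage work together, the sketch below models a small data catalog. Each dataset records which roles may read it and which datasets it was derived from. All dataset names, roles, and the catalog structure are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    name: str
    allowed_roles: frozenset  # governance: who may read this dataset
    upstream: tuple = ()      # lineage: datasets this one was derived from


# Hypothetical catalog; names and roles are illustrative only.
catalog = {
    d.name: d
    for d in [
        Dataset("vendor_prices", frozenset({"quant", "ml_pipeline"})),
        Dataset("client_notes", frozenset({"compliance"})),
        Dataset("clean_prices", frozenset({"ml_pipeline"}), ("vendor_prices",)),
        Dataset(
            "training_set_v1",
            frozenset({"ml_pipeline"}),
            ("clean_prices", "client_notes"),
        ),
    ]
}


def can_access(role: str, dataset_name: str) -> bool:
    """Governance check: is this role on the dataset's allowlist?"""
    return role in catalog[dataset_name].allowed_roles


def trace_sources(dataset_name: str) -> set:
    """Lineage check: collect every upstream dataset, directly or indirectly."""
    seen = set()
    stack = list(catalog[dataset_name].upstream)
    while stack:
        name = stack.pop()
        if name not in seen:
            seen.add(name)
            stack.extend(catalog[name].upstream)
    return seen


def affected_by(flagged: str) -> list:
    """Datasets that would need review if `flagged` is found unusable."""
    return sorted(n for n in catalog if flagged in trace_sources(n))


print(can_access("ml_pipeline", "client_notes"))
print(affected_by("client_notes"))
```

If a source such as `client_notes` is later found to be confidential, a lineage lookup like `affected_by` shows which training sets depend on it, and therefore which models may need retraining.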


Putting it all together

A purpose-built tool that integrates data quality, governance, and lineage into its design can offer a competitive advantage by giving firms the confidence they’re building gen AI models from a solid foundation.

The importance of this integrated approach cannot be overstated. As demand for gen AI grows, so do concerns about potential misuse. The risks of cybercrime, fake news, and deepfakes are real and can have significant consequences. By meshing the foundation of data quality with the framework of data governance, firms enhance confidence in their tools’ output. Built-in governance and lineage protect sensitive information and help maintain trust with clients, employees, and key stakeholders.

When data is a valuable asset, having the right tools and frameworks in place becomes essential. In a landscape where gen AI has the potential to revolutionize industries, it is crucial to prioritize both data quality and data governance. By doing so, firms can harness the power of gen AI while mitigating the risks associated with its misuse. The right foundation and framework can enable firms to confidently navigate the world of gen AI, opening up new opportunities and driving innovation.


1 AI Improves Employee Productivity by 66%, July 16, 2023, Nielsen Norman Group.

2 How Generative AI Can Boost Highly Skilled Workers’ Productivity, October 19, 2023, MIT Sloan School of Management.

3 Boost Your Productivity with Generative AI, July 27, 2023, Harvard Business Review.

4 Economic Potential of Generative AI, June 14, 2023, McKinsey Digital.

5 AI Is Supercharging Work and Creativity, November 2023, Canva and Morning Consult.

6 Models, OpenAI Platform.

Greg Muecke
Vice President, Product Management

Greg is Vice President of Product at Arcesium, where he is responsible for Aquata™, the data platform purpose-built for the investment industry. Greg has spent the last decade building, launching, and managing technology products across the investment lifecycle.
