Stop Boiling the Ocean

Every AI vendor tells you that you need to get your data house in order before you can use AI. Meanwhile, every AI enthusiast tells you to just throw your data at the model and let AI figure it out.

They're both wrong.

I've watched both failure modes play out dozens of times. The companies that try to boil the ocean, fixing all their data before touching AI, never finish. The companies that fling garbage at models hoping diamonds will emerge find out that this is not how garbage or diamonds work.

The answer is neither. And it's not a compromise between two bad ideas. It's a different approach entirely. After 20 years at the C-level as a CISO, CIO, and CTO, I've learned that the gap between what practitioners actually live with and what the industry tells them to do is exactly where value lives.

The inside perspective

When I was in the chair, I watched data quality initiatives fail in two completely different ways.

The first failure mode was paralysis. Someone would present the vision: unified data governance, consistent quality standards, enterprise-wide cleansing. Two years and millions of dollars later, we'd have a fraction of the scope complete. The business would have lost patience. AI would still be "coming soon."

The second failure mode was chaos. Someone would declare that AI could handle messy data, just throw everything at the model and let machine learning sort it out. We'd dump terabytes into a pipeline, hit run, and watch the hijinks ensue. Hallucinations. Contradictions. Confident wrong answers. Results so unreliable that no one would trust the output.

Both failures had the same root cause: disconnection from business reality.

The initiatives that actually worked were different. They started with a business question, fixed the data needed to answer that question, learned something in the process, and fed that learning back to inform broader decisions.

The outside observation

Now I watch the same two failure modes play out at scale across the industry.

On one side: massive data governance programs that become permanent overhead. Armies of consultants building frameworks. Data quality scores improving while AI initiatives stall. The perfect becoming the enemy of the good.

On the other side: the "we'll do it live" crowd. Enterprises dumping raw data into AI pipelines with no quality consideration whatsoever. Treating hallucinations as a model problem instead of a data problem. Wondering why their AI produces garbage while insisting the technology is at fault.

The analysts covering this space tend to pick a side. Some advocate for comprehensive data transformation. Others hype AI's ability to find signal in noise. Neither approach works in practice. The organizations actually getting value from AI are doing something different, something that neither the governance vendors nor the AI enthusiasts want to talk about.

The uncomfortable truth

Both failure modes stem from the same misconception: that data quality is a technical problem with a technical solution.

It's not. Data quality is a business problem that requires business judgment about where to invest.

Failure mode one: boil the ocean. The enterprise commits to comprehensive data transformation before AI deployment. Years pass. Budgets drain. The business moves on. By the time the data is "ready," the opportunity has passed and the technology has changed. The fundamental error is treating data quality as a destination instead of a continuous process.

Failure mode two: fling garbage. The enterprise dumps raw data into AI pipelines, hoping the model will find the signal in the noise. Outputs are inconsistent, unreliable, and often wrong. Users lose trust. The AI initiative is labeled a failure. The fundamental error is treating AI as magic that can compensate for any input quality.

Both modes share the same problem. They disconnect data quality decisions from business value. One over-invests without feedback. The other under-invests without accounting for the consequences.

The garbage-in reality

Let me be specific about what happens when you fling garbage at AI.

LLMs learn statistical patterns from very large amounts of text. Feed them enough data, and the patterns that recur most often win. This works for general knowledge because the corpus is massive and individual errors tend to be outweighed by correct examples.

It does not work for enterprise data.

Your enterprise data isn't a massive corpus. It's a specific, limited dataset with specific, concentrated problems. When you feed bad data to AI, you don't get "mostly right with some noise." You get confidently wrong answers that look exactly like right answers.

Duplicate customer records don't average out. They double-count revenue and make every analysis wrong.
Inconsistent date formats don't self-correct. They create phantom patterns and temporal hallucinations.
Contradictory source systems don't resolve themselves. AI picks one arbitrarily and presents it as fact.
Missing data doesn't fill itself in. AI either ignores it, skewing results, or invents it, hallucinating facts.
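
To make the first of those concrete, here is a minimal sketch of how a duplicate customer record can inflate revenue. Everything in it is hypothetical: the customer and order records, the field names, and the choice of a normalized email address as the matching key are assumptions for illustration, not a description of any particular system.

```python
def normalize(email: str) -> str:
    """Trim whitespace and lowercase so trivially different emails compare equal."""
    return email.strip().lower()


# Hypothetical data: Acme was entered twice with cosmetic differences.
customers = [
    {"id": 1, "name": "Acme Corp", "email": "billing@acme.example"},
    {"id": 2, "name": "ACME Corp.", "email": " Billing@Acme.example"},
    {"id": 3, "name": "Globex", "email": "ap@globex.example"},
]

orders = [
    {"email": "billing@acme.example", "amount": 1000},
    {"email": "ap@globex.example", "amount": 500},
]


def joined_revenue(customers, orders):
    """Sum order amounts over a customer-to-order join on normalized email.

    Every customer row that matches an order contributes that order's amount,
    so a duplicated customer row counts the same order twice.
    """
    total = 0
    for customer in customers:
        for order in orders:
            if normalize(customer["email"]) == normalize(order["email"]):
                total += order["amount"]
    return total


def dedupe(customers):
    """Keep the first customer row seen for each normalized email."""
    unique = {}
    for customer in customers:
        unique.setdefault(normalize(customer["email"]), customer)
    return list(unique.values())


naive_total = joined_revenue(customers, orders)          # Acme's order counted twice
clean_total = joined_revenue(dedupe(customers), orders)  # each order counted once

print(naive_total, clean_total)
```

Nothing in the inflated total looks wrong on its face. A model summarizing that joined data has no way to know the number is too high, which is the point: the error does not average out, it gets reported.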

The "let AI figure it out" crowd acts like machine learning is magic. It's not. It's math. And math applied to garbage produces garbage, just faster and with more confidence.

The third path: responsive and informative

The approach that actually works is neither extreme. It's responsive to business needs, and it generates intelligence for enterprise decisions.

Here's the principle. Respond to the business quickly. Fix data quality where it matters for this need. Learn from what you find. Feed that learning back to the enterprise to inform investment decisions. This isn't a compromise between two bad ideas. It's a different operating model.

Business need emerges. A specific question or use case surfaces with real business value. Not theoretical value. Real value that someone will pay for or act on.
Assess required data. What data sources does this need require? What's their current quality state? What specific issues would prevent AI from delivering reliable results?
Targeted quality fix. Fix what matters for this use case. Nothing more. Be ruthless about scope. The goal is enabling this specific outcome, not perfecting the data estate.
Deploy and validate. Get AI working on the fixed data. Measure actual outcomes. Is it reliable? Is it trusted? Is it delivering the business value you expected?
Capture and report. Here's where most approaches fail. Document what you found. What quality issues existed? What did it cost to fix them? What would broader fixes across similar data enable? What's the estimated return of expanding this work?

Feed that intelligence back to the enterprise. Let business leaders decide where to invest based on demonstrated value and identified opportunities, not based on abstract governance frameworks or wishful thinking about AI magic.

The intelligence feedback loop

Every focused data quality effort should make the enterprise smarter about its data.

What you capture from every focused effort: the quality issues discovered, the cost to remediate for this use case, the estimated cost for broader remediation, the business value the fix enabled, the adjacent use cases this data supports, and the systemic patterns you identified.
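
One way to keep that capture consistent is a small record that every project fills in. The sketch below is an assumption about how it might look, not a prescribed format: the class name, the field names, the sample figures, and the idea of ranking by value enabled per dollar of fix cost are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class QualityFinding:
    """What one focused data quality effort reports back (hypothetical schema)."""
    use_case: str
    issues_found: List[str]
    fix_cost: float                    # cost to remediate for this use case
    broader_fix_cost_estimate: float   # estimated cost for broader remediation
    value_enabled: float               # business value the fix enabled
    adjacent_use_cases: List[str] = field(default_factory=list)
    systemic_patterns: List[str] = field(default_factory=list)

    def value_per_dollar(self) -> float:
        """Value enabled per unit of fix cost; 0.0 if no cost was recorded."""
        if self.fix_cost <= 0:
            return 0.0
        return self.value_enabled / self.fix_cost


def prioritize(findings: List[QualityFinding]) -> List[QualityFinding]:
    """Order findings by demonstrated return, highest first."""
    return sorted(findings, key=lambda f: f.value_per_dollar(), reverse=True)


# Illustrative numbers only.
findings = [
    QualityFinding(
        use_case="churn forecast",
        issues_found=["duplicate customer records"],
        fix_cost=40_000,
        broader_fix_cost_estimate=250_000,
        value_enabled=120_000,
    ),
    QualityFinding(
        use_case="invoice matching",
        issues_found=["inconsistent date formats"],
        fix_cost=10_000,
        broader_fix_cost_estimate=60_000,
        value_enabled=90_000,
        adjacent_use_cases=["cash forecasting"],
    ),
]

ranked = prioritize(findings)
for finding in ranked:
    print(finding.use_case, finding.value_per_dollar())
```

A single ratio is a crude ranking. The useful part is that every project reports the same fields, so business leaders can compare efforts on evidence rather than on who argued loudest.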

What the enterprise gains: prioritized investment opportunities based on demonstrated value, business cases for targeted fixes backed by real numbers, an understanding of data debt and its actual cost, a risk assessment of leaving data unfixed, and a roadmap informed by evidence rather than governance theory.

This is how you build organizational intelligence about data quality. Not through comprehensive assessments that never end. Not through ignoring the problem and hoping AI compensates. Through systematic learning from targeted work.

Signs your approach is failing

You're boiling the ocean if you've gone 12 or more months without enabling a single AI use case, your data quality scores are improving while AI initiatives stall, you have more governance overhead than actual data cleaning, your business stakeholders stopped asking about AI timelines, or your team talks about "phase one" and "foundation" constantly.

You're flinging garbage if your AI outputs are inconsistent or unreliable, your users don't trust AI recommendations, hallucinations get blamed on "model limitations," no one can explain why AI produces wrong answers, or data quality gets dismissed as someone else's problem.

You're on the right path if AI value is delivered within 90 days of a business need, each project generates enterprise intelligence about data, business priorities drive data quality investment, investment decisions are based on demonstrated return, and cumulative learning informs broader strategy.

The trust equation

The real goal isn't data quality. It's not AI deployment. It's trust.

How do you get your organization to trust AI enough to act on it? You don't get there by promising comprehensive data transformation that takes years. You don't get there by deploying AI on garbage and hoping no one notices the errors. You get there by demonstrating reliability. One use case at a time. With honest feedback about what it took and what it enabled.

Trust builds through demonstrated reliability. One reliable AI use case teaches more than a hundred governance documents. When people see AI work correctly, consistently, on something that matters to them, they start to believe.

Trust builds through honest accounting. When you tell stakeholders "this AI works because we fixed these three data issues, and here's what it cost," you build credibility. They understand the relationship between investment and outcome. They can make informed decisions about the next investment.

Trust builds through cumulative learning. Each project adds to enterprise intelligence. Over time, the organization develops a realistic understanding of its data quality challenges and the investment required to address them. That understanding enables strategic decisions instead of reactive firefighting.

What I'd tell my former self

If I had known then what I know now:

I would reject both extremes immediately. Neither "fix everything first" nor "let AI figure it out" has ever worked. Stop pretending they will.

I would require every data quality effort to generate enterprise intelligence. Not just clean data, but insight into the data estate that informs future decisions.

I would measure success by trust built, not data cleaned. Does the business trust AI more after this project? That's the only metric that matters.

I would create a feedback mechanism from day one. Every project reports what was wrong, what it cost to fix, what it enabled, and what broader fixes would enable. The enterprise uses this to prioritize.

I would stop treating data quality as IT's problem. It's a business investment decision. IT provides the intelligence. The business decides where to invest.

The bottom line

Don't boil the ocean. It never finishes. Don't fling garbage. It destroys trust. Respond quickly. Learn constantly. Feed intelligence back to the enterprise. Let the business decide where to invest based on evidence, not theory.