Why Data Quality Checks Fail – Too Many Alerts, Not Enough Ownership
Every morning, your team wakes up to over one hundred data quality alerts. I know, because I recall this issue at Facebook. Some of them were fixable issues; others should have just been warnings.
It’s so easy to build data pipelines and add data quality checks these days that I am sure for some people this is reality.
After all, all you need to do is…
Write a few prompts, drag a few boxes and arrows, and check a few boxes, right?
Now you’ve got another 721 data quality checks to wake up to. If you’re using tools like dbt, it’s pretty quick to integrate checks into your workflows.
This is one of the many daily tasks you’ll likely have to deal with as a data engineer. In fact, this article is part of a longer series that I’ve been writing on my newsletter that covers data pipelines from multiple angles. If you’re interested, you should check it out.
But for now, let’s dive into dealing with noisy data quality issues.
Data Quality – Why We’ve Invested In It
Data quality solutions are built on the promise that they will help build data sets you can trust.
They’ll catch bad data before it enters your systems.
And trust me, I’ve spoken with plenty of data teams where the business has lost all trust in their data warehouses and lake houses.
That’s when many data leaders turn to either finding solutions or building their own data quality tools.
They want to make sure the business trusts the data coming out of their systems.
Which is a reasonable goal. This becomes even more necessary as you
Where It Breaks – Noisy Checks

When I started in the data world, I had to develop my own system for managing data quality.
It was essentially a Python system that used SQL templates where the user could provide a few parameters to test several different generic types of errors.
Want to allow for a certain percentage of nulls on a specific column? Well, just fill in the following parameters:
- column_name: patient_middle_name
- percentage_warning: 95
- percentage_failure: 80
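To make that concrete, here is a minimal sketch of what a templated check like this could look like. It is not the original system. The function name `check_fill_rate`, the `patients` table, and the in-memory SQLite database are all hypothetical, and I am assuming the two thresholds are read as non-null percentages (warn below 95% filled, fail below 80% filled).

```python
import sqlite3

# Hypothetical SQL template: counts all rows and the non-null rows of one column.
FILL_RATE_SQL = "SELECT COUNT(*), COUNT({column_name}) FROM {table_name}"


def check_fill_rate(conn, table_name, column_name,
                    percentage_warning, percentage_failure):
    """Return 'pass', 'warning', or 'failure' based on the non-null percentage."""
    # Identifiers cannot be bound as SQL parameters, so they are formatted in.
    # In a real system these names should only come from trusted config.
    sql = FILL_RATE_SQL.format(table_name=table_name, column_name=column_name)
    total, non_null = conn.execute(sql).fetchone()
    if total == 0:
        return "failure"  # assumption: an empty table counts as a failure
    fill_pct = 100.0 * non_null / total
    if fill_pct < percentage_failure:
        return "failure"
    if fill_pct < percentage_warning:
        return "warning"
    return "pass"


# Tiny demo table: 9 of 10 rows have a middle name, so the column is 90% filled.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients (patient_id INTEGER, patient_middle_name TEXT)")
rows = [(i, "Lee" if i < 9 else None) for i in range(10)]
conn.executemany("INSERT INTO patients VALUES (?, ?)", rows)

status = check_fill_rate(conn, "patients", "patient_middle_name",
                         percentage_warning=95, percentage_failure=80)
print(status)
```

With a 90% fill rate the check lands between the two thresholds, so it comes back as a warning rather than a failure. That gap between the two levels is exactly what gets lost once every check is treated as an alert.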
Now you’ve got similar checks with tools like dbt.
Great, so you’ve got your system for data quality!
Problem solved, right?
But there is a flip side to data quality checks. They really are great, but it’s easy to go from “let’s build a few checks” to every column having dozens of checks and every morning bringing dozens of notifications that you eventually start to ignore.
You ignore them either because there are bigger issues, you’re not rewarded for fixing data quality problems, the teams that produce them aren’t interested in fixing them, etc.
So let’s break down the key areas and reasons checks get noisy.
Over-checking Everything
- Every column gets rules – It’s tempting to make sure every column gets checks. But maybe you don’t use every column, or maybe some are just frequently null, and until you plan to fix them at the source, you’re just creating noise.
- Nobody prioritizes what actually matters – Some tables can have hundreds of columns. Now, you probably shouldn’t bring them all in unless you really need them, but if you do, and then add several data quality checks per column, it’s going to get chaotic. That’s why you need to be specific about which columns really matter, as well as what errors will really break things.
Poorly Tuned Thresholds

This is less of a problem with many modern data quality solutions, as they often use some model to detect thresholds dynamically. Now, I’d hope that they’d also be able to detect drift over time. That is to say, maybe you have a slight increase or decrease daily, but each change is small enough that it never triggers anything.
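Here is a rough sketch of that drift problem, assuming a simple day-over-day threshold and a made-up series of row counts. The function names and the numbers are hypothetical, and this is not how any particular tool does it. It just shows why a per-day check can stay quiet while the cumulative change is large.

```python
from statistics import mean


def day_over_day_alerts(values, max_daily_change_pct):
    """Indexes of days whose change versus the previous day exceeds the limit."""
    alerts = []
    for i in range(1, len(values)):
        # Assumes the previous value is non-zero (true for the demo data below).
        change_pct = abs(values[i] - values[i - 1]) / values[i - 1] * 100
        if change_pct > max_daily_change_pct:
            alerts.append(i)
    return alerts


def drift_pct(values, window):
    """Percent change of the latest window's mean versus the first window's mean."""
    baseline = mean(values[:window])
    recent = mean(values[-window:])
    return (recent - baseline) / baseline * 100


# Hypothetical daily row counts shrinking by 1% per day for 30 days.
row_counts = [10_000 * 0.99 ** day for day in range(30)]

daily = day_over_day_alerts(row_counts, max_daily_change_pct=5)
drift = drift_pct(row_counts, window=7)

print(daily)            # no single day crosses the 5% limit
print(round(drift, 1))  # the cumulative change is far larger than any daily one
```

Comparing a recent window against an older baseline is only one way to catch this. The point is that the comparison has to span more than one day.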
Alerts Lacking Clear Ownership
At the end of the day, if you have data quality checks going off, but no one is set to own the pipeline or the fix, well, guess what, it’s not getting fixed. So someone on the team needs to own them. This could be something that your on-call team member takes on.
Misaligned Incentives
The team generating bad data is rewarded for shipping features, not fixing pipelines.
I think most data teams feel this one.
The application team focuses on building an application that functions. That doesn’t always mean the data is tracked in a useful way. I talked about this in a past article, but maybe they don’t use their updated_at field or simply write over information as someone changes where they live. As long as the application functions, they might be ok with this.
So the bad data will continue to flow…

What Actually Works
If your data quality solution is more traditional, then here are a few simple things you can do to reduce the noise.
- Focus on fewer checks that are on columns that actually matter
- Prioritize business-critical tables
- Tier your alerts (critical vs informational) and make it so they only bubble up under certain conditions
- Assign clear ownership
- Regularly delete useless checks
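As a rough illustration of tiering and ownership, here is a small sketch. Everything in it is hypothetical: the `CRITICAL_CHECKS` and `TABLE_OWNERS` registries, the table and team names, and the two channels. It assumes you keep an explicit list of which checks are critical and who owns each table.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical registries: which checks are critical and who owns each table.
CRITICAL_CHECKS = {("orders", "order_id", "not_null")}
TABLE_OWNERS = {"orders": "payments-on-call"}


@dataclass
class CheckResult:
    table: str
    column: str
    check: str
    failed: bool


def route(result: CheckResult) -> Optional[dict]:
    """Decide where a failed check goes; passing checks produce nothing."""
    if not result.failed:
        return None
    key = (result.table, result.column, result.check)
    tier = "critical" if key in CRITICAL_CHECKS else "informational"
    # A missing owner is surfaced explicitly instead of being silently dropped.
    owner = TABLE_OWNERS.get(result.table, "unowned")
    # Only critical failures page someone; the rest wait for a digest.
    channel = "page" if tier == "critical" else "daily_digest"
    return {"tier": tier, "owner": owner, "channel": channel}


results = [
    CheckResult("orders", "order_id", "not_null", failed=True),
    CheckResult("orders", "coupon_code", "not_null", failed=True),
    CheckResult("users", "middle_name", "not_null", failed=False),
]
routed = [route(r) for r in results]
pages = [r for r in routed if r and r["channel"] == "page"]
print(len(pages))
```

Two checks failed here, but only one of them would page anyone. The other lands in a digest that the owner can review on their own schedule.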
Now that’s a start, but that’s far from everything.
I am also going to add a caveat here.
I think with AI, some of the points above might have more nuance.
If you can create a system that detects multiple data quality issues, but only surfaces the critical ones or at least knows how to better consolidate and provide the end-user with easy-to-digest information, then that will circumvent some of these challenges (I am sure some tools out there do so). But I will always tell readers and clients to be wary. Many solutions look good in a demo, but fail to live up to their promise.
Final Thoughts
Data quality is important, and you need to ensure that your data pipelines produce reliable data.
But there is a point where data engineers will start ignoring checks, especially if they put effort into trying to fix the issue, and they are:
- Not provided support
- Not rewarded for creating systems that help improve quality, or systems that can help reduce the burden of noisy checks
- Working in systems where bad data is tolerated downstream anyway
- Measured on delivery speed, not data reliability
- Expected to own quality without owning the upstream systems
- Lacking context on whether the issue even matters to the business
- Not involved in defining what “good data” actually means
Pushing a pipeline to production is not the end.
You will have to maintain it.
So don’t expect it to end with the pipeline. You’ll have plenty more work to do once it’s published.
But maybe you’ll have an agent take care of that in the future.
As always, thanks for reading!
