Why Are We Still Struggling To Answer How Many Active Customers We Have?

June 25, 2022

“Why do we still struggle to answer basic questions like how many customers are still active or what exactly is the company’s churn?”

This was a question posed to a panel I was sitting on at the Snowflake Summit. Several panelists provided different answers, each coming from a very different perspective. Truth be told, I have mulled over this question for the last few years as I worked on projects that involved reconciling basic numbers like the number of active customers or total sales for a company.

These are metrics most would assume are easy to calculate, yet they continue to plague companies even after they have spent large portions of their budget trying to build robust data infrastructure.

Rarely is this due to one cause. Over the years I have seen a multitude of reasons why companies struggle to answer basic questions: constant turnover of developers, ERP and CRM migrations, producers constantly changing what data they provide, and mergers and acquisitions (and of course there are more). All in all, continually reporting even basic numbers can prove very challenging when the underlying components are constantly shifting. Let's discuss some of these issues and how we can try to mitigate them.

Developers Want To Develop

When you hire a developer, they generally want to develop. If you put them in a role where they are stuck maintaining old code or using a pre-built solution, they will become frustrated.

I know, I have been there. Developers want to build new things and every problem is an opportunity to do so.

Taking away the opportunity for developers to build is taking away what they find joy in doing. This is great when there are genuinely complex problems to solve; your developers can build a solution that helps manage that complex workflow.

However, then the team or individual leaves and no one knows how, or wants, to maintain the system they built. So the cycle starts again: a developer joins the company, decides to build new infrastructure, perhaps gets a few metrics down, and then leaves 18-24 months later.

This leads to a repetitive cycle of developing and redeveloping the same infrastructure over and over again. The pattern generally occurs in small and medium-sized companies, where data teams are small and processes never fully get established.

 

One solution is to provide a clear framework that future engineers are expected to adhere to. Every engineer has their own preferred tools and best practices, but it is a good idea for a company to set standards and provide a framework that newly hired engineers can fit into.

Otherwise the cycle will continue.
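
To make this concrete, here is a minimal sketch of what such a framework could look like, assuming a Python-based pipeline codebase. The class and method names are hypothetical; the point is that every new pipeline follows the same shape regardless of who writes it.

```python
from abc import ABC, abstractmethod

import pandas as pd


class PipelineStep(ABC):
    """Hypothetical base class every new pipeline is expected to implement.

    The specific methods matter less than the fact that every engineer who
    joins inherits the same structure: extract, transform, and load are
    always separate, named, and individually testable.
    """

    @abstractmethod
    def extract(self) -> pd.DataFrame:
        """Pull raw data from the source system."""

    @abstractmethod
    def transform(self, raw: pd.DataFrame) -> pd.DataFrame:
        """Apply business logic to the raw data."""

    @abstractmethod
    def load(self, clean: pd.DataFrame) -> None:
        """Write the transformed data to the warehouse."""

    def run(self) -> None:
        # Every pipeline runs the same way, so orchestration, logging, and
        # monitoring do not need to change when the author changes.
        self.load(self.transform(self.extract()))
```

The framework itself matters less than the fact that it is written down and enforced; a dbt project layout, an Airflow DAG template, or even a simple style guide can serve the same purpose.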

Software Engineers Don’t Like To Be Slowed Down

If you’ve worked as a data engineer in a product-heavy company, you have probably had to keep track of your application source tables very closely, because software engineers move fast and care about functionality first and data second... or maybe third.

This often leads to a “throwing it over the fence” mentality (as Chad Sanderson has described) from the data producers.

In turn, this causes several problems:

  1. Data engineers have to build brittle logic into their pipelines in order to manage whatever issues or limitations the data has.
  2. Data engineers need to create systems, or manually check tables daily, to catch changes that will break their pipelines (see the sketch below).
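
As a rough illustration of the second point, here is a minimal sketch of a daily schema check, assuming a warehouse that exposes information_schema and a standard DB-API cursor; the table and column names are made up.

```python
# Hypothetical set of columns the pipeline depends on for one source table.
EXPECTED_COLUMNS = {"customer_id", "status", "created_at", "plan_tier"}


def check_schema_drift(cursor, table_name: str) -> set:
    """Compare the columns we expect against what the source table actually has.

    `cursor` is any DB-API cursor; the %s placeholder style depends on your
    driver, so adjust as needed.
    """
    cursor.execute(
        """
        SELECT column_name
        FROM information_schema.columns
        WHERE table_name = %s
        """,
        (table_name,),
    )
    actual = {row[0].lower() for row in cursor.fetchall()}

    missing = EXPECTED_COLUMNS - actual  # columns the producer dropped or renamed
    added = actual - EXPECTED_COLUMNS    # new columns we may want to start ingesting

    if missing:
        raise RuntimeError(f"{table_name} is missing expected columns: {missing}")
    return added
```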

One solution to this problem, championed by several data experts, is the concept of a data contract.

A data contract is a written agreement between the owner of a source system and the team ingesting data from that system for use in a data pipeline. The contract should state what data is being extracted, via what method (full, incremental), how often, as well as who (person, team) are the contacts for both the source system and the ingestion.

— Data Pipelines Pocket Reference, by James Densmore (O’Reilly, 2021)
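
Densmore describes the contract as a written agreement; one lightweight way to keep it honest is to store it in version control as structured data. A minimal sketch, with made-up owners and field names:

```python
from dataclasses import dataclass


@dataclass
class DataContract:
    """Hypothetical contract covering the elements Densmore lists: what is
    extracted, how, how often, and who owns each side."""

    source_table: str
    extraction_method: str  # "full" or "incremental"
    schedule: str           # e.g. a cron expression
    columns: dict           # column name -> expected type
    source_owner: str       # contact on the producing team
    ingestion_owner: str    # contact on the data team


orders_contract = DataContract(
    source_table="app_db.orders",
    extraction_method="incremental",
    schedule="0 2 * * *",
    columns={
        "order_id": "bigint",
        "customer_id": "bigint",
        "amount": "numeric",
        "created_at": "timestamp",
    },
    source_owner="orders-service-team@example.com",
    ingestion_owner="data-eng@example.com",
)
```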

On one hand this is great: it provides better collaboration and more transparency.

On the other hand, I can understand why both the software and data engineering teams might feel like this slows down their work.

However, I believe this is a happy middle ground: instead of involving an extra person to try to manage all of the changes, data engineers and software engineers can manage the changes themselves through a clear process.

Migrating ERPs And CRMs

Companies are constantly switching out their underlying CRMs, ERPs, and data infrastructure (such as their data pipelines and data warehouses). This is driven by solutions being sunset and by companies trying to improve performance or reduce costs.

All of which have the same result: the need to migrate data pipelines to a new source. Of course, it’s not that simple. The new source will also have a different underlying data model and will likely require new business logic to translate it.
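
In practice, that translation logic is often an explicit mapping layer between the new source’s data model and the fields the reporting layer already depends on. A rough sketch follows; the field names and status codes are invented for illustration.

```python
# Hypothetical mapping from the new CRM's field names to the canonical
# names the existing reports already use.
NEW_TO_CANONICAL_FIELDS = {
    "AccountId": "customer_id",
    "AccountStatus__c": "status",
    "CreatedDate": "signup_date",
}

# Business logic also has to be re-translated: the new system's status
# codes rarely line up one-to-one with the old definitions.
NEW_TO_CANONICAL_STATUS = {
    "ACTIVE": "active",
    "ON_HOLD": "active",      # business decision: on hold still counts as active
    "CANCELLED": "churned",
}


def translate_record(new_crm_record: dict) -> dict:
    """Convert one record from the new CRM into the canonical reporting model."""
    canonical = {
        NEW_TO_CANONICAL_FIELDS[field]: value
        for field, value in new_crm_record.items()
        if field in NEW_TO_CANONICAL_FIELDS
    }
    canonical["status"] = NEW_TO_CANONICAL_STATUS.get(canonical.get("status"), "unknown")
    return canonical
```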

Constantly switching out the underlying infrastructure doesn’t just slow down current initiatives for a data team, it generally takes them back a few steps.

Some tools can reduce the amount of code and work required, such as open-source or no-code data connectors that remove the need to recreate the extraction process every time.

There are other steps companies can take to reduce the impact of migrating data sources, such as including the data team in the migration project’s conversations to ensure that the new ERP or CRM still provides the same level of data coverage.

In the end, the challenges of migrating data sources can be hard to fully mitigate (even if your data model is well designed). Guaranteeing that the new data source has all the right fields and supports all the same business logic is never going to be 100% possible.

Mergers And Acquisitions

One possible project I was going to take on involved a company that had recently merged 6-7 other companies into one, all running different ERP and finance solutions. Now all of the ETL, data pipelines, and data warehouses that these various companies had developed needed to be centralized.

In the long term, the company will centralize all of its business processes into a single set of solutions. However, until then, the company still needs to report on all that is occurring in its multiple lines of business. Thus, there is a moment of chaos. A moment when a data team will have to force all of the various sources into some congruent reporting layer.

Overall, when companies merge it is best to make sure there is a lot of communication between all the various data teams. The sooner the better. Discussions on what to do with all the various data sources and how to combine them need to happen quickly to reduce friction and distrust.

Lack Of Metrics Definition Process

 


The problem with defining metrics like churn is that they can mean something different depending on the business you’re in, or even on how the question itself is framed. Perhaps one customer makes a large purchase once every 6 months while another makes a purchase every month. How will you define churn?
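
A small worked example of why the definition matters: with the same purchase history, the customer below counts as churned under a 30-day activity window and as active under a 180-day window. The dates and window choices are purely illustrative.

```python
from datetime import date, timedelta


def is_active(last_purchase: date, as_of: date, window_days: int) -> bool:
    """A customer is 'active' if they purchased within the lookback window."""
    return (as_of - last_purchase) <= timedelta(days=window_days)


as_of = date(2022, 6, 25)
last_purchase = date(2022, 2, 10)  # a customer who buys roughly twice a year

print(is_active(last_purchase, as_of, window_days=30))   # False -> counted as churned
print(is_active(last_purchase, as_of, window_days=180))  # True  -> counted as active
```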

If you leave the metric definition process ambiguous, you will receive ambiguous results.

When defining metrics, it is important to define the goal of the metric, get clear buy-in from the stakeholders who will care about it, and document in plain English what the logic is and what it represents.

All of this can seem like unnecessary work, but having even a baseline process ensures that how metrics are created and managed is standardized.
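
What “documented” can look like in practice is a metric definition kept in version control next to the logic that implements it. A minimal sketch; the names, owner, and SQL below are made up.

```python
# A hypothetical, version-controlled metric definition: the goal, the
# stakeholder who signed off, and the plain-English definition live right
# next to the SQL that implements it.
ACTIVE_CUSTOMERS = {
    "name": "active_customers",
    "goal": "Track how many customers still generate revenue, for monthly reporting",
    "owner": "VP of Customer Success",
    "definition": "A customer with at least one completed purchase in the last 180 days",
    "sql": """
        SELECT COUNT(DISTINCT customer_id)
        FROM purchases
        WHERE purchase_date >= CURRENT_DATE - INTERVAL '180 days'
          AND status = 'completed'
    """,
}
```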

Will We Ever Be Able To Answer Basic Questions?

Our struggle to answer basic questions in business is driven by many different factors, many of which have less to do with the size of the data and more to do with all the components that change over time: employees quitting, ERPs being swapped out, or yet another migration to a new data warehouse solution.

All of these constantly moving pieces force most data teams into a 2-steps-forward, 1.75-steps-back approach to their work. There might be progress, but it’s very slow and can easily be reversed to a state where even answering basic questions becomes challenging again.

It might seem like this post was filled with lamentations. But honestly, I have found over the last few years that, as tooling and collaboration improve, teams are able to get back to a state where they can answer basic questions faster and faster.

How have you ensured that your data team keeps moving forward?

If you enjoyed this article, then check out these videos and articles below.

What Is Trino And Why Is It Great At Processing Big Data

Data Engineer Vs Analytics Engineer Vs Analyst

Why Migrate To The Modern Data Stack And Where To Start

5 Great Data Engineering Tools For 2021 — My Favorite Data Engineering Tools

4 SQL Tips For Data Scientists

What Are The Benefits Of Cloud Data Warehousing And Why You Should Migrate