The Most Common Types of Data Pipelines You’ll Actually Build
Whether you’re working at a large enterprise or a small business, there’s almost always a need to extract data from source systems, process it, and use it for operational or analytical purposes.
That process, moving data from point A to point B, transforming it along the way, and making it usable, is what we typically call a data pipeline.
Add a few lines of code, a low-code tool, or an automation platform, and suddenly the term data pipeline starts getting thrown around everywhere.
This might make some data engineers uncomfortable, but if you really think about it, someone exporting data into Excel, applying a few VLOOKUPs, cleaning fields with formulas, and stitching tables together with IF / ELSE logic is, functionally, building a data pipeline.
No, it’s not the same as a production-grade pipeline built with SQL, Python, Airflow, or dbt, but it does solve a similar problem. That problem is turning raw data into something useful.
Useful being a key word!
There are many different types of data pipelines, built for different reasons, operating at different levels of scale and reliability. Some support day-to-day operations, others power analytics and reporting, and some exist purely to move data between systems.
So, in this article I will break down why data pipelines exist, the most common types of data pipelines you’ll encounter, and how to think about them beyond just “moving data from A to B.”
Source Standardization Pipelines

The first data pipelines I helped build and manage were focused on taking data sets from dozens of external partners, in particular insurance providers, and standardizing them to a single data model. This involved getting data via SFTP in different formats, including comma-delimited, tab-delimited, XML, and even positional (fixed-width) files, where a separate specification file defined which character positions mapped to which fields.
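To make that concrete, here's a minimal sketch of reading delimited and positional files into the same record shape. The column names and the layout spec are hypothetical, not from any real partner feed:

```python
import csv
import io

# Hypothetical layout spec for a positional (fixed-width) file:
# field name -> (start, end) character positions on each line.
POSITIONAL_SPEC = {"member_id": (0, 10), "dob": (10, 18), "plan_code": (18, 24)}

def parse_delimited(text, delimiter):
    """Parse comma- or tab-delimited text with a header row into dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))

def parse_positional(text, spec):
    """Slice each line into fields using the (start, end) positions."""
    return [
        {field: line[start:end].strip() for field, (start, end) in spec.items()}
        for line in text.splitlines()
        if line.strip()
    ]

# Two feeds, two formats, one record shape coming out.
csv_rows = parse_delimited("member_id,dob\nA123,19900101\n", delimiter=",")
pos_rows = parse_positional("A123      19900101PLANAA\n", POSITIONAL_SPEC)
```

The point is that format parsing is the easy, mechanical half; everything downstream can then work against one record shape regardless of what the partner sent.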

But this is a problem I've since run into many times across many different industries, from healthcare to retail to real estate, to name a few.
Not Just Analytics Pipelines
Importantly, source standardization pipelines are often built for operational use cases, not just analytics.
Centralizing and standardizing incoming data can unlock entirely new capabilities, such as:
- Powering an internal operational system
- Enabling downstream automation
- Supporting customer-facing products
- Creating marketplaces that aggregate inventory or services
For example, if you’re building a marketplace, you may need to centralize inventory data from dozens of suppliers, each sending data in their own format. A standardized data pipeline becomes the foundation that makes the entire product possible.
The Real Challenge – Data Mapping
The hardest part of these pipelines is rarely the ingestion itself; it's the data mapping and normalization required to handle all the variations in incoming data.
Common challenges include:
- Standardizing values, such as gender, which might arrive as a number, a single letter, or a full word
- Normalizing categories, especially in retail, where similar products may use abbreviations or slightly different naming conventions
- Fixing date and time issues, including inconsistent formats, missing values, and multiple time zones
- Handling schema drift, where partners change columns or file layouts without notice
You can reduce some of this complexity by asking partners to follow a standard format, but in practice, edge cases are unavoidable. Robust source standardization pipelines are built to absorb that inconsistency without breaking.
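A couple of those challenges can be sketched in a few lines. The mapping table and column names below are made up for illustration; the pattern is what matters: map the known variants, and flag the unknowns instead of crashing on them:

```python
# Hypothetical mapping table: many partner-specific spellings -> one canonical value.
GENDER_MAP = {
    "1": "male", "m": "male", "male": "male",
    "2": "female", "f": "female", "female": "female",
}

def standardize_gender(raw):
    """Normalize a raw gender value; fall back to 'unknown' rather than breaking."""
    if raw is None:
        return "unknown"
    return GENDER_MAP.get(str(raw).strip().lower(), "unknown")

def check_schema(row, expected_columns):
    """Detect schema drift: report missing and unexpected columns for a row."""
    missing = expected_columns - row.keys()
    extra = row.keys() - expected_columns
    return missing, extra
```

In a real pipeline you'd route the "unknown" values and schema-drift reports to a review queue or alert, so the pipeline absorbs the inconsistency without silently dropping data.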
Why Standardized Data Pipelines Matter
Once you’ve standardized incoming data, the value compounds quickly.
A single, clean data model allows you to:
- Build multiple products on top of the same dataset
- Launch new features faster
- Create reports or benchmarks across all customers
- Apply improvements globally instead of one partner at a time
Because the data is already normalized, every downstream system benefits automatically.
The one final point I'll add is that this isn't limited to SFTP data sets; I've worked with companies that pull in data from APIs as well.
Amalgamation Data Pipelines

Another type of data pipeline I often run across involves amalgamating multiple sources into a single table. This is often used to reflect some sort of funnel or 360-degree view of a customer. It differs from source standardization (at least a little bit) because these pipelines' goal is to take multiple data sources and piece together a flow.
A common example of this is a sales funnel, where you might be trying to attribute a customer or track information about how long they might have been in different stages of the pipeline.
In that business flow, you might have HubSpot, Google Ads, your own custom solution or application, Salesforce, NetSuite, Stripe, and a few other odds and ends. And you somehow need to track each step.
I recall a similar challenge with mapping out the flow of recruiting data at Facebook. That was a little easier because much of it was through Facebook’s own internal tooling, but we still brought in other data sets that represented various steps and had to deal with issues such as orphan events or individuals that jumped in and were missing certain steps.
The challenge here is having an ID that you can reliably join across your various data sources. Once that is established, you've got to ensure all the sources land at the right time and that you've set up reliable data quality checks that look for possible duplicate customers and entities, late-landing data sets, and so on.
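Assuming you've already solved the hard part (a shared customer ID across systems), the stitching itself can look something like this sketch. The source names and events are hypothetical stand-ins for HubSpot, Salesforce, and Stripe exports:

```python
from collections import defaultdict

# Hypothetical stage events from different systems, already keyed by a shared
# customer_id (establishing that shared ID is the hard part, done upstream).
hubspot = [{"customer_id": "c1", "stage": "lead", "ts": "2024-01-02"}]
salesforce = [{"customer_id": "c1", "stage": "opportunity", "ts": "2024-01-10"}]
stripe = [{"customer_id": "c1", "stage": "paid", "ts": "2024-01-20"}]

def build_funnel(*sources):
    """Amalgamate per-system events into one time-ordered funnel per customer."""
    funnel = defaultdict(list)
    for source in sources:
        for event in source:
            funnel[event["customer_id"]].append((event["ts"], event["stage"]))
    # Sort each customer's events by timestamp; gaps here are exactly the
    # orphan events and missing steps you'd want data quality checks for.
    return {cid: sorted(events) for cid, events in funnel.items()}

funnel = build_funnel(hubspot, salesforce, stripe)
```

A real version would also deduplicate customers and handle late-landing sources, but the core shape, many sources keyed on one ID producing one flow table, is the same.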
Excel “Data Pipelines”

I didn’t want to talk about this pipeline first, because I am sure some of you would be up in arms.
Excel is not a pipeline tool!
And sure, I wouldn’t classify it as a data pipeline tool.
But when we look at how some people functionally use Excel, it’s not that dissimilar compared to how some people might build their data pipelines.
Think about it.
- They extract the data from one or multiple source systems
- Put it into various spreadsheets (not too dissimilar from tables)
- From there, they often run VLOOKUPs, case statements, sometimes even VBA
- All to create a final output
- This final output might support a dashboard, report, or PowerPoint

All via Ctrl+C and Ctrl+V

Which is not that different from the pipeline you might be building right now.
Except the destination is a data warehouse, and the output is hopefully used by a much larger audience inside your company.
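In fact, the VLOOKUP-and-stitch workflow maps almost one-to-one onto code. Here's a hypothetical sketch, with made-up tables, of the same steps, where the "VLOOKUP" is just a dictionary lookup:

```python
# "Spreadsheet" one: orders exported from a source system.
orders = [{"order_id": 1, "customer_id": "c1", "amount": 120}]

# "Spreadsheet" two: the customer reference table, keyed for lookup.
customers = {"c1": {"name": "Acme", "segment": "enterprise"}}

# The VLOOKUP step: enrich each order with customer fields,
# defaulting to a sentinel when the lookup misses (Excel's #N/A).
report = []
for order in orders:
    cust = customers.get(order["customer_id"], {"name": "UNKNOWN", "segment": "UNKNOWN"})
    report.append({**order, **cust})
```

Same extract, lookup, and combine steps; the difference is that this version can be scheduled, tested, and rerun without anyone pressing Ctrl+C.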
So yes, Excel is not exactly a great automated pipeline solution. But many companies rely on it to be their transformation layer and in some cases their…shudders…database.
That is of course until they eventually ask someone to productionize and automate their process…
Enrichment Data Pipelines

Not all the data you need will exist in your core tables. In turn, you might have to enrich your data, and this will often need to be a separate pipeline.
It could be a machine learning pipeline that calculates a lead score or feature, or simply one that brings in another data set from an external source.
In some cases, this could be a simple join. But there are cases where you need to calculate some sort of lead score, or do a heavy amount of preprocessing so you can simplify the table down to something like a Customer ID plus the handful of features you'd like to add.
These pipelines are best added once your core data model and its use cases have been solidified, or at least, that's what I'd recommend.
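The "Customer ID plus features" shape mentioned above can be sketched like this. The events and scoring weights are assumptions for illustration, not a real scoring model:

```python
# Hypothetical raw events to collapse down to Customer ID + features.
events = [
    {"customer_id": "c1", "event": "page_view"},
    {"customer_id": "c1", "event": "demo_request"},
    {"customer_id": "c2", "event": "page_view"},
]

# Assumed scoring weights; a real lead score might come from an ML model.
WEIGHTS = {"page_view": 1, "demo_request": 10}

def enrich_with_lead_score(events):
    """Preprocess raw events into one row per customer with a lead_score feature."""
    scores = {}
    for e in events:
        scores[e["customer_id"]] = scores.get(e["customer_id"], 0) + WEIGHTS.get(e["event"], 0)
    # One narrow row per customer: the ID plus the features to join back in.
    return [{"customer_id": cid, "lead_score": s} for cid, s in sorted(scores.items())]

features = enrich_with_lead_score(events)
```

The output table is deliberately narrow, so joining it back onto the core model really is just a simple join on Customer ID.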
Operational Data Pipelines

Once you’ve built your data warehouse, it’s not uncommon to need to ingest data back into operational systems. A common example I have seen here is segmentation in Salesforce. Many companies I’ve worked with export their data out, enrich it, then ingest it back into Salesforce.
Some other examples include ingesting data into NetSuite, HubSpot, and a dozen other platforms.
The difference between this and other pipelines is that you have to interact with different systems for ingestion. Some tools let you upload a CSV, others require API interactions, and so on.
Meaning this isn't as simple as loading into a database; you often need to work around the other tool's limitations.
These pipelines often straddle the line between a software project and a data project. Not all the systems you're looking to re-integrate the data into allow for batch data loads. Instead, you have to update a single record at a time and run verifications afterward to ensure you don't blow up your ERP or CRM.
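That record-at-a-time, verify-after pattern can be sketched generically. The `send` and `verify` callables below are hypothetical wrappers around whatever CRM/ERP API you're targeting, not any real client library:

```python
import time

def push_record(record, send, verify, max_retries=3):
    """Update a single record in the target system, then verify it landed.

    `send` and `verify` are hypothetical callables wrapping the target
    system's API; many of these systems don't support batch loads, so we
    go one record at a time and confirm each write before moving on.
    """
    for attempt in range(max_retries):
        send(record)
        if verify(record["id"]):
            return True
        time.sleep(2 ** attempt)  # back off before retrying a failed write
    return False

# Toy in-memory "CRM" standing in for the real API, just to exercise the flow.
crm = {}
ok = push_record(
    {"id": "acct-1", "segment": "enterprise"},
    send=lambda r: crm.__setitem__(r["id"], r),
    verify=lambda rid: rid in crm,
)
```

Keeping the send/verify pair pluggable is one way to reuse the same writeback loop across Salesforce, NetSuite, or anything else with single-record update limits.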
Before concluding this newsletter, I’ll add that these are just a few examples of the types of data pipelines you’ll likely work on. Here are a few more.
- Machine learning pipelines
- Integration pipelines
- Migration pipelines
- Metadata & lineage pipelines
- What else did I miss?
Final Thoughts
When data teams say data pipelines, it really can mean so many different things. Maybe it’s someone’s semi-automated VBA script that requires a few different people to process each new data set. Maybe it’s a fully automated system that processes a hundred different versions of patients and claims data, all to model them into a single final dataset.
But they often have very similar goals: to take data from a source system and make it usable for a different task, be it a data product, operational workflow, or machine learning model.
Now, despite all of these differences, I will say that these pipelines still require some similar steps. So keep an eye out for the next article.
“What every data pipeline needs” or something like that.
Thanks for reading!
If you’d like to read more about data engineering and data science, check out the articles below!
Why Is Data Modeling So Challenging – How To Data Model For Analytics
What Is A Data Platform And Why You Should Build One
Throughput vs Latency: Understanding the Key Difference in Data Engineering
How to build a data pipeline using Delta Lake
Intro To Databricks – What Is Databricks
Data Engineering Vs Machine Learning Pipelines
