12 Videos To Watch Before Setting Up Your Data Stack Or Paying For Snowflake

When you demo a new data tool, everything works perfectly. Account executives are great at making their product sound like the perfect fit.
They make sure the demo pipelines run smoothly.
The demo dashboards load in seconds.
And of course, the graphs and charts they put together look impressive.
So, you buy it, integrate it into your data infrastructure, and hope for the best.
But then you realize your data stack isn’t like the demo. It has legacy systems, messy data, no clear ownership, and pipelines so complicated that no one wants to touch them.
By that point, it’s too late to walk away. I’ve seen a lot of data and business leaders simply accept it. After all, you’ve just signed a 6-, 7-, or maybe even 8-figure contract.
Now you have to own it.
This happens time and time again. Most teams buy tools before they understand the system they’re trying to build.
Before you buy your next warehouse, ETL tool, or any other data tool, do a little homework first. Here are 12 videos I’ve filmed over the years that will teach you how to set up a data stack and make better decisions overall.
Understand Where Your Data Is Going – What Is A Database?
This was one of the earlier videos I put out.
In this video, I go over three of the main data storage systems that data teams are likely to interact with: databases, warehouses, and lakes. I filmed it after a client had issues with MongoDB (a NoSQL database), which wasn’t well-suited to the query patterns they needed. They were running analytical queries through Mongo’s query language, so I suggested they look into building a data warehouse... but they weren’t sure what that was. So I thought I’d break everything down.
This video will help if you’re trying to decide which of these systems will benefit your business.
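One reason warehouses handle analytical queries better than most operational databases is columnar storage. Snowflake, BigQuery, and Redshift all use it, while operational databases (including document stores like MongoDB) typically keep each record together. The toy sketch below is plain Python, not a real storage engine, and the orders are made up. It counts how many values each layout has to touch to answer one analytical question: the total order amount.

```python
# Toy model, not a real storage engine: the same made-up orders stored two ways.
rows = [
    {"order_id": 1, "customer": "acme", "status": "shipped", "amount": 120.0},
    {"order_id": 2, "customer": "globex", "status": "pending", "amount": 35.5},
    {"order_id": 3, "customer": "acme", "status": "shipped", "amount": 80.0},
]
# Column-oriented layout: one list per field.
columns = {field: [row[field] for row in rows] for field in rows[0]}


def total_amount_row_store(rows):
    """Row/record layout: each record comes back with all of its fields."""
    values_touched = 0
    total = 0.0
    for row in rows:
        values_touched += len(row)  # the whole record is read to get one field
        total += row["amount"]
    return total, values_touched


def total_amount_column_store(columns):
    """Column layout (typical of warehouses): only the 'amount' column is read."""
    amounts = columns["amount"]
    return sum(amounts), len(amounts)


print(total_amount_row_store(rows))        # (235.5, 12)
print(total_amount_column_store(columns))  # (235.5, 3)
```

With three rows the gap is trivial. With billions of rows and dozens of columns, reading only the columns a query needs is where a warehouse pays off.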
Why Data Pipelines Exist – Beyond Moving Data From Point A to B
If you’re a data engineer, you probably spend a lot of your time building data pipelines and making sure they run smoothly. But have you ever thought about why they exist in the first place?
Or what they’re really trying to accomplish for the business?
This video explains why pipelines are much more than just moving data from one location to another. They’re about reliability, repeatability, and making all your data actually usable. Check it out.
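Repeatability is easiest to see in code. A common way to make a batch load safe to rerun is to overwrite a whole partition inside one transaction, rather than blindly appending. Here’s a minimal sketch that uses SQLite as a stand-in for a warehouse. The `daily_sales` table and the `load_partition` helper are hypothetical.

```python
import sqlite3


def load_partition(conn, run_date, rows):
    """Hypothetical helper: replace one day's data so reruns never duplicate it."""
    with conn:  # one transaction: commits on success, rolls back on error
        conn.execute("DELETE FROM daily_sales WHERE sale_date = ?", (run_date,))
        conn.executemany(
            "INSERT INTO daily_sales (sale_date, store, amount) VALUES (?, ?, ?)",
            [(run_date, store, amount) for store, amount in rows],
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_sales (sale_date TEXT, store TEXT, amount REAL)")

rows = [("seattle", 100.0), ("portland", 75.0)]
load_partition(conn, "2024-01-01", rows)
load_partition(conn, "2024-01-01", rows)  # a retry or backfill of the same day

count = conn.execute("SELECT COUNT(*) FROM daily_sales").fetchone()[0]
print(count)  # 2, not 4
```

An append-only version would leave four rows after the rerun, and every downstream report would double-count that day.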
Snowflake vs. Databricks – A Race To Build THE Cloud Data Platform
Many vendors aren’t selling just one data tool anymore. Instead, they want to be your one-stop shop for all things data. Snowflake and Databricks are probably the clearest examples of this. What started as a warehouse vs. lakehouse conversation has slowly turned into a broader platform war. Both companies want to own more of your stack: storage, compute, governance, AI, orchestration, sharing, apps, and more.
In this video, I explain the platform war in more detail, which is useful context if you’re learning how to set up a data stack and deciding which tools to include.
Look at the Whole Stack – How To Set Up Your Data Stack
Don’t judge data tools in isolation. Instead, consider your entire data stack and how everything fits together. This 2022 video explains the components of a good stack, including ingestion, storage, transformation, governance, and quality.
You’ll also learn the difference between a modern data stack and a legacy data stack, and how the former makes it easier to scale without dealing with annoying downtime.
What Is DBT and Why Is It So Popular – Intro To Data Infrastructure Part 3
DBT (data build tool) lets analysts, engineers, and analytics engineers handle the Transform step of an ELT pipeline. In my opinion, it’s more than just a tool. It marks a real shift in how teams manage SQL, testing, documentation, and more.
So, how do you use DBT? And why did it become so popular? This video tells you everything you need to know.
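To make the ELT idea concrete: the raw data is loaded as-is, and the transform is just a SQL `SELECT` that runs inside the warehouse. This sketch uses SQLite as a stand-in warehouse, with made-up table names (`raw_orders`, `stg_orders`). It isn’t DBT itself. In a DBT project, the `SELECT` would live in its own model file, and DBT would handle materializing it, ordering it relative to other models, and testing and documenting it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Extract + Load: raw data lands in the warehouse as-is
# (inconsistent casing, stray whitespace, amounts stored as text).
conn.execute("CREATE TABLE raw_orders (id INTEGER, status TEXT, amount TEXT)")
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [(1, "SHIPPED", "120.00"), (2, "shipped ", "35.50"), (3, "cancelled", "80.00")],
)

# Transform: a SELECT that runs inside the warehouse. In a DBT project this
# SELECT would be a model file, and DBT would decide how to materialize it.
conn.execute(
    """
    CREATE VIEW stg_orders AS
    SELECT id,
           LOWER(TRIM(status)) AS status,
           CAST(amount AS REAL) AS amount
    FROM raw_orders
    """
)

shipped_revenue = conn.execute(
    "SELECT SUM(amount) FROM stg_orders WHERE status = 'shipped'"
).fetchone()[0]
print(shipped_revenue)  # 155.5
```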
What Is Snowflake – Breaking Down What Snowflake Is, How Snowflake Credits Work, and More
Snowflake is one of the most popular data warehouses on the market and has been for years. However, that doesn’t mean you should buy it just because everyone else does. Before parting with your cash, you’ll want to understand its key components, such as cost, storage, compute, and workloads.
I made a video last year that helps you decide whether Snowflake is right for you. I cover all the basics, then explain how Snowflake credits work, its features, and lots of other stuff.
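Credits are easier to reason about with a quick back-of-the-envelope calculation. Standard Snowflake virtual warehouses consume credits per hour according to their size, and each step up in size doubles the rate. Running time is billed per second, with a 60-second minimum each time a warehouse starts or resumes. The sketch below assumes those rules. The price per credit is an assumption you’d replace with your own contract rate. It also ignores storage, cloud services, and serverless features, which are billed differently.

```python
# Credits per hour for standard Snowflake virtual warehouses (larger sizes exist).
CREDITS_PER_HOUR = {"XS": 1, "S": 2, "M": 4, "L": 8, "XL": 16, "2XL": 32, "3XL": 64, "4XL": 128}


def estimate_credits(size: str, seconds_running: float) -> float:
    """Credits for one run: per-second billing with a 60-second minimum per resume."""
    billable_seconds = max(seconds_running, 60)
    return CREDITS_PER_HOUR[size] * billable_seconds / 3600


def estimate_cost(size: str, seconds_running: float, price_per_credit: float) -> float:
    # price_per_credit is an assumption: it depends on edition, region, and contract.
    return estimate_credits(size, seconds_running) * price_per_credit


print(estimate_credits("M", 1800))     # 2.0  (Medium for 30 minutes)
print(estimate_credits("L", 900))      # 2.0  (Large for 15 minutes)
print(estimate_cost("M", 1800, 3.00))  # 6.0  at an assumed $3 per credit
```

Because each size up doubles the hourly rate, a bigger warehouse only breaks even if the workload finishes proportionally faster. If the query doesn’t speed up, you simply pay double.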
Common Data Pipeline Patterns You’ll See in the Real World – Types of Data Pipelines You’ll Build
Teams build data pipelines for all kinds of reasons, so I decided to break down the types you might come across, such as batch and streaming. I also cover common data flow patterns, like reverse ETL, and what each one is good for.
My point in this video is that data pipelines make much more sense when you understand what’s going on behind them.
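The difference between batch and streaming comes down mostly to whether the data is bounded. Here’s a simplified Python sketch with made-up click events. The batch version sees the whole dataset at once, while the streaming version updates running state as each event arrives. Real streaming systems also have to handle late data, windowing, and failures, all of which this ignores.

```python
from typing import Iterable, Iterator

# Made-up click events; in practice these would come from files or a queue.
events = [
    {"user": "a", "clicks": 3},
    {"user": "b", "clicks": 1},
    {"user": "a", "clicks": 2},
]


def batch_totals(batch: list[dict]) -> dict[str, int]:
    """Batch: the full, bounded dataset is available up front; compute once."""
    totals: dict[str, int] = {}
    for event in batch:
        totals[event["user"]] = totals.get(event["user"], 0) + event["clicks"]
    return totals


def streaming_totals(stream: Iterable[dict]) -> Iterator[dict[str, int]]:
    """Streaming: events arrive one at a time; update state and emit a result per event."""
    state: dict[str, int] = {}
    for event in stream:
        state[event["user"]] = state.get(event["user"], 0) + event["clicks"]
        yield dict(state)  # copy, so earlier snapshots aren't mutated later


print(batch_totals(events))  # {'a': 5, 'b': 1}
for snapshot in streaming_totals(iter(events)):
    print(snapshot)
```

Both approaches end at the same totals. The streaming version just gets there incrementally, so downstream consumers can see results before all the data has arrived.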
What to Consider When Building Data Pipelines – Intro to Data Infrastructure Part 2
My advice? Always ask questions about SLAs, ownership, costs, and potential failures before you choose a tool and start building pipelines. Otherwise, you may end up with pipelines that don’t give the business the data it actually needs. This video goes into depth about what to consider when designing data workflows.
The Realities of Airflow – The Mistakes New Data Engineers Make Using Apache Airflow
Airflow remains a popular open-source orchestration tool that helps you schedule and manage data workflows. But is it the right fit for your business? Sometimes tools like this don’t remove complexity from your data stack. They add it.
I’ve seen various ways companies have deployed Airflow in the past, and many face similar challenges. With that in mind, I compiled a list of the mistakes engineers make when using it and how to avoid them.
Building Your Own Data Pipeline Tool From Scratch – Should You?
Companies can accidentally build internal data pipeline tools. Sometimes that works, but often it just becomes maintenance debt. For example, a business might end up with a tool that resembles Airflow. Essentially, they’ve created a product that already exists.
In this video, I explain why this happens and whether you should spend time building or buy instead.
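Part of why this happens is that the first version of a homegrown orchestrator is tiny. Here’s a hypothetical sketch using Python’s standard-library `graphlib` (Python 3.9+). It runs made-up tasks in dependency order, and that’s about all it does.

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: each task maps to the set of tasks it depends on.
tasks = {
    "extract_orders": set(),
    "extract_customers": set(),
    "transform_sales": {"extract_orders", "extract_customers"},
    "publish_dashboard": {"transform_sales"},
}


def run_pipeline(graph, run_task):
    """A homegrown 'orchestrator': run each task once, in dependency order."""
    order = list(TopologicalSorter(graph).static_order())
    for task in order:
        run_task(task)
    return order


# Missing on purpose: schedules, retries, backfills, alerting, parallelism,
# run history, and a UI. Each of those becomes code someone has to maintain.
executed = run_pipeline(tasks, run_task=lambda name: print(f"running {name}"))
```

Everything missing from that sketch, such as schedules, retries, backfills, alerting, and run history, is something tools like Airflow already provide. It’s also what a homegrown version has to grow into over time.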
How and Why Data Engineers Need To Care About Data Quality Now – and How to Implement It
You might think building a data pipeline as quickly as possible is a good thing. But what’s the point if it just produces low-quality data?
As companies integrate AI and machine learning into their data strategies, now’s a great time to focus on data quality. And that’s not just about accuracy. There are several factors to consider when developing a high-quality data system, which I explain in this short video.
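As a rough illustration of going beyond accuracy, here are three common data quality checks written as plain Python over a list of records: completeness, uniqueness, and freshness. These are widely used quality dimensions, not necessarily the exact set covered in the video. The function, field names, and records are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def run_quality_checks(rows, key, required, loaded_at, max_age, now=None):
    """Return failed-check messages; an empty list means every check passed."""
    if now is None:
        now = datetime.now(timezone.utc)
    failures = []

    # Completeness: required fields must be present and not None.
    for field in required:
        missing = sum(1 for row in rows if row.get(field) is None)
        if missing:
            failures.append(f"completeness: {missing} row(s) missing {field}")

    # Uniqueness: the key field must not repeat across rows.
    keys = [row.get(key) for row in rows]
    if len(keys) != len(set(keys)):
        failures.append(f"uniqueness: duplicate values in {key}")

    # Freshness: accurate data is still a problem if it arrives too late.
    if now - loaded_at > max_age:
        failures.append(f"freshness: data is older than {max_age}")

    return failures


# Made-up records: one duplicate key, one missing amount, loaded 30 hours ago.
now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
rows = [{"order_id": 1, "amount": 120.0}, {"order_id": 1, "amount": None}]
failures = run_quality_checks(
    rows,
    key="order_id",
    required=["amount"],
    loaded_at=now - timedelta(hours=30),
    max_age=timedelta(hours=24),
    now=now,
)
for message in failures:
    print(message)
```

Every value in those records could be "accurate" and the data would still fail two of the three checks, which is the point: quality is about fitness for use, not just correctness.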
Don’t Skip Modeling
Data Modeling Challenges – The Issues Data Engineers & Architects Face When Implementing Data Models
Many tools promise self-service analytics, but if the underlying data models are bad, your users will likely have a bad experience.
Take a look at this video. I start by explaining data modeling and why it’s such an important skill for data engineers. Then I list the challenges of dealing with the logic and business processes associated with it, so you know exactly what to expect.
Final Thoughts
I’ve learned a lot over the years while modernizing my data stack. My suggestion is this: before buying your next data tool, don’t ask whether it has a particular feature. Instead, think about the entire system you want to build. That means understanding where your data is going, why data pipelines exist, and how to generate high-quality data. Only then can you choose the right products.
Want to learn how to set up a data stack and make it work for your business? Subscribe to my YouTube channel. I’m Seattle Data Guy, and I post videos about data science, data engineering, and consulting.
If you’d like to read more about data engineering and data science, check out the articles below!
What Are Data Pipelines And Why Do They Exist
Cut Your Snowflake Bill by 70%
What Is A Data Platform And Why You Should Build One
Throughput vs Latency: Understanding the Key Difference in Data Engineering
How to build a data pipeline using Delta Lake
