How To Set Up Your Data Strategy In 2022

April 26, 2022

The modern data stack has been discussed heavily in 2020 and 2021. But what wasn’t discussed enough was how to set up a successful data strategy.

Truthfully, to drive value from your data, you’re going to need to focus less on modernizing and more on your actual business problems.

Driving business value with your data strategy will take much more than just paying for tools. Instead, your business team will need to assess what its goals are, which data sources it manages, and where it is in its overall data maturity journey.

First – Start By Assessing Where Your Current Data Strategy Is

Before getting into what you can improve in your data stack, it’s good to understand where your company and teams fall in terms of your data maturity.

What is your company currently doing to analyze its data?

There are many ways to develop data analytics infrastructure in 2022. Your company might be driven by its engineering team, in which case your software engineers might be building your data pipelines.

On the other hand, if your company doesn’t have a large engineering team, then your analysts might be putting together your analytics.

But these are merely the processes for getting to analytics. The next question is: what is your company doing with those analytics?

Second – Define What You Actually Want To Do With Data

Building a modern data stack for the sake of building one is a massive waste of time and money if you don’t have a plan.

When your team looks into implementing their data strategy, they actually need to have a…data strategy.

What Current Business Problems Are You Facing – Data strategies need to be aligned with business strategies. Otherwise, what often happens is that you discover some interesting findings.

You present those findings to leadership, and although intrigued, they have no resources available to follow up. Thus, your findings, your insights, and your hard work die right there.

It will be forgotten and wasted. 

So, make sure you are aligned with the business and have assigned business partners who will act on insights.

What Are Your Business Goals – Once you are aligned with the business, you should write down clear goals for your data strategy. These could be:

  • Reduce costs in key areas such as reducing cloud costs
  • Increase customer conversion in specific key segments
  • Improve LTV of customers
  • Research which features to develop or what markets to go into
  • Etc.

Having a few target goals, such as the ones listed above, can help guide your data analytics team toward clear decisions about which areas to research or build reports around.

Third – Setting Up The Right Architecture

The right data architecture isn’t a one-size-fits-all solution. Every company has its own unique problems. Some companies need data infrastructure that includes on-prem databases running on IBM iSeries systems, while others run entirely on SaaS.

This is often where it is worth calling in a solutions architect or consultant. Once all your ducks are in a row and you have a general strategy, you can bring that information to them, and they can help break the problem down into the tools you will need.

Data Ingestion

Data ingestion refers to taking data from its sources and loading it into your data warehouse. You might also hear the terms data pipeline, ETL, ELT, or real-time data pipeline; all of these play a role in data ingestion. There are multiple options when it comes to data ingestion.

These range from 100% custom code to code frameworks to low-code solutions.
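At the custom-code end of that range, ingestion boils down to an extract step and a load step. The sketch below is a minimal illustration under stated assumptions: the source is a hard-coded list standing in for an application API, an in-memory SQLite database stands in for the warehouse, and the `raw_customers` table and the `extract`/`load` helpers are hypothetical names.

```python
import sqlite3

# Hypothetical source records, standing in for rows pulled from an application API.
SOURCE_ROWS = [
    {"id": 1, "email": "a@example.com", "plan": "pro"},
    {"id": 2, "email": "b@example.com", "plan": "free"},
]


def extract():
    """Pull raw records from the source (here, just an in-memory list)."""
    return list(SOURCE_ROWS)


def load(conn, rows):
    """Load records into a raw warehouse table, replacing any row with the same id."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_customers "
        "(id INTEGER PRIMARY KEY, email TEXT, plan TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO raw_customers (id, email, plan) "
        "VALUES (:id, :email, :plan)",
        rows,
    )
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    # In-memory SQLite is an assumption standing in for Snowflake, BigQuery, etc.
    warehouse = sqlite3.connect(":memory:")
    print(load(warehouse, extract()), "rows loaded")
```

Loading raw records first and transforming them later inside the warehouse is the ELT pattern. Keying `INSERT OR REPLACE` on `id` keeps re-runs from duplicating rows, which is one of the many details a framework or low-code tool would otherwise handle for you.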

We believe that 100% custom code should always be a last resort. Unless your team has the goal of completely recreating Airflow or Dagster, there is no reason to start from scratch. Frameworks like Airflow manage some of the many components you will need to develop for your data ingestion.

Airflow acts as an orchestrator, and you can use its many operators, such as its BigQuery operators, to extract data from sources and insert it into different data storage systems.
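One of the components an orchestrator manages is running tasks in dependency order. The hypothetical pipeline below (the task names are made up for illustration) uses Python’s standard-library `graphlib` (Python 3.9+) to show just that one piece. It is not how Airflow is implemented, and a real orchestrator also handles scheduling, retries, backfills, and alerting.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
PIPELINE = {
    "extract_app_db": set(),
    "extract_facebook_ads": set(),
    "load_warehouse": {"extract_app_db", "extract_facebook_ads"},
    "build_reports": {"load_warehouse"},
}


def run_pipeline(dag, run_task):
    """Run every task only after its upstream tasks; return the execution order."""
    order = []
    for task in TopologicalSorter(dag).static_order():
        run_task(task)
        order.append(task)
    return order


if __name__ == "__main__":
    run_pipeline(PIPELINE, lambda task: print("running", task))
```

Even this toy version has to reject cycles (`graphlib` raises `CycleError` for them) and agree on an ordering; scheduling, retries, and monitoring on top of that are why recreating an orchestrator from scratch rarely pays off.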

But these frameworks do require some understanding of programming and DevOps. So if you have a very small team that doesn’t have time to program, you might need to pick another solution. That is where tools like Fivetran, Matillion, and Stitch come into play.

These tools are low-code options that your team can use to connect to data sources and pipe the data into your data warehouse or data lake, all without writing code.

In my experience, this does not mean you shouldn’t have an engineer build these data pipelines. These pipelines are still best built by data engineers, who can apply the best practices they have developed while learning to code.

The trade-off here is the speed at which the pipelines can be developed. Which tool is right for your team depends on your business goals, expectations, and team skill sets.

Data Storage

Data storage refers to concepts like data warehouses, data lakes, and data lakehouses.

All of these act as an analytical data storage layer. Companies use a wide variety of data storage systems to meet the needs of analysts, data scientists, and general users. Companies might pick Snowflake, BigQuery, Postgres, Redshift, SQL Server, or a whole host of other databases (and that’s just for the data warehouse).

The purpose of this layer is to create a source of truth where analysts, data scientists, and end-users can access data from multiple data sources. For example, your data warehouse might hold data from your custom application, Workday, Facebook ads, Asana, Braze, and so on. Having a centralized reporting layer means users no longer have to go to four or five different sources to create a basic report.

Now, they can write one query and merge all of these data sources into one data set.
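As a minimal sketch of that single query, assume three hypothetical tables have already been landed in the warehouse from an application database, an ad platform, and a CRM. SQLite stands in for the warehouse here; the same join-and-aggregate shape should carry over to most warehouses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables, each assumed to be landed from a different source system.
conn.executescript("""
CREATE TABLE app_customers (customer_id INTEGER, email TEXT);
CREATE TABLE ad_clicks (customer_id INTEGER, campaign TEXT);
CREATE TABLE crm_orders (customer_id INTEGER, amount REAL);

INSERT INTO app_customers VALUES (1, 'a@example.com'), (2, 'b@example.com');
INSERT INTO ad_clicks VALUES (1, 'spring_sale'), (2, 'brand');
INSERT INTO crm_orders VALUES (1, 40.0), (1, 60.0), (2, 25.0);
""")

# One query combines all three sources into a single data set per customer.
QUERY = """
SELECT c.customer_id, c.email, a.campaign, SUM(o.amount) AS revenue
FROM app_customers AS c
JOIN ad_clicks AS a ON a.customer_id = c.customer_id
LEFT JOIN crm_orders AS o ON o.customer_id = c.customer_id
GROUP BY c.customer_id, c.email, a.campaign
ORDER BY c.customer_id
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

This works because each customer has exactly one ad-click row in the sample data. If customers could have several clicks, joining clicks and orders directly would fan out and double-count revenue; aggregating each source before joining avoids that.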

In addition, a central location is often where you establish data standards, governance, and quality checks, and likely some light data cleansing that makes the data easier for analysts to work with.
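Quality checks can start very simply. The sketch below shows two checks similar in spirit to dbt’s built-in `not_null` and `unique` tests, written as hypothetical helper functions against SQLite. Each returns the number of offending rows or values, so 0 means the check passes.

```python
import sqlite3

# Table and column names are interpolated directly into SQL, so these helpers
# assume trusted, hard-coded names (never user input).


def check_not_null(conn, table, column):
    """Return how many rows have a NULL in `column` (0 means the check passes)."""
    sql = f"SELECT COUNT(*) FROM {table} WHERE {column} IS NULL"
    return conn.execute(sql).fetchone()[0]


def check_unique(conn, table, column):
    """Return how many non-NULL values of `column` appear more than once."""
    sql = (
        f"SELECT COUNT(*) FROM (SELECT {column} FROM {table} "
        f"WHERE {column} IS NOT NULL GROUP BY {column} HAVING COUNT(*) > 1)"
    )
    return conn.execute(sql).fetchone()[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Hypothetical sample data with one NULL email and one duplicated id.
    conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER, email TEXT);
    INSERT INTO customers VALUES (1, 'a@example.com'), (2, NULL), (2, 'c@example.com');
    """)
    print("null emails:", check_not_null(conn, "customers", "email"))
    print("duplicated ids:", check_unique(conn, "customers", "customer_id"))
```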

This is the overall goal: a data layer that anyone can access (based on their security roles) and rely on, without heavy manual processes such as exporting data from each source into CSVs and then combining it all together.

In the end, once you have the data, you need to put it to work through reporting and data visualizations.

Data Visualization

Finally, data visualization and reporting are the last of the three key pillars in your baseline data strategy. Ingesting and storing your data is not sufficient on its own; businesses need a purpose for this data, and that purpose is often served through reporting and visualization.

These outputs might be dashboards, KPIs, Excel reports, or just a single final number.

When applied correctly, these can drive business initiatives, help businesses make better decisions, and give executives confidence that they understand exactly what is going on in their business.
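Behind many dashboard tiles is a simple aggregate. As an illustration of a KPI tied to the earlier goal of increasing conversion in key segments, the sketch below computes the conversion rate per segment from hypothetical `(segment, converted)` records. In practice this would more likely be a SQL query feeding the visualization tool.

```python
from collections import defaultdict

# Hypothetical visitor records: (segment, converted).
VISITS = [
    ("enterprise", True), ("enterprise", False),
    ("smb", True), ("smb", False), ("smb", False), ("smb", False),
]


def conversion_by_segment(visits):
    """Return {segment: conversion rate} from (segment, converted) pairs."""
    totals = defaultdict(int)
    conversions = defaultdict(int)
    for segment, converted in visits:
        totals[segment] += 1
        conversions[segment] += int(converted)
    return {segment: conversions[segment] / totals[segment] for segment in totals}


if __name__ == "__main__":
    for segment, rate in conversion_by_segment(VISITS).items():
        print(f"{segment}: {rate:.0%}")
```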

Of course, as with the other sections, there are many choices for data visualization tools. You could pick the classic Tableau or the cloud-based Looker. Each of these tools has different benefits.

I will mention that Looker in particular has a longer learning curve, but when used well, it can help your analytics teams work from a more consistent layer of data. Overall, finding the right data visualization tool starts with your team figuring out your data goals.

Taking Action On Your Data Insights

Now that you have built your baseline data stack and have clear goals with metrics around them, it’s time to take action.

As your team takes action, whether that means adding a feature to a website or sending a coupon to a targeted segment, it’s important that you record the outcome. Data analytics isn’t just about the initial insights. Data teams use those insights to take action and then build a feedback loop based on the outcome.
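Recording outcomes is what makes the feedback loop measurable. The sketch below assumes a coupon campaign in which a randomly chosen holdout group did not receive the coupon; the holdout design, the `ActionOutcome` record, and the sample data are all assumptions for illustration. It compares conversion between the two groups.

```python
from dataclasses import dataclass


@dataclass
class ActionOutcome:
    """One recorded action (e.g. a coupon send) and what happened afterward."""
    customer_id: int
    received_coupon: bool  # False means the customer was in the (assumed) holdout group
    converted: bool


def lift(outcomes):
    """Coupon-group conversion rate minus holdout conversion rate.

    Assumes both groups are non-empty.
    """
    def rate(group):
        return sum(o.converted for o in group) / len(group)

    treated = [o for o in outcomes if o.received_coupon]
    holdout = [o for o in outcomes if not o.received_coupon]
    return rate(treated) - rate(holdout)


# Hypothetical recorded outcomes.
OUTCOMES = [
    ActionOutcome(1, True, True), ActionOutcome(2, True, True),
    ActionOutcome(3, True, False), ActionOutcome(4, True, False),
    ActionOutcome(5, False, True), ActionOutcome(6, False, False),
    ActionOutcome(7, False, False), ActionOutcome(8, False, False),
]

if __name__ == "__main__":
    print(f"lift: {lift(OUTCOMES):+.2f}")
```

A positive lift suggests the coupon helped, but with small groups the difference can easily be noise, so a real analysis would also check statistical significance before acting on it.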

Sometimes even when a logical decision is made based on insights, unexpected results occur. However, those unexpected results can simply provide further understanding for a company.

Thus, each small step in your data strategy can provide further information on which actions are best to take, and when.

Setting Up Your Data Strategy For 2022

Picking the right data infrastructure components is crucial to ensuring your team can scale quickly and deliver. However, making sure you have a solid data strategy is far more important.

Over-promising and committing to every new data fad is a quick way for your team to never truly provide value.

Our team sees this problem a lot: companies constantly jump on the next trend instead of delivering on the previous project.

This all starts with landing data pipelines, then creating a core data layer, and finally creating tangible insights, KPIs, and metrics that the business can use. Everything else is a distraction.

In 2022, our hope is that your team focuses on simplifying your data structure.
