How To Modernize Your Data Architecture Part 2 – Data Analytics Strategy Consulting


April 1, 2021 | data engineering consulting

Photo by CHUTTERSNAP on Unsplash

In our last article, we outlined what to avoid when you modernize your data architecture.

That was more of an introductory piece for people who are just starting to develop their data strategies.

Now we will start developing a strategy to modernize your architecture.

Over the past three years, our company has had the opportunity to work with companies across many industries, including Insurance, Finance, Transportation, Healthcare, and SaaS (to name a few).

Each of these projects was unique, and our clients started at different levels of data maturity.

Some companies had already developed a base layer of data infrastructure; others had only occasionally extracted data manually from their various Salesforce and Asana instances.

Overall, this gave our team a strong understanding of how to approach modernizing and optimizing data infrastructure.

Whether you are looking to improve your data engineering processes, implement a data warehouse, or create a data science model, our team has likely done it before.

In this article, we will outline some of the first steps in this process.

Where Do I Start With My Data Project

Before getting too far ahead with your data strategy, you need to assess where you are and where you want to go.

It’s easy to get excited about what is possible.

There are thousands of articles talking about successful data use cases.

These use cases describe companies that save thousands, if not millions, of dollars or increase their profits because of data.

It can be tempting to run before crawling.

But the first step in any project, not just data projects, is figuring out where you want to go.

Your Business Goals

Before starting your data infrastructure or data science projects, you should first know what your business goals are.

In other words, you first need to ask questions like:

What are you trying to improve or change in your business?

Do you need data to drive that change?

The truth is that data and dashboards can be very distracting.

Your business has limited resources.

As a data consultant, I have often come into companies and had to spend a good deal of my time just getting everyone on the same page about what needs to be done.

There is so much data to pull in and process that it can be tempting to take on 4, 5, or even 10 different initiatives all at once.

But this doesn’t work if there is no one to act on your data.

That’s why it is important to first understand what your goals are.

Once you know where you want to go, you can start to assess what data your company has.

The Data Source Problem

It’s strange. Data tooling like ETL and data pipeline software has gotten better over the years. We have many options, ranging from robust and flexible coding frameworks like Airflow to low-code options like Fivetran, Stitch, and Talend.

Yet data engineers are still often overwhelmed by the work of getting all of the various data sources into the company data warehouse.

Why?

Because there are just so many possible data sources.

So this is where we always start when it comes to developing and modernizing a data strategy.

Which infrastructure components you need depends on what data exists.

Your data sources could include tools like Zendesk, Workday, Salesforce, and Commerce7, just to name a few.

But you do need to spend some time figuring out what data exists and which sources you plan to pull from.

Setting up a quick Google Sheet listing your data sources, their points of contact (POCs), and a few notes can be very helpful, both internally and when your company decides to work with external data engineering consultants.
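If you would rather generate that inventory from a script than type it by hand, here is a minimal sketch, assuming a hypothetical set of columns (`name`, `owner_poc`, `access_method`, `notes`) and made-up example rows. It writes CSV text using Python's standard `csv` module, which a spreadsheet tool such as Google Sheets can import; it is one illustrative layout, not a required template.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class DataSource:
    """One row of a data source inventory (hypothetical column names)."""
    name: str           # the application, e.g. "Salesforce"
    owner_poc: str      # point of contact who knows the system
    access_method: str  # how data can be pulled today (API, export, database)
    notes: str = ""


def inventory_to_csv(sources: list[DataSource]) -> str:
    """Render the inventory as CSV text, one header row plus one row per source."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(DataSource)])
    writer.writeheader()
    for source in sources:
        writer.writerow(asdict(source))
    return buffer.getvalue()


if __name__ == "__main__":
    # Example rows are illustrative only.
    inventory = [
        DataSource("Salesforce", "Sales ops lead", "API", "Opportunities and accounts"),
        DataSource("Zendesk", "Support manager", "API"),
        DataSource("Workday", "HR analyst", "Scheduled export", "Contains PII"),
    ]
    print(inventory_to_csv(inventory))
```

Keeping the inventory as structured rows makes it easy to add columns later, such as refresh frequency or data volume, once you start evaluating pipelines.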

Data Use Cases

Once you know what data you have, you can start looking at your company’s data use cases.

In other words, what do you want to do with the data?

In the previous step, we organized all the various data sources. In this step, we are trying to organize and eventually prioritize which data use cases to do first.

Start by looking at your various workflows and where teams currently use data.

Your company is likely already producing reports by manually pulling data into Excel on a daily or monthly basis. The goal is to list all of these tasks, whether they are dashboards, reports, or possible future work like building models, and then figure out the impact of each one and the decisions it drives.

Overall, it is important to understand broadly what your team wants to do with the data. This will help you pick the right data storage, pipelines, and analytics layers.

Your team also likely has more data sources and use cases than can be handled in a short period. Most companies have more goals and use cases than can be automated in a few sprints.

Some use cases might not be worth automating or building infrastructure for, not because the task lacks value, but because there are so many other valuable tasks.
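One simple way to start that prioritization is to score each use case and rank the list. The sketch below assumes hypothetical 1-to-5 scales for impact and effort, ranks by impact per unit of effort (a common heuristic, not a method prescribed here), and drops use cases with no one to act on the output, echoing the earlier point that data nobody acts on does not help. All names and scores are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class UseCase:
    name: str
    impact: int      # estimated business impact, 1 (low) to 5 (high)
    effort: int      # estimated build effort, 1 (small) to 5 (large)
    has_owner: bool  # is there someone who will act on the output?

    def __post_init__(self) -> None:
        if not (1 <= self.impact <= 5 and 1 <= self.effort <= 5):
            raise ValueError("impact and effort must be between 1 and 5")


def prioritize(use_cases: list[UseCase], capacity: int) -> list[UseCase]:
    """Keep actionable use cases, rank by impact/effort (ties broken by
    higher impact), and return at most `capacity` of them."""
    actionable = [uc for uc in use_cases if uc.has_owner]
    ranked = sorted(
        actionable,
        key=lambda uc: (uc.impact / uc.effort, uc.impact),
        reverse=True,
    )
    return ranked[:capacity]


if __name__ == "__main__":
    backlog = [
        UseCase("Daily sales dashboard", impact=4, effort=2, has_owner=True),
        UseCase("Churn prediction model", impact=5, effort=5, has_owner=True),
        UseCase("Support ticket report", impact=3, effort=1, has_owner=True),
        UseCase("Marketing attribution", impact=4, effort=3, has_owner=False),
    ]
    print([uc.name for uc in prioritize(backlog, capacity=2)])
    # → ['Support ticket report', 'Daily sales dashboard']
```

The scores themselves are judgment calls; the value of writing them down is that the team argues about explicit estimates instead of competing priorities.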

This step lets you start outlining your data roadmap, which will contain all of your company’s data goals.

The key here is not to rush, but to recognize that each step takes time and that your end data goals need to align with your overall company strategy.

Otherwise, even if you do complete the technical aspects of your data goals, the implementation on the business side may fail.

The Data Storage Answer

Once you have a good understanding of what data you have and what you want to do with that data, then you can start to answer your data storage question.

That is, how will you store your data?

There are a lot of options for storing data for analytical purposes. Data storage tools like Snowflake, BigQuery, and Azure Synapse Analytics are all very popular, but a number of start-ups are focusing on creating more of a virtual layer for managing all your data sources.

For example, tools like Starburst, Denodo, and Rockset all provide some form of abstraction that reduces how much processing and storage you need when analyzing multiple data sources together.

The Modern Query Engine

Much of this comes from modern query engines and tools offering some form of query federation, which lets you query data where it already lives, across application data systems like MySQL and Kafka.
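To make the idea of federation concrete, here is a small stand-in using Python's built-in `sqlite3` module. It is an analogy, not a federated engine: SQLite's `ATTACH DATABASE` lets one connection join tables that live in separate databases, which mirrors how a federated engine joins, say, a MySQL table with a Kafka topic through its connectors. The schema names `crm` and `support`, the tables, and the rows are all hypothetical.

```python
import sqlite3


def build_demo_connection() -> sqlite3.Connection:
    """Create one connection with two separate in-memory databases attached."""
    conn = sqlite3.connect(":memory:")
    # Each ATTACH of ':memory:' creates an independent database under its own schema name.
    conn.execute("ATTACH DATABASE ':memory:' AS crm")
    conn.execute("ATTACH DATABASE ':memory:' AS support")
    conn.execute("CREATE TABLE crm.accounts (account_id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE support.tickets "
        "(ticket_id INTEGER PRIMARY KEY, account_id INTEGER, status TEXT)"
    )
    conn.executemany("INSERT INTO crm.accounts VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])
    conn.executemany(
        "INSERT INTO support.tickets VALUES (?, ?, ?)",
        [(10, 1, "open"), (11, 1, "open"), (12, 1, "closed"), (13, 2, "open")],
    )
    conn.commit()
    return conn


def open_tickets_per_account(conn: sqlite3.Connection) -> list[tuple[str, int]]:
    """A single query that joins tables stored in two different databases."""
    return conn.execute(
        """
        SELECT a.name, COUNT(t.ticket_id) AS open_tickets
        FROM crm.accounts AS a
        JOIN support.tickets AS t ON t.account_id = a.account_id
        WHERE t.status = 'open'
        GROUP BY a.name
        ORDER BY a.name
        """
    ).fetchall()


if __name__ == "__main__":
    print(open_tickets_per_account(build_demo_connection()))
    # → [('Acme', 2), ('Globex', 1)]
```

The appeal of federation is the same at any scale: the join happens at query time, so you do not have to build and maintain a pipeline that copies both sources into one store first. The trade-off is that query performance then depends on the source systems.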

But the truth is that you won’t know which storage option to pick unless you know the size of your data, how often you need to process it, and how it will be used (and, of course, your budget).

All of these aspects play a role in what data storage system might be best for your company.

For example, if your team works with a lot of unstructured data sets and needs to run ML models over those data sets, then it might make sense to implement some form of data lake or data lakehouse.

On the other hand, your company might be dealing with standard reporting that comes from well-structured data. This would likely benefit from a standard data warehouse.

Of course, the choice could also come down to your budget, your team, and the technical infrastructure your team already uses.
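The two examples above can be written down as a rough rule of thumb. This is a minimal sketch that encodes only those two cases; the field names are hypothetical, and anything outside the two cases is deliberately returned as "needs further assessment," since size, processing frequency, budget, team, and existing infrastructure still have to be weighed by people.

```python
from dataclasses import dataclass


@dataclass
class StorageNeeds:
    mostly_unstructured: bool        # e.g. logs, documents, images
    runs_ml_on_raw_data: bool        # models trained directly over those data sets
    mostly_standard_reporting: bool  # dashboards and reports over well-structured tables


def suggest_storage_pattern(needs: StorageNeeds) -> str:
    """Map the two cases described in the text to a storage pattern.

    This is a starting point for discussion, not a decision: it ignores
    data size, processing frequency, budget, and existing tooling.
    """
    if needs.mostly_unstructured and needs.runs_ml_on_raw_data:
        return "data lake or lakehouse"
    if needs.mostly_standard_reporting and not needs.mostly_unstructured:
        return "data warehouse"
    return "needs further assessment"


if __name__ == "__main__":
    print(suggest_storage_pattern(StorageNeeds(True, True, False)))
    print(suggest_storage_pattern(StorageNeeds(False, False, True)))
```

Returning an explicit "needs further assessment" instead of forcing a choice keeps the heuristic honest about how much it leaves out.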

I am not going to go into much detail about which tool is best, because that would truthfully require an entirely new article. There are also plenty of articles that compare tools like Redshift, Snowflake, BigQuery, and others.

Also, which data storage tool is best for you could change over time.

So I want to avoid definitive recommendations that are likely to change.

Do You Need Help Modernizing Your Data Analytics Architecture?

Wherever your team is in its data infrastructure journey, our team can help.

We have worked with multiple clients and helped them go from raw data to data-driven strategies.

This is more than a cliché for us.

This is our job.

So if you need help, then reach out today.

If you just want to learn more about our views on data strategy development, then subscribe to this blog and we will continue to provide new insights.

I will dive into some of the other questions in future posts, as well as in the data analytics strategy guide I will be putting out.

Thanks for reading! If you want to read more about data consulting, big data, and data science, then click below.

Developing A Data Analytics Strategy For Small Businesses And Start-ups

5 SQL Concepts You Need To Know Before Your Next Data Science Or Data Engineering Interview

3 Ways To Improve Your Data Science Teams Efficiency

4 SQL Tips For Data Scientists

How To Improve Your Data-Driven Strategy

What Is A Data Warehouse And Why Use It

Mistakes That Are Ruining Your Data-Driven Strategy

5 Great Libraries To Manage Big Data With Python

What Is A Data Engineer

Why You Need To Migrate To The Modern Data Stack