3 Heads Of Data And Founders’ Perspectives On Where Data Is Going

2021 is almost halfway over, and it seems like hundreds of millions of dollars have gone into data, data start-ups, and machine learning.

In particular, funding has shifted heavily from focusing only on the data science and machine learning space toward the data engineering and data management space.

Of course, if you’re an AI-based data management company, then I am sure you will be rolling in funding.

But let’s see what other data experts have to say.

We asked people from various parts of the data world for their insights into what they see coming for the rest of 2021 and the fast-approaching 2022, whether that be new start-ups, technologies, or best practices.

Let’s see what they had to say.

Colleen Tartow, Ph.D. — Director of Engineering, Starburst Data

There’s been a shift in the data job market happening for a while now, away from data science and toward data engineering. And frankly it’s about time — we know that a good data foundation is paramount to building a corporate data strategy. Data engineers are essential to connecting data from a variety of sources with the reporting, BI, and data science systems that turn the data into rich business value. Data engineering was the fastest growing job in technology in 2019, open data engineering positions rose by 40% in 2020, and surely that trend will continue this year as well.

The driving force behind this shift is that businesses have realized that a large data science program without a solid data foundation is not reliable, and the key to embracing data-driven decision making is to start with data engineering. Related jobs like analytics engineers and data architects are also on the rise, as they focus on shortening the timeline to data value and building stable and reliable data architectures, respectively. With a solid data engineering base, businesses can realize their goals of valuable BI and data science programs built on a strong foundation.

What’s more, by harnessing the power of both legacy data in existing systems and newer data in cloud-based, modern infrastructure, businesses can drive insights that will be more agile and allow them to react to changes in the market in a fresh way.

Gartner has coined the term X-analytics for this, and technologies such as Starburst Data are essential to proactively building a data platform that can pivot quickly and react to global events (like, say, a pandemic) without requiring an entirely new process or infrastructure buildout. X-analytics is the next generation of data engineering, and will bring us into the future of data.

Joe Reis, CEO/Co-Founder, Ternary Data

Data engineering is becoming more “enterprisey”. This may make you violently cringe. The term “enterprise” conjures up nightmares of faceless committees dressed in overly starched blue shirts and khakis, endless red tape, waterfall development, and the place where innovation goes to die. This image is certainly disturbing, and also not what I’m talking about. When I discuss “enterprisey”, I’m referring to some of the good things that larger companies might do with data — management, operations, governance, and other “boring” stuff. I think data engineering becoming “enterprisey” is a great thing, and welcome it with arms wide open!

“Enterprisey” Data Engineering Is Now Cool!

Once upon a time, data engineers largely focused on maintaining the lower level details of complex “big data” tools. These tools often had a lot of moving parts, and data engineers didn’t have time for much else except maintenance, fire fighting, and other heroics. As a result, a lot of “enterprisey” things fell by the wayside — data governance, data discovery, data quality, and a slew of other critical data management and operational practices.

Nowadays, data tools are abstracting much of the heavy lifting of “big data” tools. Things that were once complicated, like data pipelines and data lakes/warehouses, are commoditized to the point where they are largely “plug and play”, and “set it and forget it”. Think of companies like AWS, Google Cloud, Azure, Snowflake, Fivetran, and countless others who are simplifying the data stack from end to end. While a data engineer will still engineer systems, the engineering will be focused on creating high-value systems that lead to competitive advantage and differentiation.

Because of widespread data tool abstraction and simplification, a data engineer now has the bandwidth to start working higher up the value chain, on data management and DataOps, among others. While these practices were once reserved for large enterprises, they’re becoming mainstream for companies of all sizes and maturity. Just as countless companies are simplifying the “big data” stacks of yore, a new crop of best practices, tools, and companies is now tackling once “enterprisey” areas such as data governance, data discovery, data quality, and a slew of other critical data management and operational practices. Think Great Expectations (data quality), DataHub (data catalog), and many other projects currently working to solve once-ignored problems in data engineering.

With more attention on “enterprisey” problems in data engineering, data engineers will move up the value chain and tackle different types of problems than those from several years ago. I’m excited to imagine the types of problems the next generation of data engineers will be solving in a few years. “Enterprisey” data engineering is now cool. Get used to it.

Sergey Karayev, Head of STEM AI, Turnitin

Gradescope is a web application for college instructors to grade student exams, homework, and programming projects.

We have about a dozen different machine learning models in production, ranging from old-school image processing, to hand-coded feature-based MLPs, to state-of-the-art large Transformer models for handwriting recognition and text understanding. The primary challenge we face is in “closing the flywheel,” or in other words connecting the monitoring of our production models back to the data on which they were trained and evaluated.

First, it is difficult to set up proper monitoring for the models. Most model predictions are not surfaced to the user in a way that allows them to provide feedback, so we are often flying blind with respect to metrics. Second, for those predictions that do allow for user feedback, we have to hand-develop complicated SQL dashboards to compute the right metrics and point to the right inputs and predictions. Third, these dashboards still fall short, as the ideal dashboard would be able to display rich data such as images, which existing solutions do not allow. Lastly, our tools for monitoring are disjoint from tools for managing and reviewing datasets, and it is not easy to add examples found in monitoring to the training or evaluation sets and thus improve the model on re-training.

Writing custom dashboards and web interfaces just to monitor and manage data is not something data teams should have to do. I hope to see specialized tools emerge soon to deal with closing this functionality gap!
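Editor's illustration: to make the "closing the flywheel" idea concrete, here is a minimal Python sketch of the loop described above. It is not Gradescope's or Turnitin's actual tooling. The `Prediction` record, its field names, and the sample IDs are all hypothetical, and a real system would read from a production log instead of an in-memory list. The sketch computes a metric only over predictions that received user feedback, then routes the corrected examples into an evaluation set.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Prediction:
    """One production prediction, optionally with user feedback (hypothetical schema)."""
    input_id: str                    # pointer to the stored input, e.g. an image
    predicted: str                   # what the model output
    corrected: Optional[str] = None  # what the user said it should be, if they responded


def feedback_accuracy(predictions: List[Prediction]) -> Optional[float]:
    """Accuracy over only those predictions that received user feedback."""
    reviewed = [p for p in predictions if p.corrected is not None]
    if not reviewed:
        return None  # no feedback at all: this is the "flying blind" case
    correct = sum(p.predicted == p.corrected for p in reviewed)
    return correct / len(reviewed)


def collect_failures(predictions: List[Prediction]) -> List[Tuple[str, str]]:
    """Return (input_id, corrected_label) pairs for predictions that users corrected."""
    return [
        (p.input_id, p.corrected)
        for p in predictions
        if p.corrected is not None and p.corrected != p.predicted
    ]


# Made-up sample log for illustration only.
log = [
    Prediction("img-001", "42", corrected="42"),  # user confirmed the prediction
    Prediction("img-002", "17", corrected="11"),  # user fixed the prediction
    Prediction("img-003", "8"),                   # never surfaced for feedback
]

# Feed monitoring findings back into the dataset used for the next evaluation.
evaluation_set: List[Tuple[str, str]] = []
evaluation_set.extend(collect_failures(log))

print(feedback_accuracy(log))  # 0.5
print(evaluation_set)          # [('img-002', '11')]
```

The point of the sketch is the last step: the same records that drive the metric also supply new evaluation examples, which is the connection between monitoring and dataset management that the disjoint tools described above make hard.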

Tech For The Rest Of 2021 And Beyond

There is a lot to look forward to in the rest of 2021.

I am looking forward to seeing whether data engineering really will take center stage. There are plenty of statistics and surveys showing that data engineering is on the rise in terms of job growth.

Personally, I still fail to see how data engineering will overtake the “cool” factor of data science, if only because data science tends to have more of a center-stage presence while data engineering tends to stay in the background.

I would gladly be proven wrong about how cool data engineering is.

We hope that the rest of 2021 will not only lead to technology improvements but also lead to personal growth and expansion for all of our readers.

Thanks for reading, and good luck in the second half of 2021.
