Lesson Overview

In this lesson we will:

  • Discuss The Business Context For More Effective Use Of Data;
  • Learn What Data Engineering Involves;
  • Consider The Role Of The Data Engineer.

Business Context

Businesses today collect large amounts of data across every part of their operations. It comes from many sources, including websites and mobile applications, internal line-of-business applications, and external suppliers and partners. This data is being created faster and in greater quantities than ever before.

Once the data is collected, businesses also have increasingly high demands to make use of it, for instance in intelligent analytics, reports, real-time dashboards, data science model development, and other business and product initiatives.

Handling ever more complex source data while meeting increasingly demanding business requirements is leading businesses to take new approaches to deploying data and analytics capabilities.

Enter, Data Engineers

Data Engineers are the people within an engineering function who bring these two sides together, putting the processes and automation in place to turn business data into actionable insights.

Though Data Engineering is increasingly a critical function for businesses, it is relatively new and poorly understood. In the past, these activities were handled by a mixture of job roles, none of which specialised in Data Engineering. Today, however, we are seeing Data Engineering emerge as a distinct job title and speciality.

The Data Engineering Process

If we break down the problem, Data Engineers need to do three things:

  • Extract data from the sources, whether through APIs, by querying a database directly, or by setting up a solution for pushing or pulling data from the source;
  • Transform data into joined-up, cleaned, usable formats and structures, perhaps adding analytics;
  • Load data into locations where it can subsequently be used. Often this will be a data warehouse or data lake which serves reports, dashboards or ad-hoc analysis by Data Analysts, Data Scientists and business users.
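
To make the three steps concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions rather than a production design: the RAW_ORDERS records, the orders table and all function names are hypothetical, the in-memory list stands in for an API or source database, and SQLite stands in for a data warehouse.

```python
import sqlite3

# Hypothetical raw records, standing in for an API response or a source query.
RAW_ORDERS = [
    {"order_id": 1, "customer": " alice ", "amount": "19.99"},
    {"order_id": 2, "customer": "bob", "amount": "5.00"},
    {"order_id": 2, "customer": "bob", "amount": "5.00"},  # duplicate row
    {"order_id": 3, "customer": "Carol", "amount": None},  # missing amount
]


def extract():
    """Extract: read raw records from the (simulated) source."""
    return list(RAW_ORDERS)


def transform(records):
    """Transform: drop incomplete and duplicate rows, tidy names, fix types."""
    seen = set()
    cleaned = []
    for record in records:
        if record["amount"] is None or record["order_id"] in seen:
            continue
        seen.add(record["order_id"])
        cleaned.append(
            (
                record["order_id"],
                record["customer"].strip().title(),
                float(record["amount"]),
            )
        )
    return cleaned


def load(rows, conn):
    """Load: write cleaned rows to a table (SQLite stands in for a warehouse)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


def run_pipeline(conn):
    """Run the three steps in order."""
    load(transform(extract()), conn)


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    run_pipeline(connection)
    print(connection.execute("SELECT * FROM orders ORDER BY order_id").fetchall())
```

Keeping each step as its own function means the source, the cleaning rules or the destination can be swapped without touching the other two.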

In some instances, the Data Engineer's responsibility will also extend into areas such as serving up the correct reports and dashboards, though often Data Analysts and Data Scientists will work on the "last mile" with Data Engineers in a supporting role.

Of course, this data work isn't just a one-time activity. Data Engineers need to put in place pipelines which continually process data as it is created in the sources, and which run reliably and accurately in production while delivering data quickly.
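
One common way to process data continually is to remember how far the previous run got and pick up only newer records on each run. The sketch below illustrates that idea under assumptions of its own: the function name run_incremental, the created_at field and the dictionary used as a destination are all hypothetical, and a real pipeline would normally rely on a scheduler and a durable store for the watermark.

```python
def run_incremental(source_rows, destination, watermark):
    """Process only rows created after the watermark; return the new watermark."""
    new_rows = [row for row in source_rows if row["created_at"] > watermark]
    for row in sorted(new_rows, key=lambda r: r["created_at"]):
        # Writes are keyed by id, so re-processing a row cannot duplicate it.
        destination[row["id"]] = row
        watermark = row["created_at"]
    return watermark


if __name__ == "__main__":
    source = [{"id": 1, "created_at": 100}, {"id": 2, "created_at": 200}]
    warehouse = {}

    # First run picks up everything created so far.
    watermark = run_incremental(source, warehouse, watermark=0)

    # A new record arrives in the source; the next run picks up only that one.
    source.append({"id": 3, "created_at": 300})
    watermark = run_incremental(source, warehouse, watermark)

    print(watermark, sorted(warehouse))
```

Because each run starts from the stored watermark, the same function can be called on a schedule without reprocessing data that has already been delivered.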

How Data Engineers Enable Data Analysts

Many businesses looking to improve their data capabilities will begin by hiring Data Scientists and Data Analysts. However, they will then find that these people are not as productive as they could be, because they need to spend significant time on data extraction and preparation.

These newly hired Data Professionals often find, for instance, that they need to manually request raw data extracts from source systems, that the data they get is messy, out of date or incomplete, or that it is delivered in sub-optimal formats such as Excel spreadsheets. This is not the best use of their time and skills!

With a Data Engineering capability, these plumbing activities are handled on behalf of Data Analysts and Data Scientists, who then have more time to apply their specialist skills to actual analysis and modelling. They will also continually receive up-to-date data through robust data delivery pipelines.

With this in mind, businesses should look at Data Engineering as a foundational capability which has to be put in place before Data Analysts and Data Scientists can be effective, let alone before data can be put to wider use across the business.


© 2022 Timeflow Academy.