In this lesson we will:
- Discuss the business context for more effective use of data;
- Learn what Data Engineering involves;
- Consider the role of the Data Engineer.
Businesses today collect large volumes of data from across their operations. This data comes from many sources, including websites and mobile applications, internal line-of-business applications, and external suppliers and partners. It is being created faster and in greater quantities than ever before.
Once the data is collected, businesses also have increasingly high demands for making use of it, for instance in intelligent analytics, reports, real-time dashboards, data science model development, and other business and product initiatives.
The combination of increasingly complex source data and increasingly demanding business requirements is leading to new approaches to how businesses deploy data and analytics capabilities.
Data Engineers are the people within an engineering function tasked with bringing these two sides together. They put the processes and automation in place to turn business data into actionable insights.
Though Data Engineering is an increasingly critical function for businesses, it is relatively new and poorly understood. In the past, these activities were handled by a mixture of job roles, none of which specialised in Data Engineering. Today, however, Data Engineering is emerging as a distinct job title and speciality.
If we break down the problem, Data Engineers need to do three things:
- Extract data from the sources, either through APIs, by querying a database directly, or by setting up a solution for pushing or pulling data from the source;
- Transform data into joined-up, cleaned, usable formats and structures, perhaps adding analytics along the way;
- Load data into locations where it can subsequently be used. Often this will be a data warehouse or data lake which services reports, dashboards or ad-hoc analysis by Data Analysts, Data Scientists and business users.
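The three steps above can be sketched in a few lines of Python. This is only an illustrative sketch using the standard library: the sample CSV text, the `orders` table and the function names are all hypothetical, and an in-memory SQLite database stands in for a real data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from a source system (stands in for an API or database query).
RAW_ORDERS_CSV = """order_id,customer,amount
1, alice ,10.50
2,BOB,
3,alice,4.50
"""


def extract(raw_text):
    """Extract: read rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows):
    """Transform: normalise customer names and drop rows with no amount."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # skip incomplete records
        cleaned.append(
            (int(row["order_id"]), row["customer"].strip().lower(), float(amount))
        )
    return cleaned


def load(conn, records):
    """Load: write cleaned records into a warehouse-style table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    # INSERT OR REPLACE keeps the load safe to re-run on the same data.
    conn.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?)", records)
    conn.commit()


def run_pipeline(conn, raw_text=RAW_ORDERS_CSV):
    load(conn, transform(extract(raw_text)))


conn = sqlite3.connect(":memory:")  # assumption: SQLite stands in for the warehouse
run_pipeline(conn)
totals = conn.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()
print(totals)
```

Keeping each step in its own function means the extract, transform and load stages can be tested and replaced independently, for example by swapping the CSV text for a real API call.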
In some instances, the Data Engineer's responsibility will also extend into areas such as serving up the correct reports and dashboards, though often Data Analysts and Data Scientists will work on the "last mile" with the Data Engineers playing a supporting role.
Of course, this data work isn't a one-time activity. Data Engineers need to put in place pipelines which continually process data as it is created in the sources, and to keep these pipelines running reliably and accurately in production, delivering data quickly.
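One common way to process data continually is to remember how far the previous run got and only pick up newer records on the next run. The following is a minimal sketch of that idea, assuming source rows carry an increasing numeric id. The `source`, `warehouse` and `state` names are hypothetical, and a real pipeline would persist the state and run on a schedule.

```python
# Hypothetical source rows: (id, value). New rows get higher ids over time.
source = [(1, "a"), (2, "b")]
warehouse = []
state = {"last_id": 0}  # highest id already processed


def run_once(source_rows, target, state):
    """Process only rows newer than the stored id, then advance it."""
    new_rows = [row for row in source_rows if row[0] > state["last_id"]]
    target.extend(new_rows)
    if new_rows:
        state["last_id"] = max(row[0] for row in new_rows)
    return len(new_rows)


first = run_once(source, warehouse, state)   # picks up both existing rows
source.append((3, "c"))                      # new data arrives in the source
second = run_once(source, warehouse, state)  # picks up only the new row
third = run_once(source, warehouse, state)   # nothing new to process
print(first, second, third)
```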
Many businesses looking to improve their data capabilities will begin by hiring Data Scientists and Data Analysts. However, they will then find that these people are not as productive as they could be because they need to spend significant time on data extraction and preparation.
These newly hired data professionals often find, for instance, that they need to manually request raw data extracts from source systems, that the data they get is messy, out of date or has gaps in it, or that it is delivered in sub-optimal formats such as Excel spreadsheets. This is not the best use of their time and skills!
With a Data Engineering capability, these plumbing activities are handled on behalf of Data Analysts and Data Scientists, who then have more time to apply their specialist skills to actual analysis and modelling. They will also continually receive up-to-date data through robust data delivery pipelines.
With this in mind, businesses should look at Data Engineering as a foundational capability which has to be put in place before Data Analysts and Data Scientists can be effective, let alone before the wider business can be enabled to use data.