Lesson Overview

In this lesson we will:

  • Consider the common architectural and deployment patterns of Modern Data Platforms.

Architectural Tiers

At a high level, businesses are likely to have the following requirements in relation to their data and analytics solution:

  • Extraction - Extracting data from data sources such as line of business applications, SaaS tools and operational databases;
  • Ingestion - Ingesting extracted data into a central location such as a Data Lake or Data Warehouse for subsequent use and analysis;
  • Transformation - Transforming the data such that it is cleaned, structured and processed ready for further analysis;
  • Storage - Storing the data in a persistent store such as a data warehouse or data lake;
  • Consumption - Capabilities such as search, reporting and dashboarding for use by Data Analysts, Data Scientists and business users.

Architecturally, these could be thought of as tiers or layers of the stack which we can consider independently.
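
To make the idea of independent tiers concrete, the sketch below models each tier as a small Python function. It is only an illustration: the function names, the in-memory "landing zone" and "warehouse", and the sample records are all hypothetical stand-ins, and a real platform would use a dedicated tool for each tier.

```python
# Hypothetical stand-ins for each tier; every name and record here is illustrative.

def extract() -> list[dict]:
    """Extraction: pull raw records out of a (pretend) source system."""
    return [
        {"customer": " Alice ", "amount": "10.50"},
        {"customer": "bob", "amount": "4.25"},
    ]

def ingest(records: list[dict], landing_zone: list[dict]) -> None:
    """Ingestion: copy the extracted records, unchanged, into a central location."""
    landing_zone.extend(records)

def transform(landing_zone: list[dict]) -> list[dict]:
    """Transformation: clean and structure the raw records."""
    return [
        {"customer": r["customer"].strip().title(), "amount": float(r["amount"])}
        for r in landing_zone
    ]

def store(rows: list[dict], warehouse: dict) -> None:
    """Storage: persist the prepared rows (here, just a dict acting as a warehouse)."""
    warehouse["orders"] = rows

def consume(warehouse: dict) -> float:
    """Consumption: answer a business question from the stored data."""
    return sum(row["amount"] for row in warehouse["orders"])

landing_zone: list[dict] = []
warehouse: dict = {}

ingest(extract(), landing_zone)
store(transform(landing_zone), warehouse)
total = consume(warehouse)
print(total)
```

Because each function only depends on the output of the previous one, any single tier could in principle be swapped for a different tool without rewriting the others.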

Extraction

A typical business will need to bring data from many different sources into its Data Platform in order to build a joined-up, single view of the business.

Common data sources include applications, SaaS tools, operational databases, and ad-hoc data sources such as spreadsheets or data sourced through APIs.

The first capability that we need therefore is to extract this data from our sources. This is typically achieved with a combination of

Ingestion

Once our data is extracted, it needs to be ingested into a central location such as a Data Lake or Data Warehouse.

This has to happen both as an initial load and then on an ongoing basis, to keep the centralised Data Platform up to date as new data is captured in the sources.
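
One common way to handle both the initial load and the ongoing updates is to remember a "high-water mark" and only copy rows beyond it. The sketch below is a minimal illustration using Python's built-in sqlite3 module with two in-memory databases standing in for a source system and the central platform. The table names (events, raw_events) and the function ingest_new_rows are assumptions made up for this example, not part of any particular ingestion tool.

```python
import sqlite3

def ingest_new_rows(source, target, last_seen_id):
    """Copy rows with an id above last_seen_id; return the new high-water mark."""
    rows = source.execute(
        "SELECT id, payload FROM events WHERE id > ? ORDER BY id",
        (last_seen_id,),
    ).fetchall()
    target.executemany("INSERT INTO raw_events (id, payload) VALUES (?, ?)", rows)
    target.commit()
    return rows[-1][0] if rows else last_seen_id

# Hypothetical source system with two existing records.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
source.executemany("INSERT INTO events (payload) VALUES (?)", [("a",), ("b",)])

# Hypothetical central store that receives the raw data.
target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE raw_events (id INTEGER PRIMARY KEY, payload TEXT)")

# Initial load: a watermark of 0 means "copy everything".
watermark = ingest_new_rows(source, target, last_seen_id=0)

# New data arrives in the source; the next run copies only the new row.
source.execute("INSERT INTO events (payload) VALUES ('c')")
watermark = ingest_new_rows(source, target, watermark)

row_count = target.execute("SELECT COUNT(*) FROM raw_events").fetchone()[0]
print(watermark, row_count)
```

This approach assumes the source has a column that only ever increases. Sources that update or delete existing rows need a different strategy, such as change data capture.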

For further detail on the ingestion tier, please visit our lesson on ingestion.

Transformation

In this layer, we take the source data and cleanse, modify and prepare it to meet the requirements of the business and of downstream consumers such as Data Analysts and Data Scientists.

Historically, these transformations took place before data was loaded into the centralised Data Warehouse (Extract, Transform, Load). However, in the Modern Data Stack, transformation more typically happens after the load has taken place (Extract, Load, Transform).
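
The difference between the two orderings can be sketched with Python's built-in sqlite3 module, using an in-memory database as a stand-in for the warehouse. The table names, the clean helper and the sample rows are all hypothetical. The point is only where the transformation runs: in application code before loading (ETL), or as SQL inside the warehouse after loading (ELT).

```python
import sqlite3

# Raw records as they might arrive from a source: untidy strings.
raw_rows = [(" Alice ", "10.50"), ("bob ", "4.25")]

def clean(name, amount):
    """Transformation expressed in application code (used by the ETL path)."""
    return name.strip(), float(amount)

# ETL: transform first, then load only the cleaned rows.
etl = sqlite3.connect(":memory:")
etl.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
etl.executemany(
    "INSERT INTO orders VALUES (?, ?)", [clean(n, a) for n, a in raw_rows]
)

# ELT: load the raw rows as-is, then transform inside the warehouse using SQL.
elt = sqlite3.connect(":memory:")
elt.execute("CREATE TABLE raw_orders (customer TEXT, amount TEXT)")
elt.executemany("INSERT INTO raw_orders VALUES (?, ?)", raw_rows)
elt.execute(
    """
    CREATE TABLE orders AS
    SELECT TRIM(customer) AS customer, CAST(amount AS REAL) AS amount
    FROM raw_orders
    """
)

query = "SELECT customer, amount FROM orders ORDER BY customer"
etl_rows = etl.execute(query).fetchall()
elt_rows = elt.execute(query).fetchall()
print(etl_rows == elt_rows)
```

Both paths end with the same cleaned table, but the ELT path also keeps raw_orders in the warehouse, so the transformation can be changed and re-run later without going back to the source.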

For further detail on the transformation tier, please visit our lesson on transformation.

Storage

The next tier is all about storing the data and making it available for queries and consumption by your business.

Typically, this will include a Data Warehouse or Data Lake which acts as the long-term persistent store of your data.

Consumption

This layer includes tools such as:

  • APIs that allow developers and data professionals to query and extract the data they need from the data platform;
  • Business Intelligence tools which allow Data Analysts to explore data and build reports and dashboards.
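
As a rough illustration of what a consumption-layer query looks like, the sketch below runs the kind of aggregate a dashboard or a small API endpoint might request. It uses Python's built-in sqlite3 module with an in-memory database standing in for the warehouse. The orders table, its columns and the revenue_by_region function are hypothetical names chosen for this example.

```python
import sqlite3

# Hypothetical warehouse table holding already-transformed data.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE orders (region TEXT, amount REAL)")
warehouse.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("north", 10.0), ("south", 4.5), ("north", 2.5)],
)

def revenue_by_region(conn):
    """Return total order value per region, as a report or API might expose it."""
    rows = conn.execute(
        "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
    ).fetchall()
    return dict(rows)

report = revenue_by_region(warehouse)
print(report)
```

In practice a Business Intelligence tool would generate a query like this on the analyst's behalf, while an API would wrap it behind an endpoint.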

Underlying Infrastructure

As discussed, a key feature of Modern Data Platform tools is that they are often cloud based or delivered as Software as a Service. This allows each tool to benefit from the underlying characteristics of the cloud, such as its scalability and elasticity.

We discuss how the cloud enables the Modern Data Platform in more detail in the next lesson.

Summary

In this lesson, we considered the key components of the Modern Data Platform and how they fit together architecturally.

We proposed a five-tier view, with an Extraction Layer, an Ingestion Layer, a Transformation Layer, a Storage Layer and a Consumption Layer.

In the next lesson, we will consider specifically how the cloud (as offered by the likes of AWS or Azure) supports this overall architecture.

Next Lesson

In the next lesson we will consider how cloud platforms such as AWS, Azure and Google Cloud Platform support and enable the Modern Data Stack.

© 2022 Timeflow Academy.