In this lesson we will consider the common architectural and deployment patterns of Modern Data Platforms.
The Modern Data Stack has four core requirements:
- Ingestion - Extracting data from data sources such as applications, SaaS tools and operational databases, and bringing it into a central location for subsequent use and analysis;
- Transformation - Transforming the data on arrival such that it is cleaned, structured and made ready for consumption or analysis;
- Storage - Storing the data in a persistent store such as a data warehouse or data lake;
- Consumption - Capabilities such as searching, reporting and dashboarding for use by Data Analysts, Data Scientists and business users.
Architecturally, these could be thought of as tiers or layers of the stack which we can consider independently.
It is likely that you will need to ingest data from various data sources into your Data Platform. These data sources include applications, SaaS tools, operational databases, and ad-hoc data sources such as spreadsheets or data sourced through APIs.
The first task is to take this source data and ingest it into the Data Platform. This has to happen both as an initial load, and then on an ongoing basis to keep the centralised Data Platform up to date as new data is captured in the sources.
For further detail on the ingestion tier, please visit our lesson on ingestion.
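The difference between the initial load and the ongoing loads can be illustrated with a minimal Python sketch. It assumes each source row carries an `updated_at` timestamp and uses a simple "watermark" (the latest timestamp already ingested) to decide what is new. The function name `ingest`, the row layout and the in-memory lists are hypothetical stand-ins for a real source system and platform, and the sketch only handles newly appended rows, not updates or deletes.

```python
from datetime import datetime


def ingest(source_rows, platform_rows, watermark=None):
    """Copy rows newer than the watermark into the platform.

    With watermark=None every row is copied (the initial load).
    Returns the new watermark to use for the next run.
    """
    new_rows = [
        row for row in source_rows
        if watermark is None or row["updated_at"] > watermark
    ]
    platform_rows.extend(new_rows)
    if new_rows:
        watermark = max(row["updated_at"] for row in new_rows)
    return watermark


# Hypothetical source system with two existing records.
source = [
    {"id": 1, "updated_at": datetime(2024, 1, 1)},
    {"id": 2, "updated_at": datetime(2024, 1, 2)},
]
platform = []

# Initial load: no watermark yet, so everything is copied.
watermark = ingest(source, platform)

# A new record is captured in the source after the initial load.
source.append({"id": 3, "updated_at": datetime(2024, 1, 3)})

# Ongoing load: only rows newer than the watermark are copied.
watermark = ingest(source, platform, watermark)
```

In practice, dedicated ingestion tools usually take care of this bookkeeping, but the underlying idea of tracking what has already been loaded is similar.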
In this layer, we will take the source data and cleanse, modify and prepare it to meet the requirements of the business and downstream consumers such as Data Analysts and Data Scientists.
Historically, these transformations took place before data was loaded into the centralised Data Warehouse (Extract, Transform, Load). However, in the Modern Data Stack, they more typically happen after the load has taken place (Extract, Load, Transform).
For further detail on the transformation tier, please visit our lesson on transformation.
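To make the Extract, Load, Transform ordering concrete, here is a small sketch that uses Python's built-in `sqlite3` module as a stand-in for a data warehouse. The table names `raw_orders` and `customer_totals` and the sample rows are invented for illustration. The raw data is loaded exactly as it arrives, and the cleaning and aggregation then run as SQL inside the store.

```python
import sqlite3

# An in-memory SQLite database stands in for the warehouse (an assumption).
conn = sqlite3.connect(":memory:")

# Load: raw data lands untouched, including messy names and text amounts.
conn.execute("CREATE TABLE raw_orders (id INTEGER, customer TEXT, amount TEXT)")
raw_rows = [
    (1, "  Alice ", "10.50"),
    (2, "BOB", "4.00"),
    (3, "alice", "2.50"),
]
conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", raw_rows)

# Transform: cleaning and aggregation happen after the load, in SQL.
conn.execute(
    """
    CREATE TABLE customer_totals AS
    SELECT lower(trim(customer)) AS customer,
           SUM(CAST(amount AS REAL)) AS total
    FROM raw_orders
    GROUP BY lower(trim(customer))
    """
)

totals = conn.execute(
    "SELECT customer, total FROM customer_totals ORDER BY customer"
).fetchall()
print(totals)  # [('alice', 13.0), ('bob', 4.0)]
```

Because the raw table is kept, the transformation can be changed and re-run later without extracting the data from the source again, which is one reason the load-first ordering is attractive.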
The next tier is all about storing the data and making it available for queries and consumption by your business.
The consumption layer includes tools such as:
- APIs that allow developers and data professionals to query and extract the data they need from the data platform;
- Business Intelligence tools which allow Data Analysts to explore data and build reports and dashboards.
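As a rough illustration of the first kind of tool, the sketch below shows a hypothetical query function that a consumption-layer API might expose. It again uses `sqlite3` as a stand-in for the platform's storage; the function name `top_customers`, the `customer_totals` table and its contents are assumptions made for the example.

```python
import sqlite3


def top_customers(conn, limit):
    """Hypothetical consumption-layer query: the highest-spending customers."""
    return conn.execute(
        "SELECT customer, total FROM customer_totals "
        "ORDER BY total DESC LIMIT ?",
        (limit,),
    ).fetchall()


# Stand-in for data that has already been stored and transformed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer_totals (customer TEXT, total REAL)")
conn.executemany(
    "INSERT INTO customer_totals VALUES (?, ?)",
    [("alice", 13.0), ("bob", 4.0), ("carol", 7.5)],
)

result = top_customers(conn, 2)
print(result)  # [('alice', 13.0), ('carol', 7.5)]
```

A Business Intelligence tool would typically issue similar queries behind the scenes and render the results as a report or dashboard.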
As discussed, a key feature of Modern Data Platform tools is that they are often cloud based or delivered as Software as a Service. This allows the tools to benefit from the underlying characteristics of the cloud, such as its scalability and elasticity.
We discuss how the cloud enables the Modern Data Platform in more detail in the next lesson.
In this lesson, we considered the key components of the Modern Data Platform and how they fit together architecturally.
We proposed a four-layer view, with an Ingestion Layer, a Transformation Layer, a Storage Layer and a Consumption Layer.
In the next lesson, we will consider specifically how the cloud (as offered by the likes of AWS or Azure) supports this overall architecture.