In this lesson we will:
- Introduce the concept of the Modern Data Stack;
- List some example tools which form part of the Modern Data Stack;
- Discuss the benefits of the Modern Data Stack.
To fully meet the Data and Analytics requirements of a modern business, you will likely need to combine multiple tools into an end-to-end solution. This includes tooling for data storage, querying, dashboarding, ETL, model development and other requirements.
Because no single tool does everything, these technologies usually come from different vendors and have to be integrated with one another. The collection of tools that you choose is sometimes informally referred to as a "Stack".
Modern Data Stack is a phrase that came into wide use from around 2020 to describe an emerging set of platforms and tools which were growing in popularity and were often integrated to form an end-to-end solution. Though there is no strict definition of which tools form part of the Modern Data Stack and which fail to meet the bar, they typically share many common characteristics:
- SaaS/PaaS - Many Modern Data Stack tools are provided as a fully cloud-hosted managed service, meaning there is no infrastructure to manage or software to install. Instead, you typically sign up, log in and use the tool as a fully managed, higher-level service;
- Rapid Innovation - In addition to being consumed as a service, these tools are usually fast to get started with and require minimal configuration. This allows data teams to deliver value quickly, and to focus their efforts on high-value initiatives specific to their business rather than on configuring and operating the platform;
- Cloud Native - Modern Data Stack tooling typically runs in the Cloud and is optimised to take advantage of the cloud's scalability, elasticity and consumption-based billing model. This is in contrast to more traditional tooling, which was developed in the on-premises era;
- Scalable - Modern Data Stack tooling scales to support large volumes of data and large numbers of users. This is important as businesses continue to capture more and more data and have more complex use cases for it;
- Open - Modern Data Stack tools are often Open Source or have an Open Source core with commercial add-ons. This is important to businesses that are looking to avoid vendor lock-in as they modernise their tooling;
- Composable - Where data vendors previously tried to deliver the entire stack, Modern Data Stack projects and vendors accept that customers will want to combine best-of-breed tools into their overall solution. We therefore often see friendly collaboration between vendors of Modern Data Stack tooling;
- Consumption Based Pricing - Most of the tools in the Modern Data Stack are charged for based on consumption, for instance on data volumes processed or numbers of queries served. This means there are no large up-front costs, and cost should scale with usage and value delivered;
- Accessible - Tools in the Modern Data Stack aim to be simple and easy to use, with a large focus on No-Code and Low-Code solutions. This means that all data professionals (Engineers, Scientists, Analysts etc.) can contribute, rather than being restricted to their traditional siloed roles.
Of course, some tools have more of these characteristics than others, and some of the characteristics are quite subjective, so there is some debate as to whether a given tool qualifies as part of the Modern Data Stack. This isn't worth worrying about too much, however: it's the high-level principles that matter rather than specific definitions.
As discussed, multiple components are required to deliver a Modern Data Platform, and these are typically combined to deliver the end-to-end capabilities that the business needs. Within each component category, various tools are commonly acknowledged to be part of the Modern Data Stack. For example:
| Category | Example tools |
| --- | --- |
| Ingestion | Fivetran, Stitch, Airbyte |
| Data Warehouse | Snowflake, Azure Synapse Analytics, Amazon Redshift, Google BigQuery |
| Data Lake | Databricks, AWS, Azure, GCP |
| Parallel Data Processing | Spark, Databricks |
| Data Streaming | Kafka, Flink |
| Business Intelligence | Tableau, Power BI, Looker |
| Data Quality | Datafold, Monte Carlo Data |
This is not an exhaustive list, but tools such as the above are definitely front and centre in the conversation about the Modern Data Stack, and display many of the characteristics in our list above.
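
To make the idea of composing a stack concrete, here is a minimal Python sketch that records one tool per category and reports which categories are still uncovered. The categories and tool names are based on the table above; the helper functions, the `my_stack` selection and the use of Metabase as an unlisted tool are illustrative assumptions, not a recommended architecture.

```python
# Categories and example tools based on the table above.
EXAMPLE_TOOLS: dict[str, list[str]] = {
    "Ingestion": ["Fivetran", "Stitch", "Airbyte"],
    "Data Warehouse": ["Snowflake", "Azure Synapse Analytics",
                       "Amazon Redshift", "Google BigQuery"],
    "Data Lake": ["Databricks", "AWS", "Azure", "GCP"],
    "Parallel Data Processing": ["Spark", "Databricks"],
    "Data Streaming": ["Kafka", "Flink"],
    "Business Intelligence": ["Tableau", "Power BI", "Looker"],
    "Data Quality": ["Datafold", "Monte Carlo Data"],
}


def missing_categories(stack: dict[str, str]) -> list[str]:
    """Return the table categories for which the stack has no tool yet."""
    return [category for category in EXAMPLE_TOOLS if category not in stack]


def unlisted_choices(stack: dict[str, str]) -> dict[str, str]:
    """Return the chosen tools that do not appear in the table for their category."""
    return {
        category: tool
        for category, tool in stack.items()
        if tool not in EXAMPLE_TOOLS.get(category, [])
    }


# Hypothetical selection, for illustration only.
my_stack = {
    "Ingestion": "Airbyte",
    "Data Warehouse": "Snowflake",
    "Business Intelligence": "Metabase",  # a real BI tool, but not in the table
}

print(missing_categories(my_stack))
print(unlisted_choices(my_stack))
```

Since the table is not exhaustive, a tool reported by `unlisted_choices` is not necessarily a poor choice; the check only shows where a selection departs from the examples given here.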
The Modern Data Stack is generally considered to be a better platform and approach than traditional on-premises tooling for businesses building Data and Analytics solutions today.
The tools considered to be part of it are industry leading, and many of their architectural patterns and features, such as low maintenance, low code and rapid time to value, are desirable.
Furthermore, Modern Data Stack solutions should have a lower total cost of ownership (TCO). The consumption-based pricing model means that you can avoid large up-front capital costs, and can try new ideas and initiatives without a high financial cost of failure. Typically, solutions built in this way will have a lower long-term TCO when all infrastructure and staffing costs are taken into account.
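
The "cost of failure" argument can be illustrated with some simple arithmetic. The sketch below compares the cumulative cost of a consumption-priced tool with that of a platform bought up front. Every figure in it (price per terabyte, capital cost, running cost, usage profile) is a made-up assumption chosen only to show the shape of the calculation, not real vendor pricing.

```python
def cumulative_consumption_cost(monthly_usage_tb: list[int], price_per_tb: int) -> int:
    """Total spend when you pay only for the terabytes processed each month."""
    return sum(usage * price_per_tb for usage in monthly_usage_tb)


def cumulative_upfront_cost(capital_cost: int, monthly_running_cost: int, months: int) -> int:
    """Total spend when hardware and licences are bought before first use."""
    return capital_cost + monthly_running_cost * months


# Hypothetical figures, for illustration only.
PRICE_PER_TB = 50             # assumed consumption price per TB processed
CAPITAL_COST = 10_000         # assumed up-front hardware and licence cost
MONTHLY_RUNNING_COST = 200    # assumed hosting and maintenance cost per month

usage_tb = [1, 2, 4, 8, 8, 8]  # assumed TB processed in each of six months

# A pilot that is abandoned after two months:
pilot_consumption = cumulative_consumption_cost(usage_tb[:2], PRICE_PER_TB)
pilot_upfront = cumulative_upfront_cost(CAPITAL_COST, MONTHLY_RUNNING_COST, 2)
print(pilot_consumption, pilot_upfront)    # 150 10400

# The same initiative run for the full six months:
full_consumption = cumulative_consumption_cost(usage_tb, PRICE_PER_TB)
full_upfront = cumulative_upfront_cost(CAPITAL_COST, MONTHLY_RUNNING_COST, len(usage_tb))
print(full_consumption, full_upfront)      # 1550 11200
```

Under these assumptions the abandoned pilot costs a small fraction of the up-front alternative, which is the point made above about trying new ideas cheaply. The comparison depends entirely on the assumed figures and leaves out staffing costs, so a real TCO estimate would need your own usage forecasts and vendor quotes.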
In the next lesson we will explore common architectural patterns found in Modern Data Stack deployments.