Lesson Overview

In this lesson we will:

  • Learn about Data Warehouses and their role as part of the Modern Data Stack.

Data Warehousing

Data Warehouses are large, centralised databases. They are typically used to combine data from multiple line-of-business applications and data sources into a single, organised location.

For instance, a business might opt to combine all of the data from sales, marketing, finance and HR functions into a centralised Data Warehouse for a joined-up view of what is happening across the entire organisation.
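
As a rough illustration of this consolidation, the sketch below uses Python's built-in sqlite3 module. An in-memory SQLite database stands in for the Data Warehouse. The table names, columns and figures are hypothetical. A real warehouse would be loaded by ingestion pipelines rather than hard-coded rows. The point is only that once sales and marketing data sit in one place, a single query can join them.

```python
import sqlite3

# Hypothetical extracts from two separate line-of-business systems.
# In practice these would arrive via ingestion pipelines, not literals.
sales_rows = [("2022-01", "EMEA", 120_000.0), ("2022-01", "APAC", 80_000.0)]
marketing_rows = [("2022-01", "EMEA", 15_000.0), ("2022-01", "APAC", 20_000.0)]

# An in-memory SQLite database stands in for the central warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (month TEXT, region TEXT, revenue REAL)")
conn.execute("CREATE TABLE marketing (month TEXT, region TEXT, spend REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", sales_rows)
conn.executemany("INSERT INTO marketing VALUES (?, ?, ?)", marketing_rows)

# With both functions' data side by side, one query can join them.
joined = conn.execute(
    """
    SELECT s.month, s.region, s.revenue, m.spend,
           s.revenue / m.spend AS revenue_to_spend
    FROM sales AS s
    JOIN marketing AS m ON s.month = m.month AND s.region = m.region
    ORDER BY s.region
    """
).fetchall()

for row in joined:
    print(row)
```

Running it prints one row per region. Each row combines revenue from the sales table with spend from the marketing table. Without a central store, answering the same question would mean extracting and matching data from each source system separately.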

After ingesting and organising this data, the Data Warehouse exposes it to business stakeholders through reports, dashboards and interactive analytics. Most often, these are delivered through third-party business intelligence tools deployed to business users.

Data Warehouses are designed and optimised for two things. They ingest and store large volumes of data, and they serve the resulting business intelligence workloads with high performance. Transactional databases are different: they are optimised for high-performance, real-time transactional workloads rather than analytics over large datasets. The terms Online Analytical Processing (OLAP) and Online Transaction Processing (OLTP) are sometimes used to describe these two workloads respectively.
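
The following sketch makes the OLTP/OLAP distinction more concrete by contrasting the two query shapes against a hypothetical orders table. It again uses an in-memory SQLite database purely for convenience. SQLite is neither a production transactional database nor a Data Warehouse. The example therefore shows only the kind of work each workload does, not the engine-level optimisations that make each one fast.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical orders table; order_id is assigned 1, 2, 3 in insert order.
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL, status TEXT)"
)
conn.executemany(
    "INSERT INTO orders (customer, amount, status) VALUES (?, ?, ?)",
    [("alice", 30.0, "paid"), ("bob", 45.5, "pending"), ("alice", 12.5, "paid")],
)

# OLTP-style: touch a single row by its key, as an order-processing app would.
conn.execute("UPDATE orders SET status = 'paid' WHERE order_id = ?", (2,))
conn.commit()

# OLAP-style: scan and aggregate many rows to answer a business question.
totals = conn.execute(
    "SELECT customer, COUNT(*), SUM(amount) FROM orders "
    "WHERE status = 'paid' GROUP BY customer ORDER BY customer"
).fetchall()
print(totals)  # [('alice', 2, 42.5), ('bob', 1, 45.5)]
```

The transactional update touches one row identified by its key. The analytical query reads every matching row and summarises them. Data Warehouses are optimised for the second pattern over very large tables.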

Modern Data Warehousing

Data Warehouses have been in use for decades, and this is a very mature field with established tools and practices. Almost every large organisation has at least one Data Warehouse. Vendors such as Oracle, Microsoft and IBM have historically owned most of this market with on-premises solutions.

In recent years, however, Data Warehousing has seen rapid evolution and a surge of innovation as part of the Modern Data Stack.

This process started with Data Warehousing services from the hyperscale cloud vendors, such as Amazon Redshift and Google BigQuery. These products brought cloud benefits into Data Warehousing, including scale, elasticity and consumption-based pricing. This made Data Warehousing suitable for more modern requirements such as managing machine-generated data, machine learning use cases and real-time workloads.

Outside of the major cloud providers, Snowflake emerged as a cloud-agnostic Data Warehouse built specifically to take advantage of the cloud. Snowflake is one of the fastest-growing cloud-native Data Warehouses. It has been widely adopted in industry thanks to its power and its ease of use and deployment.

Cost Effectiveness

Modern Data Warehouses have made Data Warehousing much more economically viable for businesses.

Historically, a Data Warehouse would have required a significant up-front investment in hardware and software licences. This would be a capital purchase, and it would need to be sized based on predicted future workloads. More expensive still would be the teams of engineers with the skills to operate the Data Warehouse and to implement all of the ingestion and transformations required by the business.

With Modern Data Warehousing, there is typically no up-front cost. A Data Warehouse can be created in minutes through a web interface, perhaps after entering a credit card. Costs then rise only with consumption, for instance with the amount of data stored, the number of queries served or the amount of compute power used.
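
A simple way to reason about consumption-based pricing is to treat the bill as a sum: each usage dimension multiplied by its unit rate. The sketch below assumes three dimensions, matching those mentioned above: storage, compute and queries. The Usage, Rates and monthly_cost names are hypothetical. The rates are entirely made up; real vendors price differently, and these numbers do not reflect any actual price list.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    storage_tb: float     # average terabytes stored over the month
    compute_hours: float  # hours of warehouse compute consumed
    queries: int          # number of queries served


@dataclass
class Rates:
    per_tb_month: float
    per_compute_hour: float
    per_thousand_queries: float


def monthly_cost(usage: Usage, rates: Rates) -> float:
    """Bill = each consumption dimension multiplied by its unit rate, summed."""
    return (
        usage.storage_tb * rates.per_tb_month
        + usage.compute_hours * rates.per_compute_hour
        + usage.queries / 1000 * rates.per_thousand_queries
    )


# Hypothetical unit rates for illustration only; not any vendor's pricing.
rates = Rates(per_tb_month=25.0, per_compute_hour=3.0, per_thousand_queries=0.5)

quiet_month = Usage(storage_tb=2.0, compute_hours=40.0, queries=10_000)
busy_month = Usage(storage_tb=2.5, compute_hours=200.0, queries=80_000)

print(monthly_cost(quiet_month, rates))  # 175.0
print(monthly_cost(busy_month, rates))   # 702.5
```

With these illustrative rates, the bill follows usage up and down. A month with nothing stored or queried would cost nothing at all. The traditional model is different: hardware and licences are paid for up front, however much they end up being used.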

The operational overhead of managing the Data Warehouse is also much reduced. The cloud provider takes care of initial deployment, upgrades, backups, optimisation and similar tasks as part of the fee.

Although there are of course still costs associated with the platform, the total cost of ownership is significantly lower than with traditional approaches.

Summary

In this lesson, we considered Data Warehousing and its role as part of the Modern Data Stack.

We discussed how Data Warehousing has seen a recent surge in innovation as it has evolved to take advantage of the inherent properties of the cloud.

In the next lesson, we will consider the same for Data Lakes. These are sometimes complementary to Data Warehouses and sometimes in competition with them.

© 2022 Timeflow Academy.