In this lesson we will:
- Introduce Snowflake;
- Describe some of its key differentiators;
- Discuss Snowflake's broader Data Cloud capabilities.
Snowflake is a modern Data Warehouse designed to take advantage of the cloud.
Data Warehouses are large databases designed to combine data from many sources across a business into one central location.
Once data is combined, joined up, cleansed and organised in a central location, it can then be used for business intelligence purposes such as reports, dashboards or data science activities.
Data Warehouses are optimised for analytical use cases, so they can scale to support large historical datasets and a high number of concurrent users running business intelligence workloads. This is in contrast to databases such as MySQL or PostgreSQL, which are designed to support real-time transactional workloads that usually require very fast access to relatively small datasets.
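To make the difference between the two workload types concrete, here is a minimal sketch that uses Python's built-in sqlite3 module purely as a stand-in database. The `orders` table, its columns and its rows are hypothetical and exist only for illustration. The first query is a transactional-style lookup of a single row by key, while the second is an analytical-style query that scans and aggregates the whole history.

```python
import sqlite3

# In-memory SQLite database used purely as a stand-in.
# The table, columns and rows are hypothetical illustration data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, region TEXT, order_year INTEGER, amount REAL)"
)
rows = [
    (1, "EMEA", 2022, 120.0),
    (2, "EMEA", 2023, 80.0),
    (3, "APAC", 2022, 200.0),
    (4, "APAC", 2023, 50.0),
    (5, "AMER", 2023, 310.0),
]
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", rows)

# Transactional-style access: fetch one row by its primary key.
single_order = conn.execute(
    "SELECT order_id, region, order_year, amount FROM orders WHERE order_id = ?",
    (3,),
).fetchone()

# Analytical-style access: scan every row and aggregate by region.
revenue_by_region = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()

print(single_order)
print(revenue_by_region)
```

On a handful of rows both queries are instant. The distinction matters at scale, where the second style of query may need to read billions of historical rows, which is the kind of work a Data Warehouse is built for.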
Though Data Warehousing is a very mature field, Snowflake brings a modernised approach and a cloud native architecture which make it uniquely powerful versus legacy competition. This has led to rapid adoption in industry.
Snowflake is delivered through an entirely Software-as-a-Service (SaaS) model, meaning there is no software to install and there are no servers to run or configure. The servers are of course still there, but they are managed transparently for you.
Snowflake's success with the SaaS deployment model is a notable innovation in the data space because, until now, enterprise customers have been reluctant to hand over their strategic data to a third party to such an extent. Snowflake was the first product compelling enough to overcome this objection.
In addition to the fully SaaS deployment model which minimises the amount of operational overhead, Snowflake is relatively simple to use and operate. For instance, there is less to do in terms of tuning parameters and management overhead in comparison with traditional databases such as Oracle or SQL Server that historically needed expert Database Administrators to run.
Snowflake offers a genuine usage-based billing model, whereby you pay by the second for the compute resources that you use, and by the byte for the storage that you consume. This means that businesses can get started with Snowflake cheaply, and there is no need to overprovision to support future workloads.
This pricing model is compelling compared to that of traditional vendors, who have expensive per-CPU-core billing models or require server capacity to remain available 24x7 even when it is not in use.
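As a rough worked example of how per-second compute billing and per-byte storage billing add up, consider the sketch below. Every rate in it is a hypothetical placeholder chosen to keep the arithmetic simple, not an actual Snowflake price, and the function names are our own.

```python
# All rates are hypothetical placeholders, NOT real Snowflake prices.
COMPUTE_RATE_PER_HOUR = 3.00       # assumed price for one hour of compute
STORAGE_RATE_PER_TB_MONTH = 20.00  # assumed price for one terabyte per month
SECONDS_PER_HOUR = 3600
HOURS_PER_MONTH = 24 * 30
BYTES_PER_TB = 10**12


def usage_based_compute_cost(seconds_used: int) -> float:
    """Compute cost when billed only for the seconds actually used."""
    return seconds_used * COMPUTE_RATE_PER_HOUR / SECONDS_PER_HOUR


def always_on_compute_cost() -> float:
    """Compute cost when capacity must stay available 24x7 all month."""
    return HOURS_PER_MONTH * COMPUTE_RATE_PER_HOUR


def storage_cost(bytes_stored: int) -> float:
    """Storage cost when billed in proportion to the bytes stored."""
    return bytes_stored * STORAGE_RATE_PER_TB_MONTH / BYTES_PER_TB


# Example workload: two hours of queries on each of 22 working days,
# with 500 GB of data stored.
seconds_used = 2 * SECONDS_PER_HOUR * 22
usage_cost = usage_based_compute_cost(seconds_used)
always_on_cost = always_on_compute_cost()
monthly_storage = storage_cost(500 * 10**9)

print(f"Usage-based compute: {usage_cost:.2f}")
print(f"Always-on compute:   {always_on_cost:.2f}")
print(f"Storage:             {monthly_storage:.2f}")
```

Under these assumed rates, the workload pays for 44 hours of compute rather than the 720 hours in a 30-day month, which is where the saving over always-on capacity comes from.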
Snowflake makes a number of innovations around performance which, in some benchmarks, make it the highest-performing data warehouse on the market.
Snowflake is based on a very innovative Cloud Native architecture, meaning that the attractive properties of Cloud infrastructure are reflected for users of Snowflake. This includes its ability to rapidly scale up and down, its high performance, and fully consumption-based pricing. This architecture is covered in more detail in the next lesson.
Though we refer to Snowflake as a Data Warehouse, it has actually grown into more of a complete platform which is referred to as a Data Cloud. Essentially, this is Snowflake's attempt to meet more end-to-end and higher-level requirements of data teams such as the following:
As well as storing relational data within the traditional data warehouse, Snowflake can also enable a Data Lake pattern. This means that data is stored outside of Snowflake (for instance in an Amazon S3 bucket), but queried and accessed via Snowflake. This combines the best of the Data Warehouse and Data Lake in a similar way to the Data Lakehouse pattern.
Snowflake enables you to share data across accounts with third parties such as business partners or clients in a controlled and governed manner. Where appropriate, this data can be sold and licensed through the Snowflake Data Marketplace.
Snowflake allows you to carry out Data Science activities such as model building or analytics directly within the platform. This is enabled by the Snowpark feature, which allows you to run Python directly within Snowflake. Once developed, Snowflake can also host your machine learning models to simplify production deployment.
Snowflake also strays into areas of Data Engineering which have historically been managed by third-party tools. These include incrementally loading data and transforming it. This approach simplifies your data pipelines by avoiding the introduction of additional tooling.
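To illustrate what incremental loading combined with a transformation means in general, here is a generic sketch of the common "high-watermark" pattern. It uses Python's built-in sqlite3 module as a stand-in and does not show Snowflake's own loading features. The table names, column names and the `load_increment` function are all hypothetical.

```python
import sqlite3

# In-memory SQLite database used purely as a stand-in.
# Table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE source_events (event_id INTEGER PRIMARY KEY, payload TEXT)"
)
conn.execute(
    "CREATE TABLE warehouse_events (event_id INTEGER PRIMARY KEY, payload_upper TEXT)"
)


def count_loaded(conn: sqlite3.Connection) -> int:
    """Number of rows currently in the target table."""
    return conn.execute("SELECT COUNT(*) FROM warehouse_events").fetchone()[0]


def load_increment(conn: sqlite3.Connection) -> int:
    """Copy and transform only rows newer than the last one already loaded."""
    before = count_loaded(conn)
    # The high watermark is the largest event_id loaded so far (0 if none).
    (watermark,) = conn.execute(
        "SELECT COALESCE(MAX(event_id), 0) FROM warehouse_events"
    ).fetchone()
    # Load only the new rows, applying a simple transformation (upper-casing).
    conn.execute(
        "INSERT INTO warehouse_events (event_id, payload_upper) "
        "SELECT event_id, UPPER(payload) FROM source_events WHERE event_id > ?",
        (watermark,),
    )
    return count_loaded(conn) - before


conn.executemany("INSERT INTO source_events VALUES (?, ?)", [(1, "a"), (2, "b")])
first_run = load_increment(conn)   # picks up both initial rows

conn.execute("INSERT INTO source_events VALUES (3, 'c')")
second_run = load_increment(conn)  # picks up only the new row
third_run = load_increment(conn)   # nothing new to load

print(first_run, second_run, third_run)
```

Each run touches only the rows that arrived since the previous run, rather than reloading the full source table every time.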
Using the Snowflake Data Cloud means that data professionals such as Data Engineers, Data Analysts and Data Scientists can all work together on a single platform. This hugely simplifies the data ecosystem for businesses, and means that much less effort is spent copying data between locations because everyone uses a single repository.
In the next lesson we will dig deeper into Snowflake's cloud native architecture which enables the features discussed above.