In this lesson we will:
- Learn about scheduled tasks, including what they are and how to use them.
Tasks allow us to schedule activities to run within Snowflake, for instance hourly, daily, or on an arbitrary CRON schedule.
A task can be as simple as a single SQL statement, or it can call a stored procedure when we need more complex logic.
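As a minimal sketch, a task that runs a single SQL statement on a fixed interval might look like the following. The warehouse and table names here are illustrative, not part of this lesson's dataset:

```sql
-- A minimal sketch of a scheduled task; warehouse and table names are assumptions.
CREATE OR REPLACE TASK refresh_daily_summary
  WAREHOUSE = compute_wh        -- warehouse that provides the compute
  SCHEDULE  = '60 MINUTE'       -- run every hour; a CRON expression also works
AS
  INSERT INTO daily_summary
  SELECT order_date, SUM(amount) AS total_amount
  FROM orders
  GROUP BY order_date;

-- Tasks are created in a suspended state and must be resumed before they run.
ALTER TASK refresh_daily_summary RESUME;
```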
There are various use cases for tasks:
- ETL - e.g. periodically transferring and transforming ingested data
- Retention - e.g. periodically deleting data which is no longer needed (see the sketch below)
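To illustrate the retention use case, a nightly clean-up task might look like this. The `events` table, its `created_at` column, and the 90-day window are assumptions for the example:

```sql
-- A sketch of a retention task; table name and retention window are assumptions.
CREATE OR REPLACE TASK purge_old_events
  WAREHOUSE = compute_wh
  SCHEDULE  = 'USING CRON 0 2 * * * UTC'  -- every day at 02:00 UTC
AS
  DELETE FROM events
  WHERE created_at < DATEADD(day, -90, CURRENT_TIMESTAMP());
```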
Snowflake tasks can be chained together and made dependent on one another, forming a Directed Acyclic Graph (DAG) of tasks.
For instance, task A runs on a schedule and, on completion, triggers task B, which in turn triggers task C.
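A dependency is declared on the child task itself, using AFTER in place of a schedule. The sketch below assumes a root task named `task_a` and staging tables `stage_one` and `stage_two`:

```sql
-- A sketch of a child task; task_a, stage_one and stage_two are assumed names.
-- Note: the root task must be suspended while children are added to its graph.
CREATE OR REPLACE TASK task_b
  WAREHOUSE = compute_wh
  AFTER task_a                  -- run only once task_a has completed
AS
  INSERT INTO stage_two SELECT * FROM stage_one;
```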
Snowflake has a feature called Streams which can be used to track when data is inserted, updated, or deleted within tables.
Combining streams with tasks is powerful, because it lets us periodically take action only on the data which has changed.
Streams are discussed in more detail in the next lesson.
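As a brief preview, the sketch below pairs a stream with a task so that the task only does work when new rows have arrived. The `raw_orders` and `clean_orders` tables are assumed for illustration:

```sql
-- A sketch combining a stream with a task; table names are assumptions.
CREATE OR REPLACE STREAM raw_orders_stream ON TABLE raw_orders;

CREATE OR REPLACE TASK process_new_orders
  WAREHOUSE = compute_wh
  SCHEDULE  = '15 MINUTE'
  WHEN SYSTEM$STREAM_HAS_DATA('raw_orders_stream')  -- skip runs when nothing changed
AS
  INSERT INTO clean_orders
  SELECT order_id, amount
  FROM raw_orders_stream
  WHERE METADATA$ACTION = 'INSERT';                 -- only process newly inserted rows
```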
The advantage of Snowflake tasks is that they can be configured directly within Snowflake using SQL, and require no external tooling to learn or implement.
There are, however, more powerful and fully featured tools for job scheduling and dependency management which are more commonly used in practice.
dbt, for instance, is widely used to define transformation runs as a series of dependent models. Airflow is also very commonly deployed as part of Modern Data Stacks; it is more powerful, but involves describing jobs in Python.
In this lesson we looked at Snowflake's scheduled task feature.
We described use cases including ETL and managing data retention processes.
We also looked at job dependencies and the DAG concept, and explained how task management often happens outside of Snowflake using tools such as dbt or Airflow when building Modern Data Stack platforms.
In the next lesson, we will take a closer look at Snowflake streams and how they can be used to track changes to data.