In this lesson we will:
- Learn about the difference between operation-based and asset-based pipelines.
## Operations and Assets
We can think about our DAGs in one of two ways: as a series of operations or as a set of assets.
Airflow, the first-generation orchestrator, was modelled around operations: a series of imperative steps that lead to the end goal we need.
Although Dagster also supports an operation-centric view of the world, it encourages us to think in assets, whereby our pipelines create and manipulate data assets.
This gives rise
### Operation Based
We will start by writing two functions. One will download and unzip a file, and a second will break it into
def download(): ...
def split(): ...
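As a rough sketch of what the two plain functions might look like before any Dagster decorators are added, the block below uses only the standard library. The parameter names `url`, `dest_dir` and `lines_per_chunk` are hypothetical, and because the lesson does not say what the file is split into, the sketch assumes fixed-size chunks of lines and an archive containing a single text file.

```python
import urllib.request
import zipfile
from pathlib import Path


def download(url: str, dest_dir: Path) -> Path:
    """Fetch a zip archive from `url` and unpack it into `dest_dir`."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    archive_path = dest_dir / "archive.zip"
    urllib.request.urlretrieve(url, archive_path)
    with zipfile.ZipFile(archive_path) as archive:
        # Assumption: the archive holds a single text file.
        member = archive.namelist()[0]
        archive.extract(member, dest_dir)
    return dest_dir / member


def split(path: Path, lines_per_chunk: int = 1000) -> list[Path]:
    """Write `path` out as numbered chunk files and return their paths."""
    lines = path.read_text().splitlines(keepends=True)
    chunk_paths = []
    for start in range(0, len(lines), lines_per_chunk):
        index = start // lines_per_chunk
        # Chunks are written next to the source file, e.g. data_0.txt, data_1.txt
        chunk_path = path.with_name(f"{path.stem}_{index}{path.suffix}")
        chunk_path.write_text("".join(lines[start:start + lines_per_chunk]))
        chunk_paths.append(chunk_path)
    return chunk_paths
```

Keeping the functions free of orchestration concerns at this stage means they can be called and tested directly, and the Dagster decorators can be layered on afterwards.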
We will then make two changes to make this code compatible with Dagster. Firstly, we will annotate the functions with @op to show that they are nodes in the graph.
Secondly, we will make the split function take the output of download as a parameter, so that Dagster knows it depends on download.
from dagster import op
@op
def download(): ...
@op
def split(download): ...
We can then run Dagit by pointing it at the file:
dagster dev -f pipeline.py
After a moment, a new browser window should open.
### Asset Based Pipelines
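Assets tend to be a cleaner mental model: each function is named after the data asset it produces, and a dependency is declared by naming the upstream asset as a parameter.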
from dagster import asset
@asset
def downloaded_file(): ...
@asset
def split_file(downloaded_file): ...
Close Dagit and reopen it with the following command:
dagster dev -f pipeline.py
After a moment, a new browser window should open showing the new materialised pipeline.