In this lesson we will:

  • Learn about the difference between operation-based and asset-based pipelines.

We can think about our DAGs in one of two ways: as a series of operations or as a set of assets.

Airflow, the first-generation orchestrator, was modelled around operations: a series of imperative steps that get us to the end goal we need.

Though Dagster can also support an operation-centric view of the world, it encourages us to think in assets, whereby our pipelines create and manipulate data assets.

This gives rise

### Operation Based

We will start by writing two functions. One will download and unzip a file, and a second will break it into

def download():
    ...  # download and unzip the file

def split():
    ...  # break the file up
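As a rough illustration, the two functions might look like the sketch below in plain Python, before any Dagster decorators are added. The lesson does not say what the file is split into, so splitting into fixed-size groups of lines is an assumption made purely for the example. The parameters `url`, `dest_dir` and `lines_per_chunk` and the `archive.zip` filename are likewise hypothetical.

```python
import urllib.request
import zipfile
from pathlib import Path


def download(url, dest_dir):
    """Fetch a zip archive from `url` and extract it into `dest_dir`."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    archive_path = dest_dir / "archive.zip"  # hypothetical local filename
    urllib.request.urlretrieve(url, archive_path)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(dest_dir)
        names = archive.namelist()
    # Return the paths of the extracted members.
    return [dest_dir / name for name in names]


def split(path, lines_per_chunk=2):
    """Break a text file into lists of at most `lines_per_chunk` lines (assumed behaviour)."""
    lines = Path(path).read_text().splitlines()
    return [
        lines[i:i + lines_per_chunk]
        for i in range(0, len(lines), lines_per_chunk)
    ]
```

Nothing here depends on Dagster yet. These are ordinary functions that can be called and tested on their own.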

We will then make two changes to make this code compatible with Dagster. Firstly, we will decorate the functions with @op to show that they are nodes in the graph.

Secondly, we will make the split function take the output of download as its input.

from dagster import op

@op
def download(): ...

@op
def split(download): ...

We can then run Dagit by pointing it at the file:

dagit -f

After a moment, a new browser window should open.

### Asset Based

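Assets tend to be a cleaner mental model: each node in the graph is named for the data it produces rather than for the step that produces it.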

from dagster import asset

@asset
def downloaded_file(): ...

@asset
def split_file(downloaded_file): ...

Close Dagit and reopen it with the following command:

dagit -f

After a moment, a new browser window should open showing the new materialised pipeline.

