Orchestrating Data Platforms With Dagster

Operations and Assets

Lesson #5

In this lesson we will:

  • Learn about the difference between operation-based and asset-based pipelines.

We can think about our DAGs in one of two ways: as a series of operations or as a set of assets.

Airflow, the first-generation orchestrator, was modelled around operations: a series of imperative steps that lead to the end result we need.

Though Dagster also supports an operation-centric view of the world, it encourages us to think in assets, whereby our pipelines create and manipulate data assets.

This gives rise

### Operation Based

We will start by writing two functions. One will download and unzip a file, and a second will break it into

def download():
    ...  # download and unzip a file

def split():
    ...  # break up the downloaded file
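As a rough illustration of what these two plain functions might contain, here is a minimal sketch using only the Python standard library. The `url`, `dest_dir` and `lines_per_chunk` parameters, the `archive.zip` file name, and the choice to split by line count are all assumptions made for this example; the lesson does not specify them.

```python
import urllib.request
import zipfile
from pathlib import Path


def download(url: str, dest_dir: Path) -> Path:
    """Download a zip archive from `url` and unzip it into `dest_dir`."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    archive_path = dest_dir / "archive.zip"  # hypothetical local file name
    urllib.request.urlretrieve(url, archive_path)
    with zipfile.ZipFile(archive_path) as archive:
        # Assumption: the first member of the archive is the file we want.
        member = archive.namelist()[0]
        archive.extract(member, dest_dir)
    return dest_dir / member


def split(path: Path, lines_per_chunk: int = 1000) -> list[Path]:
    """Break the file at `path` into numbered chunk files."""
    lines = path.read_text().splitlines(keepends=True)
    chunk_paths = []
    for start in range(0, len(lines), lines_per_chunk):
        index = start // lines_per_chunk
        chunk_path = path.with_name(f"{path.stem}_{index}{path.suffix}")
        chunk_path.write_text("".join(lines[start:start + lines_per_chunk]))
        chunk_paths.append(chunk_path)
    return chunk_paths
```

At this point the two functions are ordinary Python: nothing ties `split` to `download` except the order in which we choose to call them.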

We will then make two changes to make this code compatible with Dagster. Firstly, we will annotate the functions with @op to show that they are nodes in the graph.

Secondly, we will make the split function take the output of download as an input.

from dagster import op

@op
def download():
    ...  # download and unzip a file

@op
def split(download):
    ...  # break up the file passed in from download

We can then launch Dagit by pointing Dagster at the file:

dagster dev -f pipeline.py

After a moment, the Dagit UI should be available in the browser, by default at http://localhost:3000.

### Asset Based Pipelines

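Assets tend to be a cleaner mental model, because each node in the graph is named after the data it produces rather than the step that produces it.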

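from dagster import asset

@asset
def downloaded_file():
    ...  # download and unzip a file

@asset
def split_file(downloaded_file):
    ...  # break up the upstream downloaded_file asset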

Close Dagit and reopen it with the following command:

dagster dev -f pipeline.py

After a moment, the Dagit UI should be available in the browser again, showing the new asset-based pipeline, ready to be materialised.





© 2023 Timeflow Academy. All rights reserved