Orchestrating Data Platforms With Dagster

Writing Your First DAG

Lesson #4

In this lesson we will:

  • Write our first Dagster DAG;
  • Learn about the difference between operation-based and asset-based pipelines;
  • Visualise and run jobs from the Dagit UI.

Authoring Your First DAG

As we discussed in the core concepts lesson, Dagster is based on the notion of pipelines or graphs of operations that manipulate data assets.
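To make the idea of a pipeline as a graph concrete, here is a minimal sketch in plain standard-library Python (not Dagster code) of modelling two steps as a directed acyclic graph and deriving an execution order; the step names are illustrative, mirroring the assets we build later in this lesson:

```python
from graphlib import TopologicalSorter

# Toy dependency graph: cleanedfile depends on downloadedfile
graph = {
    "downloadedfile": set(),
    "cleanedfile": {"downloadedfile"},
}

# An orchestrator like Dagster must run steps in dependency order
order = list(TopologicalSorter(graph).static_order())
print(order)  # ['downloadedfile', 'cleanedfile']
```

Dagster does far more than topological sorting, but this is the essential shape of the problem: given a graph of dependent steps, compute a valid order and execute it reliably.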

Our DAGs and the logic within them are written in standard Python code. This code can execute anything that you could do in standard Python, making full use of packages and interacting with APIs as necessary to implement your data pipeline.

The Dagster platform is responsible for taking this pipeline code and executing it in the most efficient and reliable way.

Our task in this lesson, therefore, is to write our first DAG using Python and pass it to a local instance of Dagster for our first successful run.

Writing Our DAG

Open your favourite IDE and create a new Python file. Copy in the following contents:

from dagster import asset

@asset
def downloadedfile():
    # Download the source data, e.g. from an API or object store
    ...

@asset
def cleanedfile(downloadedfile):
    # Clean the downloaded data; the parameter name wires up the dependency
    ...

At this stage, notice a few things about the file:

  • The @asset decorator is used to show that these are Dagster assets.

  • The cleanedfile function takes a parameter called downloadedfile, matching the name of the downloadedfile function. This allows Dagster to infer that the cleanedfile asset has a dependency on downloadedfile.
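This name-matching convention can be illustrated with plain Python introspection. The sketch below is not Dagster's actual implementation, just a standard-library illustration of how parameter names can be mapped onto dependencies between functions:

```python
import inspect

def downloadedfile():
    ...

def cleanedfile(downloadedfile):
    ...

# Index the asset functions by name
assets = {fn.__name__: fn for fn in (downloadedfile, cleanedfile)}

# An asset's dependencies are its parameters whose names match other assets
deps = {
    name: [p for p in inspect.signature(fn).parameters if p in assets]
    for name, fn in assets.items()
}
print(deps)  # {'downloadedfile': [], 'cleanedfile': ['downloadedfile']}
```

Because the dependency lives in the function signature, the graph structure stays visible in ordinary Python code rather than in a separate configuration file.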

Though we have had to make these small changes, hopefully it's clear that the code is still relatively standard Python with limited dependencies. This means it should be easy to migrate existing Python scripts into Dagster, or away from it if necessary.

Loading The DAG

Next, we will load the DAG in the Dagit UI. In a separate terminal, run the following command, passing the path of the file you just created to the -f flag (my_dag.py here is a placeholder for whatever you named your file):

dagit -f my_dag.py

After a moment, a new browser window will open with your pipeline graph visible.

Materialising Your Assets

The next step is to actually execute our DAGs and materialise the assets.

From the Dagit UI, we can click the Materialise button:

After some time, the stages will turn green and we can see that we have had a successful run.

Next Lesson:

Operations and Assets

In this lesson we will contrast Dagster's operation-based and asset-based pipelines.

0h 15m

Continuous Delivery For Data Engineers

This site has been developed by the team behind Timeflow, an Open Source CI/CD platform designed for Data Engineers who use dbt as part of the Modern Data Stack. Our platform helps Data Engineers improve the quality, reliability and speed of their data transformation pipelines.


© 2023 Timeflow Academy. All rights reserved