Course Overview
Orchestrating Data Platforms With Dagster

Setting Up Dagster For Local Development

Lesson #3

In this lesson we will:

  • Explain how to set up Dagster on your local laptop for development.

Components

There are four components to a production Dagster deployment:

  • Dagster Instance - This is the main Dagster server which holds all of the configuration about your jobs. It connects to a database for storage.
  • Dagster Daemon - This is a background process which is responsible for schedules, sensors and managing the run queue.
  • Dagit - This is a GUI application which interacts with your Dagster instance through the browser, allowing you to explore jobs, launch runs and monitor their progress.
  • Executors - Executors are responsible for running your task code. A production deployment may consist of many executors, perhaps running in a Kubernetes cluster.

Installing Dagster Locally For Development

We will begin by running Dagster on our local laptop for development purposes.

Dagster requires Python 3, and at the time of writing has been tested on versions 3.7 - 3.10.
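You can check which Python version is on your path before installing:

python3 --version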

The Dagster platform is written in Python and distributed as a pip module. This makes it very easy to deploy.

pip install dagster

It is also recommended to install Dagit, Dagster's GUI tool.

pip install dagit 
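If you want to keep the install isolated from other Python projects, a minimal sketch using a virtual environment looks like this (the directory name .venv is just an example):

python3 -m venv .venv          # create an isolated environment
source .venv/bin/activate      # activate it (Linux/macOS)
pip install dagster dagit      # install both packages into the environment

Both packages are then installed into the environment rather than system-wide.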

Assuming the two components installed successfully, we can then run Dagster in dev mode to start a single-node cluster:

dagster dev
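This starts a local instance together with the Dagit web UI, which is served on http://localhost:3000 by default. If your job definitions live in a Python file, you can point dev mode at it directly; my_jobs.py below is a hypothetical example:

dagster dev -f my_jobs.py      # load definitions from a local Python file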

Productionising The Install

When we move to a production deployment, there are a number of steps we will likely need to take.

Metadata Storage - By default, Dagster stores its data on the local disk where the instance is started. A more robust solution is to point your Dagster instance at a Postgres database to store this metadata; an example configuration is sketched in the dagster.yaml section below.

The DAGSTER_HOME environment variable - By default, configuration is read from a temporary directory that is created each time you run dagster dev, so nothing persists between runs. For production, set DAGSTER_HOME to a persistent directory where Dagster can find its configuration and store instance data.
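For example, assuming you choose ~/dagster_home as the persistent location (the path itself is just an illustration):

mkdir -p ~/dagster_home
export DAGSTER_HOME=~/dagster_home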

dagster.yaml file - When working in development, we tend to configure Dagster with command line flags and environment variables. In production, the dagster.yaml file, placed inside the DAGSTER_HOME directory, gathers these parameters into a single configuration file.
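As a rough sketch, a dagster.yaml in your DAGSTER_HOME directory could point metadata storage at Postgres. The hostname, credentials and database name below are placeholders that you would replace with your own:

cat > $DAGSTER_HOME/dagster.yaml <<'EOF'
storage:
  postgres:
    postgres_db:
      hostname: localhost
      port: 5432
      username: dagster
      password: dagster_password
      db_name: dagster
EOF

Note that this storage backend also requires the dagster-postgres package to be installed alongside dagster.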

Next Lesson:

Writing Your First DAG

In the next lesson we will write and execute our first Dagster jobs.


