Lesson Overview

In this lesson we will learn about DBTs seed data feature, which allows us to reliably populate our database with static data for use as part of DBT transformations.

This lesson is also available as a video:

Video: DBT - Seed Data

What Is DBT Seed Data?

Most of our work with DBT will involve taking some source data ingested from some application, and transforming it into our views and tables in our desired destination formats and structured.

As part of this process, we may require external static data or lookup tables to enable the transformation. For instance:

  • To map short country codes to country names (e.g. US > United States);
  • To map a list of product codes to longer descriptions (e.g. PZ04 > Pepperoni Pizza)
  • Something specific to the business domain such as a hard coded list of staff or office locations.

DBTs Seed Data feature allows us to store this data directly within the DBT project formatted as a CSV file where it can be properly source controlled, and then bought into the Data Warehouse as part of the DBT run.

The advantage of managing seed data in this way is that it becomes scripted, and can be bought into source control as part of the DBT project. When we have this seed static data being loaded into the warehouse in a reliable way, we can then build our transformations, safe in the knowledge that the static data and lookup tables we need will be present prior the transformation running.

Correct Use

This seed data feature is not designed as a production quality data loading tool to bring large data extracts into your warehouse. Rather, it is there to support ad-hoc loads of small static data tables required to support and enhance the transformations which you write.

Using Seed Data

Seed data is stored within the DBT project as a CSV file inside the data folder of your project. We can create an example like so:

cd data
touch country_codes.csv

We can then edit the file:

nano country_codes.csv

And paste the following CSV formatted data:

Code, Country_Name
US, United States
UK, United Kingdom
FRA, France
AUS, Austria
CH, China 

After saving the file, we can now import the seed data file into the database with the following command:

dbt seed

If we log back into Postgres, we will see that we have a table created which is named was based on the CSV file.

postgres -TBC
use pizzastore_analytics
select * from country_codes 

Outputs:

TBC

Note that the column types are automically inferred from the data when seeding data. If you wish to specify a different type for the columns, this can be overridden in DBT YAML files.

Using Seed Data As Part Of Transformations

Seed Data

Summary

In this lesson we looked into DBTs seed data feature, which can be used to populate your database with simple static lists used for scenarios such as data mapping and simple lookup codes.

We discussed the advantages of doing this such as bringing seed data into source control.

We highlighted that dbt seed is not an appropriate tool for doing bulk data loads.

This Lesson Requires A Free Membership

Sign Up

Already A Member? Log In

Prev LessonNext Lesson

© 2022 Timeflow Academy.