Lesson Overview

In this lesson we will:

  • Learn about DBT's Source and Exposure features.

What Are Sources

As part of our work with DBT we will likely be taking data from original data sources. This data could be extracted from an application, another database or a source such as a spreadsheet. It is then usually uploaded into the database and exposed as tables or views which will form the basis of our DBT pipelines.

The Source feature of DBT allows us to mark relevant tables as external source tables using metadata. This is useful for a few reasons:

  • We need to refer to source tables or views in our DBT pipelines, but we do not want them to ever be created or materialised by DBT. They are totally outside of the control of DBT;
  • By marking the object as a source, we are being explicit about where it sits in the data lineage pipeline and DAG dependencies;
  • We may wish to test assumptions about our data source prior to starting any transformation. This is subtly different to testing one of our DBT models;
  • Marking a table or view as a source allows us to check the freshness of the source data, for example to confirm that new data has arrived before we rebuild incremental models.

Using Sources

We can specify a table or view as being a source in a YAML configuration file:

sources:
  - name: ecommerce_system
    tables:
      - name: customers
      - name: products

Once declared, we can refer to a source using the source Jinja function, in the same way we use the ref function when adding dependencies on DBT models.

select
  product_name, price
from {{ source('ecommerce_system', 'products') }}
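
The testing and freshness points listed earlier are also configured on the source definition. The following is a minimal sketch rather than a definitive configuration: the customer_id column and the _loaded_at timestamp column are assumed names used only for illustration, the thresholds are arbitrary, and the exact keys can vary between DBT versions.

```yaml
version: 2

sources:
  - name: ecommerce_system
    # Assumed timestamp column recording when each row was loaded
    loaded_at_field: _loaded_at
    freshness:
      warn_after: {count: 12, period: hour}
      error_after: {count: 24, period: hour}
    tables:
      - name: customers
        columns:
          - name: customer_id   # hypothetical column name
            tests:
              - unique
              - not_null
      - name: products
```

With a configuration along these lines, dbt source freshness checks how recently the source data was loaded, and dbt test -s source:ecommerce_system runs the tests declared on the source.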

Exposures

A DBT transformation DAG contains three broad types of object:

  • Sources - e.g. tables containing our source data. These are marked as Sources as described above;
  • Intermediate objects - e.g. tables containing our intermediate calculations and aggregations. These may not be appropriate for people in the business to use;
  • Destinations - e.g. tables containing the data we actually want our user community to use, which meet our desired standards for accuracy and completeness.

Where the Sources feature described above allows us to mark the data coming into our pipelines, Exposures allow us to use metadata to describe what happens at the end of our pipelines: the downstream uses of our destination tables.

We can add metadata to an Exposure, such as how the data is used (a dashboard, report or notebook) and an email address for the owner of the downstream consumer. This is a simple feature, but it can greatly support the day-to-day work of a data team who need to co-ordinate with their data consumers when making changes, so that they don't break anything downstream.
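
As an illustration, an exposure is declared in a YAML file alongside our models. This is a hedged sketch only: the product_sales model, the owner name and the email address are hypothetical, and the exposure name is borrowed from the run example later in this lesson.

```yaml
version: 2

exposures:
  - name: product_sales_by_category
    type: dashboard            # how the data is consumed
    description: Sales by product category, as viewed by the commercial team.
    depends_on:
      - ref('product_sales')   # hypothetical model feeding the dashboard
    owner:
      name: Analytics Team     # hypothetical owner
      email: analytics@example.com
```

The depends_on list is what connects the exposure to the DAG, so DBT knows which models sit upstream of the dashboard.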

Controlling DBT Runs For Sources and Exposures

During the development process, it is sometimes useful to run only the models dependent on a particular source, or all of the models which feed into some exposure. This can be done with the --select flag, or -s for short.

All transformations downstream from a source can be executed by selecting the source and adding a trailing + operator:

dbt run -s source:product_sales+

And all models upstream of a given exposure can be executed in the following way, using a leading + operator:

dbt run -s +exposure:product_sales_by_category

This makes the development process much more efficient, and could also be used as part of automation where we only need to update data for given exposures periodically.
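
For the automation case, one option is to give the selection a name in a selectors.yml file so a scheduled job does not need to repeat the selection syntax. The sketch below assumes a selector called product_sales_dashboard, which is a made-up name for illustration.

```yaml
selectors:
  - name: product_sales_dashboard   # hypothetical selector name
    description: Models needed to refresh the product_sales_by_category exposure.
    definition:
      method: exposure
      value: product_sales_by_category
      parents: true                 # include everything upstream, like the + prefix
```

A scheduled job could then call dbt run --selector product_sales_dashboard.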

Summary

In this lesson we introduced the DBT concepts of Sources and Exposures.

We explained how they add metadata to data which feeds into your DBT pipelines and the datasets exposed from them.

We highlighted how we can execute DBT runs and tests only for models downstream of a particular source, or upstream of a particular exposure. This can improve development cycle times compared with executing the entire transformation run.


© 2022 Timeflow Academy.