Course Overview
DBT For Data Engineers

Sources and Exposures

Lesson #13

In this lesson we will:

  • Learn about DBT's Source and Exposure features.

What Are Sources

As part of our work with DBT we will likely be taking data from original data sources. This data could be extracted from an application, another database or a source such as a spreadsheet. It is then usually uploaded into the database and exposed as tables or views which will form the basis of our DBT pipelines.

The Source feature of DBT allows us to mark relevant tables as external source tables using metadata. This is useful for a few reasons:

  • We need to refer to source tables or views in our DBT pipelines, but we do not want them to ever be created or materialised by DBT. They are totally outside of the control of DBT;
  • By marking the object as a source, we are being explicit about where it sits in the data lineage pipeline and DAG dependencies;
  • We may wish to test assumptions about our data source prior to starting any transformation. This is subtly different to testing one of our DBT models;
  • Marking a table or view as a source allows us to check the freshness of the source data, for example to confirm that new data has landed before running downstream models.
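As a sketch of the freshness point above, DBT lets us declare freshness thresholds against a source table in YAML, which are then checked with the dbt source freshness command. The orders table and updated_at column here are hypothetical examples, not part of the lesson's project:

```yaml
sources:
  - name: ecommerce_system
    tables:
      - name: orders                  # hypothetical source table
        loaded_at_field: updated_at   # assumed timestamp column recording when rows landed
        freshness:
          warn_after: {count: 12, period: hour}
          error_after: {count: 24, period: hour}
```

Running dbt source freshness then warns or errors if the newest updated_at value is older than the configured thresholds.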

Using Sources

We can specify a table or view as being a source in a YAML configuration file:

  sources:
    - name: ecommerce_system
      tables:
        - name: customers
        - name: products

Once defined, we can refer to sources using the source Jinja function, in the same way we use the ref function when adding dependencies on DBT models:

select
  product_name,
  price
from {{ source('ecommerce_system', 'products') }}
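As mentioned earlier, we can also test assumptions about a source before any transformation runs. A minimal sketch, extending the sources YAML above with column tests (the not_null and unique generic tests are built into DBT; the column choices here are illustrative):

```yaml
sources:
  - name: ecommerce_system
    tables:
      - name: products
        columns:
          - name: product_name
            tests:
              - not_null        # assume every product has a name
          - name: product_id    # hypothetical key column
            tests:
              - unique
              - not_null
```

Running dbt test then executes these checks against the raw source table itself, rather than against one of our DBT models.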


What Are Exposures

A DBT transformation DAG has three types of model:

  • Sources - e.g. tables containing our source data. These are marked as Sources as described above;
  • Intermediate objects - e.g. tables containing our intermediate calculations and aggregations. These may not be appropriate for people in the business to use;
  • Destinations - e.g. tables containing the data we actually want our user community to use, which meet our desired standards for accuracy and completeness.

Where the Sources feature described above allows us to mark data sources, Exposures allow us to use metadata to represent the tables at the end of our pipelines.

We can add metadata to our Exposures, such as how each one is used (a dashboard, report or notebook) and an email address for the owner of the downstream consumer. This is a simple feature, but it can greatly support the day-to-day work of a data team who need to coordinate with their data consumers when making changes and ensure they don't break downstream dependencies.
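An exposure is declared in YAML much like a source. A minimal sketch, reusing the product_sales_by_category name from the run examples below; the fct_product_sales model and the owner details are hypothetical:

```yaml
exposures:
  - name: product_sales_by_category
    type: dashboard                       # could also be notebook, analysis, ml or application
    owner:
      email: analytics@example.com        # hypothetical owner contact
    depends_on:
      - ref('fct_product_sales')          # hypothetical destination model the dashboard reads
```

The depends_on list is what places the exposure at the end of the DAG, so DBT knows which models feed it.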

Controlling DBT Runs For Sources and Exposures

During the development process, it is sometimes useful to only run the models dependent on a particular source, or all of the models which feed into some exposure. This can be done with the --select (or -s) flag.

All transformations downstream of a source can be executed using select criteria, with a trailing + to include everything that depends on it:

dbt run -s source:product_sales+

And all of the models upstream of a given exposure can be executed in the following way:

dbt run -s +exposure:product_sales_by_category

This makes the development process much more efficient, and could also be used as part of automation where we only need to update data for given exposures periodically.

Next Lesson:

Documenting Your Models

In this lesson we will learn about DBT's features for automatically generating documentation.


Continuous Delivery For Data Engineers

This site has been developed by the team behind Timeflow, an Open Source CI/CD platform designed for Data Engineers who use dbt as part of the Modern Data Stack. Our platform helps Data Engineers improve the quality, reliability and speed of their data transformation pipelines.


Timeflow Academy is the leading online, hands-on platform for learning about Data Engineering using the Modern Data Stack. Brought to you by Timeflow CI

© 2023 Timeflow Academy. All rights reserved