DBT For Data Engineers

Testing With DBT

Lesson #10

In this lesson we will:

  • Learn about DBT's testing features, which can be used to confirm the accuracy and correctness of any data transformations.

Testing With DBT

DBT includes features for automatically testing the correctness of our transformations each time they are executed.

By incorporating testing into the transformation process, we can build confidence that our transformations are operating as we expect, for instance:

  • That we get the expected number of rows in the output;
  • That columns are unique, not null, or greater than zero, in line with our expectations;
  • That all data meets our business rules (e.g. no line item should have a total value greater than the value of its invoice).

This helps to build quality into the transformations. Human errors and data errors are caught as early as possible in the data pipeline, where problems are easier to resolve, before bad data flows downstream or is delivered to our business users.

Automated testing of this kind is a very popular technique amongst software engineers, who nowadays have a culture of automatically unit testing individual pieces of logic and integration testing their end-to-end solution.

Though there have been various attempts to add testing to data pipelines, DBT is the first tool which integrates it so well with the actual transformation code that automated testing and practices such as test-driven development become viable.

Writing Your First Test

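DBT tests are stored in the tests directory of your DBT project. This is the default location in recent DBT versions, and it can be changed with the test-paths setting in dbt_project.yml.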

A DBT test is simply a SQL query which should return zero rows if the test passes. If the query returns any rows, those rows are considered to be the failing records which violate your test assertion and should be investigated.
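
As a rough illustration of this idea, a test file might look like the sketch below. The file name and the column names store_id and total_sales are assumptions made for this example; only the sales_by_store model name comes from this lesson. The query selects the rows that break the rule, so an empty result means the test passes.

```sql
-- tests/assert_sales_by_store_totals_not_negative.sql
-- (hypothetical file name; store_id and total_sales are assumed columns)
--
-- Select the rows that violate the assertion "total sales are never negative".
-- DBT treats every row returned here as a failing record.
select
    store_id,
    total_sales
from {{ ref('sales_by_store') }}
where total_sales < 0
```

Note that the query has no trailing semicolon, and it uses ref() rather than a hard-coded table name so that DBT can resolve the model in the current target environment.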


Having defined our tests, we can then execute them on an ad-hoc basis like so:

dbt test

We can also limit the test run to one particular model during the development loop:

dbt test --select sales_by_store

© 2023 Timeflow Academy. All rights reserved