In this lesson we will learn about DBTs testing features which can be used to confirm the accuracy and correctness of any data transformations.
This lab is also available as a video:
DBT includes features for automatically testing the correctness of our transformations each time they are executed.
By incorporating testing into the transformation process, we can build confidence that our transformations are operating as we expect, for instance:
- That we get the expected number of rows in the output;
- That columns are unique, not null, greater than zero in line with our expectations;
- That all data meets our business rules (e.g. all line items should not have a total value greater than the invoice value);
This helps to build quality into the transformations, catching human errors and data errors as early as possible in the data pipeline where problems are easier to resolve before bad data flows downstream or is delivered to our business users.
Automated testing in this way is a very popular technique amougst software engineers, who nowadays have a culture of automatically unit testing individual pieces of logic, and integration testing their end to end solution.
Though there have been various attempts to add testing to data, DBT is the first tool which integrates it so well with the actual transformation code, such that automated testing and practices such as test driven development become viable.
DBT tests are stored in the test directory of your DBT project.
A DBT test is simply a SQL query which should return zero rows if the test passes. If the test returns any rows, those rows are considered to be the failing records which violet your test assertion and should be investigated.
Having defined our tests, we can then execute them on an ad-hoc basis like so:
We can also limit the test to one particular model during the development loop:
dbt test --select sales_by_store
In this lesson we introduced the testing features of DBT, and explained how these can be used to improve the quality of our transformations by capturing either human or data errors early in the process.
We created a test and demonstrated features such as limiting to an individual test case, which can be useful during an iterative development cycle.