When was the last time you looked at a data warehouse for the first time? Do you remember that feeling of frustration, not knowing what the table orders_final_v1 contained? Or what the difference was between user_uuid and user_id? Any data practitioner can relate to such feelings.
Thankfully for us, dbt (Data Build Tool) has made documenting data warehouses a much easier task. All we need to do is include the documentation of our tables and columns in the schema YAML file. Then, all the information gets compiled into a neat static documentation website.
Documenting your Data Warehouse (DWH)
When working with data models in dbt, it is important to document your work to ensure that others can understand what you have built and how to use it.
Let’s begin with the models/schema.yml file. This file will contain all the information about your tables. For example, consider the following documentation for the table dim_album:
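A minimal sketch of what such a file might contain is shown here. The column names album_id and title, and the wording of every description, are assumptions made for illustration; only the model name dim_album comes from the text.

```yaml
# models/schema.yml
version: 2

models:
  - name: dim_album
    description: "One row per album."          # illustrative wording
    columns:
      - name: album_id                         # hypothetical column
        description: "Primary key of the album."
      - name: title                            # hypothetical column
        description: "Title of the album."
```

Each description ends up next to the corresponding model or column in the generated documentation.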
After saving the above YAML, execute dbt docs generate. Finally, the command dbt docs serve will open your default browser and render your documentation.
Using a docs block
We can take documentation up a notch by using docs blocks. A docs block is a piece of markdown, wrapped in Jinja tags inside a markdown file, that dbt can pull into a description. This lets us write much richer documentation than a one-line string.
Let’s begin by creating the file models/docs.md. Within this file, we place our markdown documentation, enclosed by the Jinja tags {% docs <block_name> %} and {% enddocs %}.
We then reference the docs block by its name, using the Jinja doc() function within the schema YAML:
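As a rough sketch of how the two files fit together, the example here assumes a docs block with the hypothetical name dim_album_description. The block's assumed contents are shown as comments, since they live in models/docs.md rather than in the YAML file.

```yaml
# models/schema.yml
#
# Assumes models/docs.md defines a docs block with the hypothetical name
# dim_album_description, for example:
#
#   {% docs dim_album_description %}
#   One row per album. Any markdown (tables, lists, links) can go here.
#   {% enddocs %}
version: 2

models:
  - name: dim_album
    # doc() looks up the docs block by its name, not by its file path
    description: '{{ doc("dim_album_description") }}'
```

The quotes around the Jinja expression matter: without them, YAML would try to parse the curly braces as a mapping.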
This approach allows you to create documentation with all the flexibility expected from a markdown file.
What about Tests?
Whenever a generic test is declared on a column within the schema YAML file, it will appear in the documentation under the corresponding column.
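For illustration, a column with two of dbt's built-in generic tests might be declared as follows. The column name album_id is an assumption. Newer dbt releases also accept the key data_tests in place of tests, so adjust to the version you run.

```yaml
# models/schema.yml
version: 2

models:
  - name: dim_album
    columns:
      - name: album_id        # hypothetical column
        description: "Primary key of the album."
        tests:                # newer dbt releases also accept data_tests
          - unique            # built-in generic test
          - not_null          # built-in generic test
```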
Singular tests, by contrast, are custom SQL queries written against specific dbt models; a singular test fails when its query returns rows. They are stored within the tests folder and get executed alongside generic tests.
One important aspect of documentation is describing the tests that you have written to validate your data models. At the moment, dbt will create a documentation page for each singular test. It shows the SQL behind the singular test, as well as the model being tested under dependencies.
However, dbt does not yet allow you to fill in the description box in a singular test’s documentation. A workaround is to describe the test on the model’s documentation page and link to the test’s own page from there. Simply use the test page’s URL suffix as the link target in the docs.md file.
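One way this workaround could look is sketched here, written inline in the model's description (which accepts markdown) rather than in docs.md; the same link syntax should work inside a docs block. The test name assert_album_has_tracks, the project name my_project, and the exact shape of the URL suffix are all assumptions: copy the real suffix from your browser's address bar while viewing the test's page.

```yaml
# models/schema.yml
version: 2

models:
  - name: dim_album
    # Hypothetical test and project names; the link target is the part of
    # the test page's URL that follows the site root.
    description: |
      One row per album.

      Singular tests on this model:

      - [assert_album_has_tracks](#!/test/test.my_project.assert_album_has_tracks):
        fails when an album has no tracks.
```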
Voilà! The result is your model’s documentation page, which lists both the model’s generic tests and its singular tests. In each test’s description, it is useful to include a query that helps the reader identify and correct the error raised by the test. And last but not least, do not forget to use clear and concise language whenever you can. Happy querying.