Effortless Documentation with dbt: Streamlining Data Warehouse Documentation
Photo by Nana Smirnova on Unsplash

When was the last time you looked at a data warehouse for the first time? Do you remember that feeling of frustration, not knowing what the table orders_final_v1 contained? How about the difference between user_uuid and user_id? Any data practitioner can relate to such feelings.

Thankfully for us, dbt (data build tool) has made documenting data warehouses a much easier task. All we need to do is include the documentation of our tables and columns in the schema YAML file. dbt then compiles all of this information into a neat static website.

Documenting your Data Warehouse (DWH)

When working with data models in dbt, it is important to document your work to ensure that others can understand what you have built and how to use it.

Let’s begin with the models/schema.yml file. This file will contain all the information about your tables. For example, consider the following documentation for the table dim_album:
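A minimal sketch of such an entry might look like the following, assuming a dbt project with a model named dim_album. The column album_id comes from this post; the title column and all description texts are illustrative assumptions, not the real schema.

```yaml
# models/schema.yml
version: 2

models:
  - name: dim_album
    # ">" starts a folded block scalar: the line breaks below become spaces
    description: >
      One row per album. This description is illustrative; the folded
      block scalar joins these lines into a single paragraph.
    columns:
      - name: album_id
        description: Primary key of the album.
      - name: title  # hypothetical column, for illustration only
        description: >
          Name of the album as it appears in the source system.
```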

Pro tip: Use a folded block scalar, or ">", to fold line breaks into spaces. This way, long descriptions stay readable for the next person who edits the YAML file.


After saving the YAML file, execute dbt docs generate to build the documentation site. Finally, the command dbt docs serve will start a local web server, open your default browser, and render your documentation.

[Figure: Rendered version of the dim_album documentation]


Using a docs block

We can take documentation up a notch by using docs blocks. This way, we leverage a markdown file with Jinja tags to generate an even more comprehensive document.

Let’s begin by creating the file models/docs.md. Within this file, we place our markdown documentation, enclosed by the Jinja tags {% docs dim_album %} and {% enddocs %}.

Example markdown file to supplement dim_album's documentation. Notice that the block's reference name is set by {% docs dim_album %}.


We then reference the docs block defined in models/docs.md using the Jinja doc() function within the schema YAML:
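As a sketch, assuming the docs block is named dim_album as in the markdown file described earlier, the model entry in models/schema.yml could be reduced to:

```yaml
# models/schema.yml
version: 2

models:
  - name: dim_album
    # The quotes are required: an unquoted "{" would start a YAML flow mapping
    description: "{{ doc('dim_album') }}"
```

The name passed to doc() must match the name given in the {% docs ... %} tag, not the name of the markdown file.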

Replacing the model's description with the Jinja placeholder "{{ doc('dim_album') }}".


This approach allows you to create documentation with all the flexibility expected from a markdown file.

[Figure: Rendered version of dim_album's documentation leveraging a docs block]


What about Tests?

Whenever a generic test has been applied to a column within the schema YAML file, it will appear in the documentation under the corresponding column.

[Figure: The generic test for uniqueness has been applied to the column album_id. It is configured in the schema YAML file and rendered by dbt docs serve.]
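A column-level generic test of this kind could be declared as in the sketch below. The description text is an illustrative assumption. The tests key matches the dbt versions current when this post was written; newer dbt releases prefer the key data_tests for the same purpose.

```yaml
# models/schema.yml
version: 2

models:
  - name: dim_album
    columns:
      - name: album_id
        description: Primary key of the album.  # illustrative description
        tests:
          - unique  # built-in generic test: fails if album_id has duplicates
```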


Singular tests, by contrast, are custom SQL queries written against specific dbt models; a singular test fails if its query returns any rows. They are stored within the tests folder and get executed alongside generic tests.

One important aspect of documentation is describing the tests that you have written to validate your data models. At the moment, dbt will create a documentation page for singular tests. It will show you the SQL behind the singular test, as well as the model that is being tested under dependencies.

However, dbt does not yet allow you to fill in the description box in a singular test’s documentation. A workaround is to describe the test and reference its documentation under the model’s documentation page. Simply use the test’s page URL suffix in the docs.md file.

Including description for older_album_test and a reference link to its own documentation page


Voilà! The result is your model’s documentation page, which lists the model’s generic and singular tests. It is always useful to append a query that helps the reader identify and correct the error raised by the test. And last but not least, do not forget to use clear and concise language whenever you can. Happy querying.
