Creating Tables with LOBs using dbt in Azure Synapse: The Columnstore Index Error
Results from a dbt run on the CLI

TL;DR: Explicitly change the index type in your model’s configuration options.

Every dbt adapter is customized to the data warehouse solution they belong to. Not surprisingly, using a particular adapter captures particular errors. For this scenario, we discuss the columnstore index error in the context of dbt-synapse 1.3.2.

the error

Picture a scenario where we have a dbt model with a column named long_string declared as a Large Object (LOB).

model_beta materialized as a view

LOBs are a common data type when storing large binaries or character text data, e.g. URL links. A LOB type is created when casting a column as nvarchar(max) as in the above example.

When we run this model as a view, dbt compiles and executes it without any problems.

model_beta materialized as a table

However, when we attempt to materialize the model as a table, we encounter an error in the CLI:

The statement failed. Column 'long_string' has a data type that cannot participate in a columnstore index

This error will also come up when using materializations of the type incremental.

the root

A quick search on the internet will take you to the root of the problem. Firstly, Azure Synapse and similar solutions create tables using the CLUSTERED COLUMN INDEX as the default index. As per official documentation, “by default, dedicated SQL pool creates a clustered columnstore index when no index options are specified on a table1.”

At the same time, LOBs are not a good option when creating columnstore index index: “LOB data types (the (max) length data types) can’t be the key of an ordered clustered columnstore index2.”

the confusion

According to the dbt-synapse adapter documentation, “All configuration options for the Microsoft SQL Server adapter also apply to this adapter3.”

Meanwhile, the dbt-sqlserver adapter documentation confirms the above default behavior by stating that “tables will, by default, be materialized as a columnstore tables.”4

The documentation also offers a solution by indicating that “this behaviour can be disabled by setting the as_columnstore configuration option to False.”4 Accordingly, we modified the original model to account for this setting.

model_beta with the option as_columnstore disabled

However, the dbt run fails with the same error as before.

the solution

A quick inspection of the actual run script (located in our dbt project’s target folder) reveals that the CLUSTERED COLUMNSTORE INDEX is still being called independently of whether as_columnstore is disabled.

Compiled and run query from model_beta from the target folder

Then, the solution lies in explicitly declaring a different type of index that does not involve the column long_string. For example, we can specify dbt to create the table using a HEAP index.

model_beta with the configuration option index set to HEAP

This change in configuration makes the materialization of the above model possible. It is recommended to check what type of index fits your use case.

Creating Tables with LOBs using dbt in Azure Synapse: The Columnstore Index Error
Older post

A Runner's Tale of Plantar Fasciitis

The story of a rollercoaster ride dealing with foot inflammation and my journey back to running

Newer post

Cool dbt Feature: Project Variables

A brief introduction to project variables and their benefit when modeling in dbt

Creating Tables with LOBs using dbt in Azure Synapse: The Columnstore Index Error