TL;DR: Explicitly change the index type in your model’s configuration options.
Every dbt adapter is customized to the data warehouse solution they belong to. Not surprisingly, using a particular adapter captures particular errors. For this scenario, we discuss the columnstore index error in the context of dbt-synapse 1.3.2
.
the error
Picture a scenario where we have a dbt model with a column named long_string
declared as a Large Object (LOB).
model_beta
materialized as a view
LOBs are a common data type when storing large binaries or character text data, e.g. URL links. A LOB type is created when casting a column as nvarchar(max)
as in the above example.
When we run this model as a view
, dbt compiles and executes it without any problems.
model_beta
materialized as a table
However, when we attempt to materialize the model as a table
, we encounter an error in the CLI:
The statement failed. Column 'long_string' has a data type that cannot participate in a columnstore index
This error will also come up when using materializations of the type incremental
.
the root
A quick search on the internet will take you to the root of the problem. Firstly, Azure Synapse and similar solutions create tables using the CLUSTERED COLUMN INDEX
as the default index. As per official documentation, “by default, dedicated SQL pool creates a clustered columnstore index when no index options are specified on a table1.”
At the same time, LOBs are not a good option when creating columnstore index index: “LOB data types (the (max) length data types) can’t be the key of an ordered clustered columnstore index2.”
the confusion
According to the dbt-synapse
adapter documentation, “All configuration options for the Microsoft SQL Server adapter also apply to this adapter3.”
Meanwhile, the dbt-sqlserver
adapter documentation confirms the above default behavior by stating that “tables will, by default, be materialized as a columnstore tables.”4
The documentation also offers a solution by indicating that “this behaviour can be disabled by setting the as_columnstore
configuration option to False
.”4 Accordingly, we modified the original model to account for this setting.
model_beta
with the option as_columnstore
disabled However, the dbt run
fails with the same error as before.
the solution
A quick inspection of the actual run script (located in our dbt project’s target folder) reveals that the CLUSTERED COLUMNSTORE INDEX
is still being called independently of whether as_columnstore
is disabled.
model_beta
from the target folderThen, the solution lies in explicitly declaring a different type of index that does not involve the column long_string
. For example, we can specify dbt to create the table using a HEAP
index.
model_beta
with the configuration option index set to HEAP
This change in configuration makes the materialization of the above model possible. It is recommended to check what type of index fits your use case.