Optimizing Snowflake Costs with dbt Query Tags
If you’re using Snowflake, you know that things can get expensive very quickly if you don’t watch out, whether that means turning the default warehouse timeout down from 10 minutes to 1, limiting autoscaling, or making sure an undersized warehouse times out when it tries to process more data than it can handle instead of grinding on throughout the weekend. Out of the box it can be hard to see the full picture of what is going on in your Snowflake account, especially when dbt runs alongside other queries. The following trick will help you improve the visibility of your Snowflake costs from dbt. You will be able to tell:
- How credit usage of models changes over time
- Which models are taking the most time to run
- Which dbt runs are taking most of your time and credits
Ingredient list
First of all we need to get all of our ingredients together. Without the ingredients we can’t cook up some of those sweet monitoring dashboards. We need:
- A dbt macro to adjust our Snowflake query tag
- A dbt source for the Snowflake query history
- A dbt model for the Snowflake query history parsing the query tag
With those ingredients we should have everything ready to start making a nice dashboard for better cost insights for running dbt on Snowflake.
Setting Up dbt Query Tags for Cost Monitoring
The Snowflake query tag is an arbitrary string you can set whenever you run a query. It could be your user ID or any kind of metadata you like. dbt gives us the option to overwrite the default value by overriding the set_query_tag() macro. This allows us to add variables that are available to us when dbt runs our models and wrap them up in a JSON string for later use. The macro itself is actually quite simple; the trick is knowing what you want to add.
{%- macro set_query_tag() -%}
    {# Preserve whatever query tag is already set, e.g. from the profile #}
    {% set original_query_tag = get_current_query_tag() %}
    {% set new_query_tag = {
        "model": model.name,
        "resource_type": model.resource_type,
        "invocation_id": invocation_id,
        "user_tag": original_query_tag
    } %}
    {% if model %}
        {# Serialize the dict to JSON so it can be parsed back out of query_history later #}
        {% do run_query("alter session set query_tag = '{}'".format(new_query_tag | tojson)) %}
        {{ return(original_query_tag) }}
    {% endif %}
    {{ return(none) }}
{%- endmacro -%}
Adding this to your macros folder will automatically tag all dbt queries with essential metadata for cost tracking. You can verify it’s working by checking the Snowflake query history panel after running a dbt model.
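If you’d rather verify from SQL than from the UI, a quick check along these lines works too; note that the account_usage view can lag by up to about 45 minutes, and table(information_schema.query_history()) is the near-real-time alternative.
-- Quick check that the query tag is being set on dbt queries
select
    query_tag,
    start_time,
    left(query_text, 80) as query_snippet
from snowflake.account_usage.query_history
where try_parse_json(query_tag):model is not null
order by start_time desc
limit 20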
Important notes for data engineers:
- The macro only applies to model runs, not to dbt commands like dbt deps or dbt compile
- If you have existing query tags set in your dbt profile, they’ll be preserved in the user_tag field
- The invocation_id helps group all queries from a single dbt run together
Accessing Snowflake Account Usage Data with dbt
The nice thing about Snowflake being SQL-first in everything is that our Snowflake query history is just a view in the system schema snowflake.account_usage. We can add it to dbt as a source, and in the process also add the warehouse metering history for easy monitoring of credit usage. Credits are unfortunately not measured at the query level, but at the warehouse level.
sources:
  - name: snowflake_usage
    schema: account_usage
    database: snowflake
    tables:
      - name: warehouse_metering_history
      - name: query_history
With these sources we can create a dbt model for the Snowflake query history.
-- models/staging/stg_snowflake__query_history.sql
select
    query_id,
    query_hash,
    query_text,
    try_parse_json(query_tag):model::string as dbt_model_name,
    try_parse_json(query_tag):resource_type::string as dbt_resource_type, -- model, test, etc.
    try_parse_json(query_tag):invocation_id::string as dbt_invocation_id,
    try_parse_json(query_tag):user_tag::string as dbt_user_tag,
    user_name,
    rows_produced,
    warehouse_id,
    warehouse_name,
    warehouse_size,
    -- approximate credits per second, i.e. the warehouse's credits per hour / 3600
    case
        when warehouse_size = 'X-Small' then 0.0003
        when warehouse_size = 'Small' then 0.0006
        when warehouse_size = 'Medium' then 0.0011
        when warehouse_size = 'Large' then 0.0022
        when warehouse_size = 'X-Large' then 0.0044
        when warehouse_size = '2X-Large' then 0.0089
        when warehouse_size = '3X-Large' then 0.0178
        when warehouse_size = '4X-Large' then 0.0356
    end as credits_per_second,
    start_time,
    end_time,
    div0(total_elapsed_time, 1000) as total_elapsed_time_sec, -- total_elapsed_time is in milliseconds
    bytes_scanned
from {{ source('snowflake_usage', 'query_history') }}
There are many more columns you could select, like database and schema IDs or partitions scanned, but this serves our purpose for now. With just this model we already get a wealth of insights. Of course we do not want all of our queries right now, so let’s have a look at our production runs only.
-- models/marts/fct_dbt_prod_query_history.sql
select
    query_id,
    dbt_model_name,
    dbt_resource_type,
    dbt_invocation_id,
    start_time,
    total_elapsed_time_sec,
    credits_per_second * total_elapsed_time_sec as query_credit_usage,
    credits_per_second * total_elapsed_time_sec * 3.30 as query_eu_enterprise_default_cost_eur,
    bytes_scanned
from {{ ref('stg_snowflake__query_history') }}
where user_name = 'MY_DBT_PRODUCTION_USER'
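As a first taste of what this unlocks, here is a sketch built on the mart above that answers the “which dbt runs are taking most of your time and credits” question by grouping on the invocation ID:
-- Sketch: runtime and approximate credits per dbt invocation, most expensive first
select
    dbt_invocation_id,
    min(start_time) as run_started_at,
    count(*) as queries_in_run,
    sum(total_elapsed_time_sec) as total_runtime_sec,
    sum(query_credit_usage) as total_query_credits
from {{ ref('fct_dbt_prod_query_history') }}
group by 1
order by total_query_credits desc
limit 25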
Calculating Snowflake Credit Costs and Performance
Using Snowflake’s warehouse guide we can determine the credit usage of our queries. Your organization might have a better deal with Snowflake, but the default rates still give you an idea of what a query costs. Bear in mind, though, that your query is not the only cost: a Snowflake warehouse is billed for a minimum of 60 seconds every time it resumes, so if you run lots of small queries you might end up with a lot of idle time on your warehouse. To get an understanding of the idle rate for your warehouse you can use the warehouse_metering_history table to calculate the compute credits that were not used for queries.
select
    warehouse_id,
    warehouse_name,
    sum(credits_used_compute) as total_credits_used_compute,
    sum(credits_used_compute) - sum(credits_attributed_compute_queries) as credits_used_idle,
    sum(credits_used_compute - credits_attributed_compute_queries) / sum(credits_used_compute) as avg_idle_credits_rate
from {{ source('snowflake_usage', 'warehouse_metering_history') }}
group by 1, 2
Finally, using the same per-credit price as before, we can get a pretty accurate calculation of our query cost:
credits_per_second * total_elapsed_time_sec * 3.30 * (1 + avg_idle_credits_rate)
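To put the pieces together, here is a sketch that assumes you save the idle-rate query above as its own model; the name stg_snowflake__warehouse_idle_rate below is just a placeholder, so adjust it to whatever you call yours.
-- Sketch: all-in cost per query, including the warehouse's idle overhead
-- stg_snowflake__warehouse_idle_rate is a placeholder name for the idle-rate query above
select
    q.query_id,
    q.dbt_model_name,
    q.credits_per_second * q.total_elapsed_time_sec
        * 3.30
        * (1 + coalesce(w.avg_idle_credits_rate, 0)) as query_all_in_cost_eur
from {{ ref('stg_snowflake__query_history') }} as q
left join {{ ref('stg_snowflake__warehouse_idle_rate') }} as w
    on q.warehouse_id = w.warehouse_id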
Optimizing Your Snowflake Data Warehouse Costs
With this setup, you now have comprehensive visibility into your dbt model performance and Snowflake costs. Here’s what you can do next:
Immediate Actions:
- Identify your top 10 most expensive models by credit usage (see the example query after this list)
- Look for models with unusually high bytes_scanned relative to their output
- Check for models running during off-peak hours that could be scheduled differently
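A minimal sketch for the first action, using the production mart from earlier:
-- Sketch: top 10 most expensive models by approximate credit usage
select
    dbt_model_name,
    count(*) as number_of_runs,
    sum(query_credit_usage) as total_credits,
    sum(bytes_scanned) as total_bytes_scanned
from {{ ref('fct_dbt_prod_query_history') }}
where dbt_resource_type = 'model'
group by 1
order by total_credits desc
limit 10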
Long-term Optimization:
- Set up automated alerts when daily credit usage exceeds thresholds (a sketch follows this list)
- Create dashboards showing cost trends over time by dbt project or model type
- Use the invocation_id to analyze full dbt run performance and identify bottlenecks
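For the alerting idea, something like the sketch below could back a scheduled report or a dbt test; the 100-credit threshold is just an illustrative number, so tune it to your account.
-- Sketch: days on which dbt credit usage exceeded an (arbitrary) 100-credit threshold
select
    date_trunc('day', start_time) as usage_date,
    sum(query_credit_usage) as daily_credits
from {{ ref('fct_dbt_prod_query_history') }}
group by 1
having sum(query_credit_usage) > 100
order by usage_date desc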
Advanced Analytics:
- Compare weekend vs weekday usage patterns (see the sketch after this list)
- Analyze the correlation between model complexity (lines of code) and credit usage
- Track how cost optimization efforts impact your overall Snowflake bill
- Use a tool like Elementary to further enhance your dbt model tracking.
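And a sketch for the weekend vs weekday comparison, again against the production mart:
-- Sketch: weekend vs weekday credit usage (dayofweekiso: Monday = 1, Sunday = 7)
select
    case when dayofweekiso(start_time) in (6, 7) then 'weekend' else 'weekday' end as day_type,
    count(*) as number_of_queries,
    sum(query_credit_usage) as total_credits
from {{ ref('fct_dbt_prod_query_history') }}
group by 1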
This approach to Snowflake cost optimization through dbt query tags gives data engineering teams the granular visibility needed to make informed decisions about warehouse sizing, model optimization, and resource allocation.