Optimizing Snowflake Costs with dbt Query Tags
If you’re using Snowflake, you know that things can get expensive very quickly if you don’t watch out, whether that means turning the default warehouse timeout down from 10 minutes to 1, limiting autoscaling, or making sure an undersized warehouse times out when it tries to process more data than it can handle instead of grinding on throughout the weekend. Out of the box it can be hard to see the full picture of what is going on in your Snowflake account, especially when dbt runs alongside other queries. The following trick will help you improve the visibility of your Snowflake costs from dbt. You will be able to tell:
- How credit usage of models changes over time
- Which models are taking the most time to run
- Which dbt runs are taking most of your time and credits
Ingredient list
First of all we need to get all of our ingredients together. Without the ingredients we can’t cook up some of those sweet monitoring dashboards. We need:
- A dbt macro to adjust our Snowflake query tag
- A dbt source for the Snowflake query history
- A dbt model for the Snowflake query history parsing the query tag
With those ingredients we should have everything ready to start making a nice dashboard for better cost insights for running dbt on Snowflake.
Setting Up dbt Query Tags for Cost Monitoring
The Snowflake query tag is an arbitrary string you can set whenever you run a query. It could be your user ID or any kind of metadata you like. dbt gives us the option to overwrite the default value by overriding the set_query_tag() macro. This allows us to add variables that are available to us when dbt runs our models and wrap them up in a JSON string for later use. The macro itself is actually quite simple; the trick is knowing what you want to add.
{%- macro set_query_tag() -%}
    {# Preserve whatever query tag is already set, e.g. from the profile #}
    {% set original_query_tag = get_current_query_tag() %}
    {% set new_query_tag = {
        "model": model.name,
        "resource_type": model.resource_type,
        "invocation_id": invocation_id,
        "user_tag": original_query_tag
    } %}
    {% if model %}
        {# Serialize the dict to JSON so it can be parsed back out of query_history later #}
        {% do run_query("alter session set query_tag = '{}'".format(new_query_tag | tojson)) %}
        {{ return(original_query_tag) }}
    {% endif %}
    {{ return(none) }}
{%- endmacro -%}
Adding this to your macros folder will automatically tag all dbt queries with essential metadata for cost tracking. You can verify it’s working by checking the Snowflake query history panel after running a dbt model.
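If you’d rather verify from SQL than from the UI, a quick check along these lines works too; note that the account_usage view can lag by up to about 45 minutes, and table(information_schema.query_history()) is the near-real-time alternative.
-- Quick check that the query tag is being set on dbt queries
select
    query_tag,
    start_time,
    left(query_text, 80) as query_snippet
from snowflake.account_usage.query_history
where try_parse_json(query_tag):model is not null
order by start_time desc
limit 20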
Important notes for data engineers:
- The macro only applies to model runs, not to dbt commands like dbt deps or dbt compile
- If you have existing query tags set in your dbt profile, they’ll be preserved in the user_tag field
- The invocation_id helps group all queries from a single dbt run together
Accessing Snowflake Account Usage Data with dbt
The nice thing about Snowflake being SQL-first in everything is that our Snowflake query history is just a view in the system schema snowflake.account_usage. We can add it to dbt as a source, and in the process also add the warehouse metering history for easy monitoring of credit usage. Credits are unfortunately not measured at the query level, but at the warehouse level.
sources:
  - name: snowflake_usage
    schema: account_usage
    database: snowflake
    tables:
      - name: warehouse_metering_history
      - name: query_history
With these sources we can create a dbt model for the Snowflake query history.
-- models/staging/stg_snowflake__query_history.sql
select
    query_id,
    query_hash,
    query_text,
    try_parse_json(query_tag):model::string as dbt_model_name,
    try_parse_json(query_tag):resource_type::string as dbt_resource_type, -- model, test, etc.
    try_parse_json(query_tag):invocation_id::string as dbt_invocation_id,
    try_parse_json(query_tag):user_tag::string as dbt_user_tag,
    user_name,
    rows_produced,
    warehouse_id,
    warehouse_name,
    warehouse_size,
    -- approximate credits per second, i.e. the warehouse's credits per hour / 3600
    case
        when warehouse_size = 'X-Small' then 0.0003
        when warehouse_size = 'Small' then 0.0006
        when warehouse_size = 'Medium' then 0.0011
        when warehouse_size = 'Large' then 0.0022
        when warehouse_size = 'X-Large' then 0.0044
        when warehouse_size = '2X-Large' then 0.0089
        when warehouse_size = '3X-Large' then 0.0178
        when warehouse_size = '4X-Large' then 0.0356
    end as credits_per_second,
    start_time,
    end_time,
    div0(total_elapsed_time, 1000) as total_elapsed_time_sec, -- total_elapsed_time is in milliseconds
    bytes_scanned
from {{ source('snowflake_usage', 'query_history') }}
There are many more columns you could select, like database and schema IDs or partitions scanned, but this serves our purpose for now. With just this model we already get a wealth of insights. Of course we do not want all of our queries right now, so let’s have a look at our production runs only.
-- models/marts/fct_dbt_prod_query_history.sql
select
    query_id,
    dbt_model_name,
    dbt_resource_type,
    dbt_invocation_id,
    start_time,
    total_elapsed_time_sec,
    credits_per_second * total_elapsed_time_sec as query_credit_usage,
    credits_per_second * total_elapsed_time_sec * 3.30 as query_eu_enterprise_default_cost_eur,
    bytes_scanned
from {{ ref('stg_snowflake__query_history') }}
where user_name = 'MY_DBT_PRODUCTION_USER'
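As a first taste of what this unlocks, here is a sketch built on the mart above that answers the “which dbt runs are taking most of your time and credits” question by grouping on the invocation ID:
-- Sketch: runtime and approximate credits per dbt invocation, most expensive first
select
    dbt_invocation_id,
    min(start_time) as run_started_at,
    count(*) as queries_in_run,
    sum(total_elapsed_time_sec) as total_runtime_sec,
    sum(query_credit_usage) as total_query_credits
from {{ ref('fct_dbt_prod_query_history') }}
group by 1
order by total_query_credits desc
limit 25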
Calculating Snowflake Credit Costs and Performance
Using Snowflake’s warehouse guide we can determine the credit usage of our queries. Your organization might have a better deal with Snowflake, but the default rates still give you an idea of what a query costs. Bear in mind, though, that your query is not the only cost: a Snowflake warehouse is billed for a minimum of 60 seconds every time it resumes, so if you run lots of small queries you might end up with a lot of idle time on your warehouse. To get an understanding of the idle rate for your warehouse you can use the warehouse_metering_history table to calculate the compute credits that were not used for queries.
select
    warehouse_id,
    warehouse_name,
    sum(credits_used_compute) as total_credits_used_compute,
    sum(credits_used_compute) - sum(credits_attributed_compute_queries) as credits_used_idle,
    sum(credits_used_compute - credits_attributed_compute_queries) / sum(credits_used_compute) as avg_idle_credits_rate
from {{ source('snowflake_usage', 'warehouse_metering_history') }}
group by 1, 2
Finally, using the same per-credit price as before, we can get a pretty accurate calculation of our query cost:
credits_per_second * total_elapsed_time_sec * 3.30 * (1 + avg_idle_credits_rate)
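To put the pieces together, here is a sketch that assumes you save the idle-rate query above as its own model; the name stg_snowflake__warehouse_idle_rate below is just a placeholder, so adjust it to whatever you call yours.
-- Sketch: all-in cost per query, including the warehouse's idle overhead
-- stg_snowflake__warehouse_idle_rate is a placeholder name for the idle-rate query above
select
    q.query_id,
    q.dbt_model_name,
    q.credits_per_second * q.total_elapsed_time_sec
        * 3.30
        * (1 + coalesce(w.avg_idle_credits_rate, 0)) as query_all_in_cost_eur
from {{ ref('stg_snowflake__query_history') }} as q
left join {{ ref('stg_snowflake__warehouse_idle_rate') }} as w
    on q.warehouse_id = w.warehouse_id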
Optimizing Your Snowflake Data Warehouse Costs
With this setup, you now have comprehensive visibility into your dbt model performance and Snowflake costs. Here’s what you can do next:
Immediate Actions:
- Identify your top 10 most expensive models by credit usage (see the example query after this list)
- Look for models with unusually high bytes_scanned relative to their output
- Check for models running during off-peak hours that could be scheduled differently
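A minimal sketch for the first action, using the production mart from earlier:
-- Sketch: top 10 most expensive models by approximate credit usage
select
    dbt_model_name,
    count(*) as number_of_runs,
    sum(query_credit_usage) as total_credits,
    sum(bytes_scanned) as total_bytes_scanned
from {{ ref('fct_dbt_prod_query_history') }}
where dbt_resource_type = 'model'
group by 1
order by total_credits desc
limit 10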
Long-term Optimization:
- Set up automated alerts when daily credit usage exceeds thresholds (a sketch follows this list)
- Create dashboards showing cost trends over time by dbt project or model type
- Use the invocation_id to analyze full dbt run performance and identify bottlenecks
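For the alerting idea, something like the sketch below could back a scheduled report or a dbt test; the 100-credit threshold is just an illustrative number, so tune it to your account.
-- Sketch: days on which dbt credit usage exceeded an (arbitrary) 100-credit threshold
select
    date_trunc('day', start_time) as usage_date,
    sum(query_credit_usage) as daily_credits
from {{ ref('fct_dbt_prod_query_history') }}
group by 1
having sum(query_credit_usage) > 100
order by usage_date desc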
Advanced Analytics:
- Compare weekend vs weekday usage patterns (see the sketch after this list)
- Analyze the correlation between model complexity (lines of code) and credit usage
- Track how cost optimization efforts impact your overall Snowflake bill
- Use a tool like Elementary to further enhance your dbt model tracking.
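And a sketch for the weekend vs weekday comparison, again against the production mart:
-- Sketch: weekend vs weekday credit usage (dayofweekiso: Monday = 1, Sunday = 7)
select
    case when dayofweekiso(start_time) in (6, 7) then 'weekend' else 'weekday' end as day_type,
    count(*) as number_of_queries,
    sum(query_credit_usage) as total_credits
from {{ ref('fct_dbt_prod_query_history') }}
group by 1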
This approach to Snowflake cost optimization through dbt query tags gives data engineering teams the granular visibility needed to make informed decisions about warehouse sizing, model optimization, and resource allocation.