Beyond Tables and Views: Building a Custom dbt Materialization to Deploy Streamlit Apps on Snowflake
Deploying Streamlit apps with dbt on Snowflake using a custom dbt materialization: a practical tutorial and exploration
If you want to jump straight into the code, you can find it on GitHub.
What if your dbt model could deploy a dashboard instead of creating a table? If you’ve worked with dbt for any amount of time, you’re familiar with the usual suspects: the table, view, and incremental materializations. These are the bread and butter of data transformation—turning your SQL into database objects that store or reference your transformed data. Tables persist your results, views create virtual representations, and incremental models efficiently update existing datasets.
But here’s where it gets interesting: I was working on a source freshness dashboard in Snowflake using Streamlit, constantly switching between my dbt transformations and my visualization code. Then I thought: why am I managing these separately? Snowflake defines everything in SQL, and my dbt models already know everything about my data structure, business logic, and source freshness. What if the same workflow that deploys my transformed data could also deploy the applications that visualize it? And that’s not the only use case. Here are a few places where I think dbt can benefit from interactive visualization with Streamlit:
- Showing the costs of your models
- Offering better documentation of your data marts
- Having a one‑stop shop for documentation, dashboards, and data requests
- Keeping an eye on how your dbt pipelines are running (e.g., using the Elementary dbt package)
dbt custom materializations
If you’re new to custom materializations, here’s the 60‑second version: a dbt materialization is just a recipe for what happens when a model runs. Under the hood, most follow a predictable six‑step pattern:
- prepare (resolve references and configs)
- run pre‑hooks
- execute the core work
- run post‑hooks
- clean up
- update the relation cache
The key insight is that the “core work” can be anything your adapter can talk to: not only creating tables or views, but also writing files, managing external objects, or even orchestrating application artifacts. That’s how projects like dbt‑duckdb external materializations implement files, and it’s the same idea we’ll bend toward deploying a Streamlit app.
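To make that concrete, here is a skeletal custom materialization in dbt’s Jinja syntax, following the pattern from the dbt docs. This is a sketch: the core body is a placeholder, and a production macro would also manage transactions and the relation cache explicitly.

```sql
{% materialization streamlit, adapter='snowflake' %}
  {# 1. prepare: resolve configuration #}
  {%- set app_name = config.get('app_name', model['alias']) -%}

  {# 2. pre-hooks #}
  {{ run_hooks(pre_hooks) }}

  {# 3. core work: any SQL the adapter can run, e.g. PUT and CREATE STREAMLIT #}
  {% call statement('main') -%}
    select 1 as placeholder
  {%- endcall %}

  {# 4. post-hooks #}
  {{ run_hooks(post_hooks) }}

  {# 5./6. cleanup and cache update: report affected relations back to dbt #}
  {{ return({'relations': []}) }}
{% endmaterialization %}
```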
Note: This post targets Snowflake Streamlit apps specifically. For background, see Snowflake’s docs on About Streamlit in Snowflake.
Streamlit materialization architecture
At a high level, the materialization treats your model as an app deployment:
- use a dbt YAML configuration for settings
- upload app files to a Snowflake stage
- create/update a Streamlit app that points at those files
We’ll go deeper on each piece, but here’s a quick taste.
# models/data_discovery_hub.py — a dbt Python model
import pandas as pd

def model(dbt, session):
    # The materialization does the real work (upload + app create)
    # We return an empty DataFrame to satisfy dbt's Python model contract
    return pd.DataFrame()
This conforms to the dbt Python model contract (return a pandas DataFrame). The custom streamlit materialization handles the file uploads and Streamlit app creation.
# models/_models.yml — minimal config
models:
  - name: source_freshness_dashboard
    config:
      materialized: streamlit
      # keys are read directly via config.get(<key>) in the materialization
      app_name: data_discovery_hub
      stage: app_stage
      pages:
        - app_data/pages/other_documentation_page.py
      include_files:
        - app_data/utils.py
      # optional settings
      include_manifest: true  # will append target/manifest.json automatically
      # warehouse: my_wh
Stages and core components
If you’re not familiar with stages on Snowflake, they’re internal file stores you can access with SQL. In this case they act as the app’s file store inside Snowflake. The materialization (a Jinja macro) creates a stage if needed, uploads your files with PUT, and then creates a Streamlit app that points at that stage path:
-- create or alter the stage (id built from database.schema.stage)
CREATE OR ALTER STAGE {{ database }}.{{ schema }}.{{ config.get('stage', 'streamlit_apps') }}
DIRECTORY = (ENABLE = TRUE)
COMMENT = 'Stage for Streamlit application';
-- upload the main app file
PUT 'file://{{ model.original_file_path }}' @{{ database }}.{{ schema }}.{{ config.get('stage', 'streamlit_apps') }}/{{ config.get('app_name', model['alias']) }}
OVERWRITE = TRUE AUTO_COMPRESS = FALSE;
-- create or replace the Streamlit app from stage
CREATE OR REPLACE STREAMLIT {{ database }}.{{ schema }}.{{ model['alias'] }}
FROM @{{ database }}.{{ schema }}.{{ config.get('stage', 'streamlit_apps') }}/{{ config.get('app_name', model['alias']) }}
MAIN_FILE = '{{ model.original_file_path.split('/') | last }}'
QUERY_WAREHOUSE = '{{ config.get('warehouse', target.warehouse) }}';
Beyond the single model file, you often want to upload additional files too: utilities, images, pages, and so on. You can specify these in the configuration, but that also means the materialization has to handle them:
{# iterate over the extra files #}
{% for file in config.get('include_files', []) %}
{% set dest = ([stage_name, app_name] + file.split('/')[:-1]) | join('/') %}
{% if 'environment.yml' in file %}
{% set dest = [stage_name, app_name, 'environment.yml'] | join('/') %}
{% endif %}
{{ log('Uploading file: ' ~ file ~ ' to: ' ~ database ~ '.' ~ schema ~'.' ~ dest, info=True) }}
PUT 'file://{{ file }}' @{{ dest }}
OVERWRITE = TRUE
AUTO_COMPRESS = FALSE;
{% endfor %}
{# put the pages in a specific 'pages' folder for streamlit #}
{% for file in config.get('pages', []) %}
{% set dest = [stage_name, app_name, 'pages'] | join('/') %}
{{ log('Uploading page: ' ~ file ~ ' to: ' ~ database ~ '.' ~ schema ~'.' ~ dest, info=True) }}
PUT 'file://{{ file }}' @{{ dest }}
OVERWRITE = TRUE
AUTO_COMPRESS = FALSE;
{% endfor %}
Two small but useful implementation notes:
- On full-refresh, the macro drops the Streamlit app and the stage before recreating them. This ensures a clean deploy when you need it.
- By default, the macro appends your dbt target/manifest.json to the upload list (include_manifest: true). That makes it easy for your app to read dbt metadata at runtime.
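For instance, a page in the deployed app could parse the uploaded manifest to list all models. The function name and summary fields below are my own; the nodes/resource_type layout is the standard dbt manifest structure:

```python
# Sketch: read dbt metadata from the manifest.json the materialization uploaded.
import json

def load_model_summaries(manifest_path: str = "manifest.json") -> list[dict]:
    """Return name/description/materialization for every model in a dbt manifest."""
    with open(manifest_path) as f:
        manifest = json.load(f)
    return [
        {
            "name": node["name"],
            "description": node.get("description", ""),
            "materialized": node.get("config", {}).get("materialized", ""),
        }
        for node in manifest.get("nodes", {}).values()
        if node.get("resource_type") == "model"
    ]
```

In the Streamlit app you could feed the result straight into st.dataframe to get a browsable model catalog.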
Seeing it in action: a Data Discovery Hub
To make this concrete, I packaged a small Streamlit app (you can find the example code on GitHub) which I call the Data Discovery Hub. It runs inside Snowflake and is deployed from a dbt Python model using the custom streamlit materialization you saw above. The app ships as a few files (main app, a utilities module, and optional pages), and dbt’s YAML config tells the materialization which files to upload and how to create/update the app.
The app consists of a landing page with basic insights and navigation, plus a few subpages: source freshness, Snowflake usage, and a data marts overview based on the dbt manifest, with a direct link to the Snowflake data preview.
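Since the pages are just files uploaded into a pages/ folder on the stage, the landing page can derive its navigation labels from the configured file paths. This helper is a hypothetical convenience of my own, not part of the materialization:

```python
# Hypothetical helper: turn an uploaded page file's path into a navigation label.
from pathlib import Path

def page_label(page_path: str) -> str:
    # 'app_data/pages/other_documentation_page.py' -> 'Other Documentation Page'
    return Path(page_path).stem.replace("_", " ").title()
```

Mapping this over the pages list from the dbt config keeps the app’s navigation in sync with whatever files you deploy.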
Caveats
Now here’s something you should know about me. If I can choose between the right solution and the hacky solution, I will go for the hacky solution, then realise halfway through it’s not the right solution, but I’ve still learned something. Case in point: I once made a TF‑IDF implementation in Google Sheets. And it’s the same here: you should probably use a git repo for your Streamlit dashboards in Snowflake, but occasionally you might need this approach, and you’ll have learned something about custom dbt materializations. That being said, I think using this approach can be fine for internal use, but don’t rely on it too much.
- The local development workflow isn’t great. It’s been hard to get and keep a working connection to Snowflake. Not sure if that was my specific setup or the Snowflake Python connector in general.
- It’s slow: startup times can be multiple minutes; five minutes isn’t unusual. That makes it a poor fit for “real‑time” dashboards.
- Versions are supposed to be pinned with an environment.yml, but that functionality can be flaky, causing errors around newer features like st.link_button(). It may fall back to “latest,” which often isn’t actually the latest.
- There is no native way to access dbt models directly in Streamlit; you’ll likely read dbt metadata from the manifest.json.
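For reference, a minimal environment.yml for Streamlit in Snowflake looks roughly like this; only the snowflake channel is supported, and the exact version pins are illustrative (and, as noted above, not always honored):

```yaml
channels:
  - snowflake
dependencies:
  - streamlit=1.35.0
  - pandas=2.0.3
```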
References
- dbt docs: Materializations — docs.getdbt.com/docs/build/materializations
- dbt‑duckdb: External materialization macro — github.com/duckdb/dbt-duckdb/…/external.sql
- dbt docs: Python models — docs.getdbt.com/docs/build/python-models
- Example code (macro + app files) — gist.github.com/dumkydewilde/…