Beyond Tables and Views: Building a Custom dbt Materialization to Deploy Streamlit Apps on Snowflake
Deploying Streamlit apps with dbt on Snowflake using a custom dbt materialization: a practical tutorial and exploration
If you want to jump straight into the code, you can find it on GitHub.
What if your dbt model could deploy a dashboard instead of creating a table? If you’ve worked with dbt for any amount of time, you’re familiar with the usual suspects: the table, view, and incremental materializations. These are the bread and butter of data transformation—turning your SQL into database objects that store or reference your transformed data. Tables persist your results, views create virtual representations, and incremental models efficiently update existing datasets.
But here’s where it gets interesting: I was working on a source freshness dashboard in Snowflake using Streamlit, constantly switching between my dbt transformations and my visualization code. Then I thought: why am I managing these separately? Snowflake defines everything in SQL, and my dbt models already know everything about my data structure, business logic, and source freshness. What if the same workflow that deploys my transformed data could also deploy the applications that visualize it? And that’s not the only use case. Here are a few places where I think dbt can benefit from interactive visualization with Streamlit:
- Showing the costs of your models
- Offering better documentation of your data marts
- Having a one‑stop shop for documentation, dashboards, and data requests
- Keeping an eye on how your dbt pipelines are running (e.g., using the Elementary dbt package)
dbt custom materializations
If you’re new to custom materializations, here’s the 60‑second version: a dbt materialization is just a recipe for what happens when a model runs. Under the hood, most follow a predictable six‑step pattern:
- prepare (resolve references and configs)
- run pre‑hooks
- execute the core work
- run post‑hooks
- clean up
- update the relation cache
The key insight is that the “core work” can be anything your adapter can talk to: not only creating tables or views, but also writing files, managing external objects, or even orchestrating application artifacts. That’s how projects like dbt‑duckdb external materializations implement files, and it’s the same idea we’ll bend toward deploying a Streamlit app.
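To make that concrete, here is a skeletal custom materialization in dbt’s Jinja syntax, following the pattern from the dbt docs. This is a sketch: the core body is a placeholder, and a production macro would also manage transactions and the relation cache explicitly.

```sql
{% materialization streamlit, adapter='snowflake' %}
  {# 1. prepare: resolve configuration #}
  {%- set app_name = config.get('app_name', model['alias']) -%}

  {# 2. pre-hooks #}
  {{ run_hooks(pre_hooks) }}

  {# 3. core work: any SQL the adapter can run, e.g. PUT and CREATE STREAMLIT #}
  {% call statement('main') -%}
    select 1 as placeholder
  {%- endcall %}

  {# 4. post-hooks #}
  {{ run_hooks(post_hooks) }}

  {# 5./6. cleanup and cache update: report affected relations back to dbt #}
  {{ return({'relations': []}) }}
{% endmaterialization %}
```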
Note: This post targets Snowflake Streamlit apps specifically. For background, see Snowflake’s docs on About Streamlit in Snowflake.
Streamlit materialization architecture
At a high level, the materialization treats your model as an app deployment:
- use a dbt YAML configuration for settings
- upload app files to a Snowflake stage
- create/update a Streamlit app that points at those files
We’ll go deeper on each piece, but here’s a quick taste.
# models/data_discovery_hub.py — a dbt Python model
import pandas as pd

def model(dbt, session):
    # The materialization does the real work (upload + app create)
    # We return an empty DataFrame to satisfy dbt's Python model contract
    return pd.DataFrame()
This conforms to the dbt Python model contract (return a pandas DataFrame). The custom streamlit materialization handles the file uploads and Streamlit app creation.
# models/_models.yml — minimal config
models:
  - name: source_freshness_dashboard
    config:
      materialized: streamlit
      # keys are read directly via config.get(<key>) in the materialization
      app_name: data_discovery_hub
      stage: app_stage
      pages:
        - app_data/pages/other_documentation_page.py
      include_files:
        - app_data/utils.py
      # optional settings
      include_manifest: true  # will append target/manifest.json automatically
      # warehouse: my_wh
Stages and core components
If you’re not familiar with stages on Snowflake, they’re internal file stores you can access with SQL. In this case they act as the app’s file store inside Snowflake. The materialization (a Jinja macro) creates a stage if needed, uploads your files with PUT, and then creates a Streamlit app that points at that stage path:
-- create or alter the stage (id built from database.schema.stage)
CREATE OR ALTER STAGE {{ database }}.{{ schema }}.{{ config.get('stage', 'streamlit_apps') }}
DIRECTORY = (ENABLE = TRUE)
COMMENT = 'Stage for Streamlit application';
-- upload the main app file
PUT 'file://{{ model.original_file_path }}' @{{ database }}.{{ schema }}.{{ config.get('stage', 'streamlit_apps') }}/{{ config.get('app_name', model['alias']) }}
OVERWRITE = TRUE AUTO_COMPRESS = FALSE;
-- create or replace the Streamlit app from stage
CREATE OR REPLACE STREAMLIT {{ database }}.{{ schema }}.{{ model['alias'] }}
FROM @{{ database }}.{{ schema }}.{{ config.get('stage', 'streamlit_apps') }}/{{ config.get('app_name', model['alias']) }}
MAIN_FILE = '{{ model.original_file_path.split('/') | last }}'
QUERY_WAREHOUSE = '{{ config.get('warehouse', target.warehouse) }}';
Beyond the single model file, you often want to upload additional files too: utilities, images, pages, and so on. You can specify these in the configuration, but that also means the materialization has to handle them:
{# iterate over the extra files #}
{% for file in config.get('include_files', []) %}
{% set dest = ([stage_name, app_name] + file.split('/')[:-1]) | join('/') %}
{% if 'environment.yml' in file %}
{% set dest = [stage_name, app_name, 'environment.yml'] | join('/') %}
{% endif %}
{{ log('Uploading file: ' ~ file ~ ' to: ' ~ database ~ '.' ~ schema ~'.' ~ dest, info=True) }}
PUT 'file://{{ file }}' @{{ dest }}
OVERWRITE = TRUE
AUTO_COMPRESS = FALSE;
{% endfor %}
{# put the pages in a specific 'pages' folder for streamlit #}
{% for file in config.get('pages', []) %}
{% set dest = [stage_name, app_name, 'pages'] | join('/') %}
{{ log('Uploading page: ' ~ file ~ ' to: ' ~ database ~ '.' ~ schema ~'.' ~ dest, info=True) }}
PUT 'file://{{ file }}' @{{ dest }}
OVERWRITE = TRUE
AUTO_COMPRESS = FALSE;
{% endfor %}
Two small but useful implementation notes:
- On full-refresh, the macro drops the Streamlit app and the stage before recreating them. This ensures a clean deploy when you need it.
- By default, the macro appends your dbt target/manifest.json to the upload list (include_manifest: true). That makes it easy for your app to read dbt metadata at runtime.
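For instance, a page in the deployed app could parse the uploaded manifest to list all models. The function name and summary fields below are my own; the nodes/resource_type layout is the standard dbt manifest structure:

```python
# Sketch: read dbt metadata from the manifest.json the materialization uploaded.
import json

def load_model_summaries(manifest_path: str = "manifest.json") -> list[dict]:
    """Return name/description/materialization for every model in a dbt manifest."""
    with open(manifest_path) as f:
        manifest = json.load(f)
    return [
        {
            "name": node["name"],
            "description": node.get("description", ""),
            "materialized": node.get("config", {}).get("materialized", ""),
        }
        for node in manifest.get("nodes", {}).values()
        if node.get("resource_type") == "model"
    ]
```

In the Streamlit app you could feed the result straight into st.dataframe to get a browsable model catalog.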
Seeing it in action: a Data Discovery Hub
To make this concrete, I packaged a small Streamlit app (you can find the example code on GitHub) which I call the Data Discovery Hub. It runs inside Snowflake and is deployed from a dbt Python model using the custom streamlit materialization you saw above. The app ships as a few files (main app, a utilities module, and optional pages), and dbt’s YAML config tells the materialization which files to upload and how to create/update the app.
The app consists of a landing page with basic insights and navigation, plus a few subpages: source freshness, Snowflake usage, and a data marts overview based on the dbt manifest, with a direct link to the Snowflake data preview.
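Since the pages are just files uploaded into a pages/ folder on the stage, the landing page can derive its navigation labels from the configured file paths. This helper is a hypothetical convenience of my own, not part of the materialization:

```python
# Hypothetical helper: turn an uploaded page file's path into a navigation label.
from pathlib import Path

def page_label(page_path: str) -> str:
    # 'app_data/pages/other_documentation_page.py' -> 'Other Documentation Page'
    return Path(page_path).stem.replace("_", " ").title()
```

Mapping this over the pages list from the dbt config keeps the app’s navigation in sync with whatever files you deploy.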
Caveats
Now here’s something you should know about me. If I can choose between the right solution and the hacky solution, I will go for the hacky solution, then realise halfway through it’s not the right solution, but I’ve still learned something. Case in point: I once made a TF‑IDF implementation in Google Sheets. And it’s the same here: you should probably use a git repo for your Streamlit dashboards in Snowflake, but occasionally you might need this approach, and you’ll have learned something about custom dbt materializations. That being said, I think using this approach can be fine for internal use, but don’t rely on it too much.
- The local development workflow isn’t great. It’s been hard to get and keep a working connection to Snowflake. Not sure if that was my specific setup or the Snowflake Python connector in general.
- It’s slow: startup times can be multiple minutes; five minutes isn’t unusual. That makes it a poor fit for “real‑time” dashboards.
- Versions are supposed to be pinned with an environment.yml, but that functionality can be flaky, causing errors around newer features like st.link_button(). It may fall back to “latest,” which often isn’t actually the latest.
- There is no native way to access dbt models directly in Streamlit; you’ll likely read dbt metadata from the manifest.json.
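For reference, a minimal environment.yml for Streamlit in Snowflake looks roughly like this; only the snowflake channel is supported, and the exact version pins are illustrative (and, as noted above, not always honored):

```yaml
channels:
  - snowflake
dependencies:
  - streamlit=1.35.0
  - pandas=2.0.3
```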
References
- dbt docs: Materializations — docs.getdbt.com/docs/build/materializations
- dbt‑duckdb: External materialization macro — github.com/duckdb/dbt-duckdb/…/external.sql
- dbt docs: Python models — docs.getdbt.com/docs/build/python-models
- Example code (macro + app files) — gist.github.com/dumkydewilde/…