Beyond Measure

👋 Hey! I’m Dumky, Developer & Agent Experience Engineer at MotherDuck, the serverless analytics platform built on DuckDB. With a background as a Senior Analytics Engineer, I’m passionate about making data engineering accessible and practical.

I’m also a co-author of Fundamentals of Analytics Engineering and have built everything from €0.02/day Snowplow pipelines to terraformed dbt deployments. When I’m not talking about data, you’ll find me organizing meetups and speaking at conferences about the modern data stack.

Recent Posts

Microbatch: how to supercharge dbt-duckdb with the right incremental model · Feb 2, 2026

Learn when and how to use dbt's microbatch incremental strategy with DuckDB. Covers row groups vs partitions, benchmarks comparing full refresh, merge, delete+insert, and microbatch strategies, plus configuration tips and common pitfalls.

dbt batch data-engineering duckdb
LLMs Are Cheap Enough to Change How You Work With Data · Dec 21, 2025

I processed 60,000 job postings with an LLM for $10. That's not a typo. This post walks through how to use OpenAI's Batch API for bulk document processing: building JSONL request files, orchestrating without losing your mind, controlling costs with token limits, and tracking batch state in a database. When AI becomes this cheap, it changes what's possible.

ai openai llm batch cost_optimization orchestration data-engineering
DuckDB is my new Excel · Nov 3, 2025

I'll admit it: as a data engineer I still use Excel. But DuckDB has become my secret weapon for ad-hoc data exploration. No more struggling with CSV imports, no more manual column matching across files, no more giving up on JSON. This guide shows you how to query remote CSVs, handle schema drift across multiple files, unnest nested data, and flatten JSON APIs—all with SQL you can remember if you can remember Excel formulas.

duckdb motherduck data-engineering json csv analytics data_exploration excel
Turn Thousands of Messy JSON Files into One Parquet: DuckDB for Fast Data Warehouse Ingestion · Sep 22, 2025

If you've inherited a bucket full of thousands of tiny JSON files—one per API call, one per event, one per log minute—you know the pain: slow scans, schema anxiety, and rising warehouse bills. This guide shows you how to consolidate them into clean Parquet with DuckDB: handling schema drift, maintaining lineage, optimizing performance, and integrating with dbt. Touch your raw files once, then model against something stable.

duckdb data-engineering json parquet dbt ingestion performance analytics
LLM Personas with Faitch: An AI-Powered Fetch for Content Analysis and making your life bearable · Sep 14, 2025

I got tired of copy-pasting articles into ChatGPT for summaries—and even more tired of the bland results. So I built a CLI tool that combines web scraping with LLM personas. Now I can ask Nietzsche what he thinks of Hacker News, or get my skeptical analytics engineer persona to tear apart a vendor blog post. This post shows you how to build faitch: web scraping meets AI characters.

AI LLM CLI Productivity scraping
Beyond Tables and Views: Building a Custom dbt Materialization to Deploy Streamlit Apps on Snowflake · Aug 27, 2025

What if dbt could deploy a dashboard instead of just another table? I was tired of switching between my dbt transformations and Streamlit visualization code, so I built a custom materialization that deploys Streamlit apps directly on Snowflake. This tutorial walks through the architecture: using dbt configs, uploading files to stages, and creating apps that know about your data models, lineage, and freshness.

dbt snowflake streamlit data-engineering custom-materializations data-observability python
Optimizing Snowflake Costs with dbt Query Tags · Jul 19, 2025

Snowflake costs can spiral out of control fast—especially when dbt runs alongside other queries and you can't tell which models are burning through credits. This guide shows you how to set up query tags that track every dbt model, then build dashboards to see usage over time, identify expensive models, and find your biggest cost drivers. Three ingredients: a custom macro, usage tables as sources, and some math.

analytics-engineering cost dbt snowflake
Test Driven Development (TDD) with dbt: Test First, SQL Later · Jun 27, 2025

You know that feeling when your dbt model runs but the numbers look... off? Not broken, just weird enough to make you question your SQL. Most data quality surprises start with 'let me just quickly fix this.' What if you wrote the tests before the model instead? This guide applies Test-Driven Development to analytics engineering: unit tests, model contracts, and defining what 'good' looks like before you write a single SELECT.

analytics-engineering testing dbt tdd
Unit Testing dbt Macros: A workaround for dbt's unit testing limitations · Mar 10, 2025

Remember when dbt projects meant folders full of SQL files and crossing your fingers that your transformations were correct? dbt 1.8 finally brought unit testing, but macro testing is still limited. This post walks through unit testing your business logic—like standardizing campaign sources—and a workaround for testing macros so you can catch broken SQL before it wrecks your dashboards.

dbt data-engineering data-quality sql testing
Data Observability is not a tool: understanding data quality at the source, in transformations and in governance · Apr 8, 2024

Data observability tools promise to fix your data quality, but buying a tool won't solve problems you don't understand. This post breaks down where data quality actually breaks: in source systems (completeness, consistency, reliability), during transformations (integrity, accuracy), and in governance (documentation, ownership). Understand the root causes before you reach for a solution.

observability data-quality data-governance lineage ETL analytics

View all posts →

Recent Posts

Microbatch: how to supercharge dbt-duckdb with the right incremental model · Feb 2, 2026

LLMs Are Cheap Enough to Change How You Work With Data · Dec 21, 2025

DuckDB is my new Excel · Nov 3, 2025

Turn Thousands of Messy JSON Files into One Parquet: DuckDB for Fast Data Warehouse Ingestion · Sep 22, 2025

LLM Personas with Faitch: An AI-Powered Fetch for Content Analysis and making your life bearable · Sep 14, 2025

Beyond Tables and Views: Building a Custom dbt Materialization to Deploy Streamlit Apps on Snowflake · Aug 27, 2025

Optimizing Snowflake Costs with dbt Query Tags · Jul 19, 2025

Test Driven Development (TDD) with dbt: Test First, SQL Later · Jun 27, 2025

Unit Testing dbt Macros: A workaround for dbt's unit testing limitations · Mar 10, 2025

Data Observability is not a tool: understanding data quality at the source, in transformations and in governance · Apr 8, 2024