You're right to be anxious about AI: This is how much we are building
Measuring the rate of software output growth in the AI era, across Show HN, GitHub, package registries, and research.
👋 Hey! I’m Dumky, Developer & Agent Experience Engineer at MotherDuck, the serverless analytics platform built on DuckDB. With a background as a Senior Analytics Engineer, I’m passionate about making data engineering accessible and practical.
I’m also a co-author of Fundamentals of Analytics Engineering and have built everything from €0.02/day Snowplow pipelines to terraformed dbt deployments. When I’m not talking about data, you’ll find me organizing meetups and speaking at conferences about the modern data stack.
Learn when and how to use dbt's microbatch incremental strategy with DuckDB. Covers row groups vs partitions, benchmarks comparing full refresh, merge, delete+insert, and microbatch strategies, plus configuration tips and common pitfalls.
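As a taste of what the post covers, a microbatch model boils down to a handful of config keys; the model and column names below are illustrative:

```sql
-- models/events_daily.sql (model and column names are assumptions)
{{ config(
    materialized='incremental',
    incremental_strategy='microbatch',
    event_time='created_at',   -- column dbt uses to slice batches
    batch_size='day',
    begin='2024-01-01',        -- earliest batch to backfill
    lookback=3                 -- reprocess recent batches to catch late data
) }}

select * from {{ ref('stg_events') }}
```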
I processed 60,000 job postings with an LLM for $10. That's not a typo. This post walks through how to use OpenAI's Batch API for bulk document processing: building JSONL request files, orchestrating without losing your mind, controlling costs with token limits, and tracking batch state in a database. When AI becomes this cheap, it changes what's possible.
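The core of the workflow is generating the JSONL request file, one request object per document. A minimal sketch with stdlib Python, assuming the documented Batch API request shape (the model, prompt, and file names here are placeholders):

```python
import json

def build_batch_file(documents, path, model="gpt-4o-mini", max_tokens=256):
    """Write one JSONL request line per document for OpenAI's Batch API.

    Each line carries a unique custom_id so results can be joined back
    to source rows after the batch completes.
    """
    with open(path, "w", encoding="utf-8") as f:
        for doc_id, text in documents:
            request = {
                "custom_id": f"job-{doc_id}",   # join key for results
                "method": "POST",
                "url": "/v1/chat/completions",
                "body": {
                    "model": model,
                    "max_tokens": max_tokens,   # caps per-request spend
                    "messages": [
                        {"role": "system", "content": "Extract the job title and seniority."},
                        {"role": "user", "content": text},
                    ],
                },
            }
            f.write(json.dumps(request) + "\n")

build_batch_file([(1, "Senior Data Engineer wanted...")], "batch_requests.jsonl")
```

The `max_tokens` cap is doing double duty here: it bounds both the response length and the worst-case cost per request.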
I'll admit it: as a data engineer I still use Excel. But DuckDB has become my secret weapon for ad-hoc data exploration. No more struggling with CSV imports, no more manual column matching across files, no more giving up on JSON. This guide shows you how to query remote CSVs, handle schema drift across multiple files, unnest nested data, and flatten JSON APIs—all with SQL that's no harder to recall than an Excel formula.
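To give a flavor of the guide, DuckDB lets you query a remote file or flatten nested JSON in one statement; the URL, file, and column names below are illustrative:

```sql
-- Query a remote CSV directly (DuckDB autoloads httpfs); URL is a placeholder
SELECT * FROM 'https://example.com/sales.csv' LIMIT 10;

-- Flatten a nested JSON API response; the payload column is an assumption
SELECT unnest(payload, recursive := true)
FROM read_json_auto('response.json');
```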
If you've inherited a bucket full of thousands of tiny JSON files—one per API call, one per event, one per log minute—you know the pain: slow scans, schema anxiety, and rising warehouse bills. This guide shows you how to consolidate them into clean Parquet with DuckDB: handling schema drift, maintaining lineage, optimizing performance, and integrating with dbt. Touch your raw files once, then model against something stable.
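The consolidation step the post describes can be sketched as a single DuckDB statement, assuming an illustrative S3 glob and output path:

```sql
COPY (
    SELECT *
    FROM read_json_auto(
        's3://my-bucket/raw/*.json',  -- placeholder glob over the tiny files
        union_by_name = true,         -- tolerate schema drift across files
        filename = true               -- keep the source path for lineage
    )
) TO 'consolidated.parquet' (FORMAT parquet);
```

After this one pass over the raw files, downstream models read the stable Parquet file instead of thousands of small JSON objects.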
I got tired of copy-pasting articles into ChatGPT for summaries—and even more tired of the bland results. So I built a CLI tool that combines web scraping with LLM personas. Now I can ask Nietzsche what he thinks of Hacker News, or get my skeptical analytics engineer persona to tear apart a vendor blog post. This post shows you how to build faitch: web scraping meets AI characters.
What if dbt could deploy a dashboard instead of just another table? I was tired of switching between my dbt transformations and Streamlit visualization code, so I built a custom materialization that deploys Streamlit apps directly on Snowflake. This tutorial walks through the architecture: using dbt configs, uploading files to stages, and creating apps that know about your data models, lineage, and freshness.
Snowflake costs can spiral out of control fast—especially when dbt runs alongside other queries and you can't tell which models are burning through credits. This guide shows you how to set up query tags that track every dbt model, then build dashboards to see usage over time, identify expensive models, and find your biggest cost drivers. Three ingredients: a custom macro, usage tables as sources, and some math.
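The macro ingredient looks roughly like this: dbt-snowflake lets you override its `set_query_tag()` macro. A simplified sketch (a production version should also capture and return the session's original tag so `unset_query_tag` can restore it):

```sql
-- macros/query_tags.sql -- simplified sketch, not the post's exact macro
{% macro set_query_tag() -%}
  {%- set new_tag = 'dbt|' ~ invocation_id ~ '|' ~ model.name -%}
  {%- do run_query("alter session set query_tag = '" ~ new_tag ~ "'") -%}
  {{ return(none) }}
{%- endmacro %}
```

With the tag in place, Snowflake's query history tables can be grouped by model name to attribute credits.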
You know that feeling when your dbt model runs but the numbers look... off? Not broken, just weird enough to make you question your SQL. Most data quality surprises start with 'let me just quickly fix this.' What if you wrote the tests before the model instead? This guide applies Test-Driven Development to analytics engineering: unit tests, model contracts, and defining what 'good' looks like before you write a single SELECT.
Remember when dbt projects meant folders full of SQL files and crossing your fingers that your transformations were correct? dbt 1.8 finally brought unit testing, but macro testing is still limited. This post walks through unit testing your business logic—like standardizing campaign sources—and a workaround for testing macros so you can catch broken SQL before it wrecks your dashboards.
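For reference, a dbt 1.8 unit test is declared in YAML with `given` inputs and `expect` outputs; the model and column names here are illustrative:

```yaml
# models/staging/_unit_tests.yml -- model and column names are assumptions
unit_tests:
  - name: standardizes_campaign_sources
    model: stg_campaigns
    given:
      - input: ref('raw_campaigns')
        rows:
          - {campaign_source: "FB Ads"}
          - {campaign_source: "facebook"}
    expect:
      rows:
        - {campaign_source: "facebook"}
          - {campaign_source: "facebook"}
```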