Posts
- Turn Thousands of Messy JSON Files into One Parquet: DuckDB for Fast Data Warehouse Ingestion Turn sprawling API JSON dumps into a single lean Parquet artifact with DuckDB, ready for Snowflake, BigQuery, Redshift, Postgres, or DuckDB itself. This guide walks through schema drift handling, lineage, performance considerations, and dbt integration patterns (a minimal DuckDB sketch follows below this list). Continue reading...
- LLM Personas with Faitch: An AI-Powered Fetch for Content Analysis and making your life bearable Build a CLI tool that combines web scraping with LLM personas for automated content analysis. Turn any webpage into insightful commentary with AI characters like Nietzsche or your own data engineering mentor. Continue reading...
- Beyond Tables and Views: Building a Custom dbt Materialization to Deploy Streamlit Apps on Snowflake What if your dbt model could deploy a dashboard instead of just another table? This tutorial shows how to build a custom dbt materialization that deploys Streamlit apps directly on Snowflake. Continue reading...
- Optimizing Snowflake Costs with dbt Query Tags Set Snowflake query tags with dbt to monitor which models are burning through your Snowflake credits. Track usage over time, identify expensive models, and optimize your biggest cost drivers with three ingredients: a custom macro, usage tables as dbt sources, and comprehensive cost calculations (a sketch of such a macro follows below this list). Continue reading...
- Test Driven Development (TDD) with dbt: Test First, SQL Later Stop building dbt models and praying they're correct. Start defining what "good" looks like first. This guide shows you how to apply TDD to analytics engineering, from unit tests to model contracts, so your data is trustworthy instead of just hope-it-works. Continue reading...
- Unit Testing dbt Macros: A workaround for dbt's unit testing limitations Ever wished you could catch that broken SQL logic before it wrecks your dashboards? With dbt 1.8's new unit testing capabilities, you can finally sleep at night! However, support for testing macros is still limited. Let's explore how to test both models and macros with a workaround. Continue reading...
- Data Observability is not a tool: understanding data quality at the source, in transformations and in governance Have you ever wasted time or money because you made a decision based on incorrect data? Then you'll appreciate good data quality. Buying an observability tool, however, might not be the solution to your data quality issues. Let's explore how data quality issues arise at the source, in transformations and in data governance, and find the appropriate solutions to those problems. Continue reading...
- Data Ingestion Pipelines Without Headaches: 8 simple steps Data, like wine and cheese, becomes more valuable when combined. However, to combine it, you must first retrieve the data in a reliable and scalable manner. This post covers the 8 steps of a data ingestion pipeline and 3 overarching topics to ensure reliability and quality over time. Continue reading...
- Adding Geo and ISP data to your analytics hits with Snowplow and Cloudflare Workers In this post we'll look at how to add geo and ISP data to your analytics hits with Snowplow and Cloudflare Workers, an approach that you can also re-use for GA4. Continue reading...
- Own your web analytics pipeline for €0.02 per day: Snowplow, Terraform, dbt, BigQuery and Docker Running Snowplow for your (web) analytics pipeline too expensive? Here's a €0.02/day minimal, serverless version of Snowplow open source that you can deploy for your blog or website with Terraform (on GCP/BigQuery) in 5 minutes, giving you full ownership of a web and app analytics pipeline from data collection to custom data models (👋 goodbye Google Analytics). Continue reading...
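
For the DuckDB ingestion post above, here is a minimal sketch of the core move, not the post's actual pipeline: glob a directory of JSON dumps, let `union_by_name` reconcile drifting schemas, and write one compressed Parquet file. The file paths and compression codec are illustrative assumptions.

```sql
-- Minimal sketch (paths are placeholders): fold many JSON files into one Parquet file.
COPY (
    SELECT *
    FROM read_json_auto('raw/api_dumps/*.json', union_by_name = true)  -- tolerate schema drift across files
) TO 'warehouse/api_dumps.parquet' (FORMAT PARQUET, COMPRESSION ZSTD);
```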
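
For the query-tag post, a hedged sketch of the kind of macro involved, assuming dbt-snowflake's overridable `set_query_tag()` macro and its `get_current_query_tag()` helper; the tag naming scheme below is a placeholder, not the post's macro.

```sql
-- macros/set_query_tag.sql -- illustrative override: tag each query with the model name
{% macro set_query_tag() -%}
    {%- set new_query_tag = 'dbt_' ~ (model.name if model is defined else 'run') -%}
    {%- set original_query_tag = get_current_query_tag() -%}
    {%- do run_query("alter session set query_tag = '" ~ new_query_tag ~ "'") -%}
    {# return the previous tag so dbt can restore it after the materialization #}
    {{- return(original_query_tag) -}}
{%- endmacro %}
```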