Posts
- DuckDB is my new Excel How to replace Excel for ad-hoc data exploration using DuckDB: query remote CSVs, wildcard multi-file ingestion, union-by-name schema handling, unnesting lists, aggregating and counting, and flattening JSON APIs. If you can work with Excel formulas, you can do this. Continue reading...
- Turn Thousands of Messy JSON Files into One Parquet: DuckDB for Fast Data Warehouse Ingestion Turn sprawling API JSON dumps into a single lean Parquet artifact with DuckDB, ready for Snowflake, BigQuery, Redshift, Postgres or DuckDB itself. This guide walks through schema drift handling, lineage, performance considerations, and dbt integration patterns. Continue reading...
- LLM Personas with Faitch: An AI-Powered Fetch for Content Analysis and making your life bearable Build a CLI tool that combines web scraping with LLM personas for automated content analysis. Turn any webpage into insightful commentary with AI characters like Nietzsche or your own data engineering mentor. Continue reading...
- Beyond Tables and Views: Building a Custom dbt Materialization to Deploy Streamlit Apps on Snowflake What if your dbt model could deploy a dashboard instead of just another table? This tutorial shows how to build a custom dbt materialization that deploys Streamlit apps directly on Snowflake. Continue reading...
- Optimizing Snowflake Costs with dbt Query Tags Set Snowflake query tags with dbt to monitor which models are burning through your Snowflake credits. Track usage over time, identify expensive models, and optimize your biggest cost drivers with three ingredients: custom macro, usage tables as dbt sources, comprehensive cost calculations. Continue reading...
- Test Driven Development (TDD) with dbt: Test First, SQL Later Stop building dbt models and praying they're correct. Start defining what "good" looks like first. This guide shows you how to apply TDD to analytics engineering, from unit tests to model contracts, so your data is trustworthy instead of just hope-it-works. Continue reading...
- Unit Testing dbt Macros: A workaround for dbt's unit testing limitations Ever wished you could catch that broken SQL logic before it wrecks your dashboards? With dbt 1.8's new unit testing capabilities, you can finally sleep at night! However, support for testing macros is still limited. Let's explore how to test both models and macros with a workaround. Continue reading...
- Data Observability is not a tool: understanding data quality at the source, in transformations and in governance Have you ever wasted time or money because you made a decision based on incorrect data? Then you'll appreciate good data quality. Buying an observability tool, however, might not be the solution to your data quality issues. Let's explore how data quality issues arise at the source, in transformations and in data governance, and find the appropriate solutions to those problems. Continue reading...
- Data Ingestion Pipelines Without Headaches: 8 simple steps Data, like wine and cheese, becomes more valuable when combined. However, to combine it, you must first retrieve the data in a reliable and scalable manner. This post covers the 8 steps of a data ingestion pipeline and 3 overarching topics to ensure reliability and quality over time. Continue reading...
- Adding Geo and ISP data to your analytics hits with Snowplow and Cloudflare Workers In this post we'll look at how to add geo and ISP data to your analytics hits with Snowplow and Cloudflare Workers, an approach that you can also re-use for GA4. Continue reading...
