Turn Thousands of Messy JSON Files into One Parquet: DuckDB for Fast Data Warehouse Ingestion
If you've inherited a bucket full of thousands of tiny JSON files—one per API call, one per event, one per log minute—you know the pain: slow scans, schema anxiety, and rising warehouse bills. This guide shows you how to consolidate them into clean Parquet with DuckDB: handling schema drift, maintaining lineage, optimizing performance, and integrating with dbt. Touch your raw files once, then model against something stable.