Posts
- Fetching IPv4 CIDR ranges from AWS, GCP, Azure and Cloudflare for bot detection with Python Bots usually run on one of the major cloud providers. Identifying them can be a big factor in determining the quality of your traffic. Whether that's for web analytics or threat mitigation, it's useful to have an overview of IP ranges to identify in bot scoring. Continue reading...
- Automatically Lint and Publish your Snowplow Schemas with GitHub Actions Snowplow schemas are a great way to codify expected data in JSON format. Using GitHub Actions you can make them even more powerful by automatically checking for typos, validity, and other errors as well as directly publishing them to your production environment with no manual action. Continue reading...
- Using Search Console BigQuery data with NLP to extract and group by the most important topics With the recent update to Google Search Console (GSC) allowing exports to BigQuery, we can now leverage some powerful features of BigQuery to do text processing and extract topics from our search queries with a simple JavaScript UDF. Continue reading...
- Why web analytics is still a mess in 2023 Web analytics still feels 'messy' in 2023. Why is it so hard to solve the problem of web analytics? Let's dive into some of the misconceptions that fuel the mess, like the ideas that websites are easy, are visited by people, that web analytics is about tracking people, that we have all the tools we need, and that web analytics is actually important. Continue reading...
- Mastering Time in dbt: Incremental Merging of Estimates and Actuals for large datasets Managing incrementality (change over time) in a large database is hard. Dbt can alleviate some of the pain by making it easier to choose among the available incremental strategies. Let's look at updating an example sales table with actuals and estimates over time. Continue reading...
- Create your own API from BigQuery data in minutes with SQL exports and Cloudflare Workers Want to have data from BigQuery publicly available? Create a simple API with BigQuery scheduled queries, JSON exports and a Cloudflare Worker to map the right URL to the right data. Continue reading...
- Dbt In a Box: Using Google Cloud Run and BigQuery to run your dbt SQL models from a Docker container Dbt is a great tool for data transformation. Snowplow is great for collecting web analytics data. What if you could harness the power of both for just a few cents a day by running dbt in a Docker container on Google Cloud Run Jobs? Continue reading...
- Language Detection in SQL with BigQuery Remote Functions Over the last few years SQL has really started embracing its second adolescence. That's cool, but what if you could easily extend your queries beyond the SQL domain and add in Python- and JavaScript-based serverless functions to get real-time stock information, enrich location data, or build a language detection function!? That's what we'll do. Continue reading...
- Check Cookie Consent with Playwright's browser automation in Python There's nothing like watching 20 browser windows pop up on your screen to make you feel like a proper hacker. Let's write a Python script to do GDPR consent checks with Playwright and detect the consent manager, cookies set, and marketing and analytics trackers on a site. Continue reading...
- Analytics on the edge: server-side request tracking and cookie setting using Cloudflare Workers Server-side tracking is all the rage these days, but let me tell you about the uber-coolest kid on the blockchain: edge analytics. I'm kidding, there's no such thing as edge analytics (except maybe for IoT devices), but there is the possibility to intercept requests on the 'edge' of the network. Using Cloudflare Workers, you can send data to Google Analytics for all kinds of scenarios, even for users visiting pages THAT DON'T EVEN EXIST! Continue reading...