A cheap, serverless version of Snowplow deployed with Terraform that runs on dumky.net
A serverless Snowplow pipeline on Google Cloud Platform (GCP) for ~€0.02/day.
This repository is a Terraform template for running a fully serverless Snowplow pipeline on Google Cloud Run and BigQuery. It lets you run Snowplow at minimal cost, which is especially useful for smaller sites and blogs.
The basic idea is to run a serverless collector and to run all the other components on a schedule (e.g. three times a day). This lets the pipeline scale down to zero while still running a near-realtime pipeline.
The pipeline uses the following components
The collector is serverless, so it scales to zero when there is no traffic. However, if you have continuous traffic, this might not be the best option for you, as serverless instances work best for intermittent loads. The rest of the pipeline runs on a schedule: it processes and loads everything every three hours between 08:00 and 20:00. Who has the time for real-time data anyway, right?
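As a rough illustration of the "every three hours between 08:00 and 20:00" schedule, here is a minimal sketch of a Cloud Scheduler job in Terraform. The resource name, region, time zone and the `pipeline_trigger_uri` variable are assumptions made for this example and are not taken from this repository; only the cron expression reflects the schedule described above.

```hcl
# Hypothetical input: the HTTPS endpoint that starts the scheduled part of the pipeline.
variable "pipeline_trigger_uri" {
  type        = string
  description = "URL that kicks off enrichment and loading (assumed for this sketch)"
}

resource "google_cloud_scheduler_job" "pipeline_run" {
  name      = "snowplow-pipeline-run" # hypothetical name
  region    = "europe-west1"          # assumed region
  schedule  = "0 8-20/3 * * *"        # minute 0 of hours 8, 11, 14, 17 and 20
  time_zone = "Europe/Amsterdam"      # assumed time zone

  http_target {
    http_method = "POST"
    uri         = var.pipeline_trigger_uri
  }
}
```

With this expression the job fires five times a day. A real deployment would most likely also need an authentication block on the HTTP target, which is left out here for brevity.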
I've been running this setup for a blog with ~15,000 visitors/month for 2 cents a day. If you still have your GCP credits, you should be good for the next 30 years or so... In any case, you always want to set up cost controls and keep an eye on your spending. Don't say I didn't warn you.
With the license changes of Jan. 8, 2024, the collector and enricher need an additional configuration setting to agree to the new licenses. Since this type of setup (personal, serverless) is not high-availability, you should be OK to use it. Of course, the licensing changes do mean that you cannot use any of the new versions of these components in a production/high-availability setting. Alternatively, you can use the OpenSnowCat fork or a standalone component like buz.dev.
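For illustration, one way to pass the license acceptance to a collector running on Cloud Run is through an environment variable. The fragment below is a sketch only: the resource name, region and the `collector_image` variable are hypothetical, and it assumes the component reads the `ACCEPT_LIMITED_USE_LICENSE` environment variable as described in Snowplow's documentation. Check the documentation for the version you deploy before relying on it.

```hcl
# Hypothetical input: the collector container image, including the version tag you have chosen.
variable "collector_image" {
  type = string
}

resource "google_cloud_run_v2_service" "collector" {
  name     = "snowplow-collector" # hypothetical name
  location = "europe-west1"       # assumed region

  template {
    containers {
      image = var.collector_image

      # Assumption: the component reads this variable to accept the license terms.
      env {
        name  = "ACCEPT_LIMITED_USE_LICENSE"
        value = "yes"
      }
    }
  }
}
```

The fragment shows only the license setting; the collector's own configuration (port, sink and so on) still has to be supplied separately.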