Containerized end-to-end analytics of Spotify data using Python, dbt, Postgres, and Metabase
In this project, we will be analyzing our listening history, top tracks & artists, and genres from Spotify. Here are the tools that we will be using:
- Python (to fetch the data from the Spotify API)
- dbt (to load and transform the data in Postgres)
- Postgres (as the database)
- Metabase (for visualization)
- Docker (to containerize everything)
The diagram below illustrates the systems design and how the workflow will go.
Let's break this down into major steps:
cd to this directory
Open a terminal, create a Python virtual environment using:
Windows
> python -m venv venv
Mac/Linux
$ make build
then activate it by executing:
Windows:
> venv\Scripts\activate.bat
(For Windows) Install dependencies using:
> python -m pip install -r requirements.txt
While dependencies are being installed, navigate to the Spotify Developer page and log in. Create an app and note down the Client ID and Client Secret, and make sure to add a redirect URI in Settings, i.e. http://localhost:8888/callback/
Fill in the details in config_template.py and rename it to config.py
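The exact variable names in config_template.py aren't shown here, so this is a hypothetical sketch of what a filled-in config.py could look like (the names and placeholder values are my own; match whatever config_template.py actually uses):

```python
# config.py -- hypothetical layout; use the variable names from config_template.py
CLIENT_ID = "your-client-id"          # from the Spotify Developer dashboard
CLIENT_SECRET = "your-client-secret"  # keep this file out of version control
REDIRECT_URI = "http://localhost:8888/callback/"  # must match the URI added in Settings
```

Since config.py holds secrets, it's worth making sure it is listed in .gitignore before committing anything.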
Run the main Python script to fetch the data from Spotify using:
Windows
> python app\main.py
Mac/Linux
$ make run
While the script is running, it will redirect to a webpage that looks like the one below; just click AGREE
p.s. follow me for nice tunes! 😁
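The internals of app/main.py aren't shown here, but the gist of the fetch step is flattening the Spotify Web API's JSON responses into rows for CSV files. A sketch under that assumption (the helper names and the exact columns are my own, not necessarily what the script does):

```python
import csv

def top_tracks_to_rows(payload):
    """Flatten a Spotify top-tracks JSON payload into flat dicts for CSV."""
    rows = []
    for item in payload.get("items", []):
        rows.append({
            "track_id": item["id"],
            "track_name": item["name"],
            # a track can have several artists; keep the first for simplicity
            "artist_name": item["artists"][0]["name"] if item["artists"] else None,
            "popularity": item["popularity"],
        })
    return rows

def write_csv(path, rows):
    """Write the flattened rows to a CSV file that dbt seed can later load."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=rows[0].keys())
        writer.writeheader()
        writer.writerows(rows)
```

For example, a payload like `{"items": [{"id": "t1", "name": "Song A", "artists": [{"name": "Artist X"}], "popularity": 73}]}` flattens to a single row dict ready for write_csv.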
Now that we have the CSV files in the data folder, we can build our Docker containers using this command:
docker-compose up
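Note that docker-compose up can return before Postgres is actually accepting connections. If you script anything against the database yourself, a small stdlib-only poll helps (assuming Postgres is mapped to localhost:5432, which this compose file may or may not do):

```python
import socket
import time

def wait_for_port(host, port, timeout=30.0):
    """Poll until a TCP port accepts connections; return True on success, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.5)  # connection refused or timed out; retry shortly
    return False

# e.g. wait_for_port("localhost", 5432) before running ad-hoc queries
```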
This command will build our dbt, postgres, and metabase containers. This will also run our data loading, transformations, and modeling in the background.
During docker-compose, dbt runs the following commands:
- dbt init spotify_analytics: Creates the project folder
- dbt debug: Checks the connection with the Postgres database
- dbt deps: Installs the test dependencies
- dbt seed: Loads the CSV files into staging tables in the Postgres database
- dbt run: Runs the transformations and loads the data into the database
- dbt docs generate: Generates the documentation of the dbt project
- dbt docs serve: Serves the documentation on a webserver

Navigating to http://localhost:8080 to see the documentation, we can see the lineage graph, a DAG (Directed Acyclic Graph).
This shows us how the CSV files have been transformed into the fact and dimension tables and views.
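A DAG is exactly what lets dbt figure out a safe build order: every model is built only after the models it selects from. The idea can be sketched with Python's stdlib topological sorter (the model names below are illustrative, not the project's actual models):

```python
from graphlib import TopologicalSorter

# Hypothetical lineage, mapping each model to the models it depends on,
# mirroring the shape of the graph at http://localhost:8080:
# seed CSV -> staging model -> dimension/fact models
deps = {
    "stg_tracks": {"raw_tracks_csv"},
    "dim_artist": {"stg_tracks"},
    "fct_listening_history": {"stg_tracks", "dim_artist"},
}

# static_order() yields nodes so that every dependency comes before its dependents
order = list(TopologicalSorter(deps).static_order())
print(order)
```

Any ordering it prints is valid as long as raw_tracks_csv precedes stg_tracks, which in turn precedes dim_artist and fct_listening_history; dbt run resolves its real model graph the same way.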
Now that the data is loaded and transformed in our database, we can view it at http://localhost:3000. You may need to log in; the credentials are:
email: [email protected]
password: password1
Then you can navigate through, play around, and analyze your data.