A complete data warehouse infrastructure with ETL pipelines running inside Docker, using Apache Airflow for data orchestration, AWS Redshift as the cloud data warehouse, and Metabase for data visualizations such as analytical dashboards.
The data warehouse consists of various modules:
Data is obtained from here. The collected data is stored on local disk and periodically moved to the landing bucket on AWS S3. ETL jobs are written in SQL and scheduled in Airflow to run every hour to keep the data in the cloud data warehouse fresh.
The following fact and dimension tables are created:
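The local-disk-to-landing-bucket step can be sketched as below. This is a minimal illustration, not the project's actual code: the function names, the `landing` key prefix, and the injected `upload` callable are all assumptions, and the sketch copies files without deleting the local originals.

```python
from pathlib import Path
from typing import Callable, List


def landing_key(local_path: Path, root: Path, prefix: str = "landing") -> str:
    """Map a local file to an S3 object key under a (hypothetical) prefix."""
    relative = local_path.relative_to(root).as_posix()
    return f"{prefix}/{relative}"


def move_to_landing(root: Path, upload: Callable[[str, str], None],
                    prefix: str = "landing") -> List[str]:
    """Upload every file under `root` and return the keys that were written.

    `upload(filename, key)` is injected so the sketch runs without AWS access.
    """
    keys = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        key = landing_key(path, root, prefix)
        upload(str(path), key)
        keys.append(key)
    return keys
```

With boto3 installed and credentials configured, `upload` could be something like `lambda filename, key: s3.upload_file(filename, "my-landing-bucket", key)`, where `s3 = boto3.client("s3")` and the bucket name is a placeholder.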
aircrafts
airlines
passengers
airports
lounges
fact_ratings
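To show how a fact table relates to its dimensions, here is a small star-schema sketch using Python's built-in `sqlite3` instead of Redshift. Only the table names come from the list above; every column name and sample row is hypothetical, and just two of the dimensions are included to keep it short.

```python
import sqlite3

# In-memory database standing in for the warehouse (assumption: the real
# tables live in Redshift and have different, richer columns).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE airlines (airline_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE passengers (passenger_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE fact_ratings (
    rating_id INTEGER PRIMARY KEY,
    airline_id INTEGER REFERENCES airlines(airline_id),
    passenger_id INTEGER REFERENCES passengers(passenger_id),
    overall_score REAL
);
INSERT INTO airlines VALUES (1, 'Airline A'), (2, 'Airline B');
INSERT INTO passengers VALUES (1, 'P1'), (2, 'P2');
INSERT INTO fact_ratings VALUES (1, 1, 1, 8.0), (2, 1, 2, 6.0), (3, 2, 1, 9.0);
""")

# Typical dashboard query: join the fact table to a dimension and aggregate.
rows = conn.execute("""
    SELECT a.name, AVG(f.overall_score) AS avg_score
    FROM fact_ratings AS f
    JOIN airlines AS a ON a.airline_id = f.airline_id
    GROUP BY a.name
    ORDER BY a.name
""").fetchall()
```

The fact table holds only keys and measures, so each dashboard question becomes a join from `fact_ratings` to whichever dimension it groups by.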
Redshift: I used a 2-node cluster with instance type dc2.large.
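For readers who prefer to script the cluster instead of using the console, the configuration above might map to boto3's Redshift `create_cluster` call roughly as follows. This is a sketch under assumptions: the helper names are invented here, and the identifier, database name, credentials, and region are placeholders the caller must supply.

```python
from typing import Any, Dict


def redshift_cluster_params(identifier: str, db_name: str,
                            user: str, password: str) -> Dict[str, Any]:
    """Build keyword arguments for boto3's Redshift `create_cluster` call."""
    return {
        "ClusterIdentifier": identifier,
        "DBName": db_name,
        "ClusterType": "multi-node",   # required when NumberOfNodes > 1
        "NodeType": "dc2.large",       # instance type named in the text
        "NumberOfNodes": 2,            # cluster size named in the text
        "MasterUsername": user,
        "MasterUserPassword": password,
    }


def create_cluster(params: Dict[str, Any], region: str) -> None:
    """Create the cluster; needs boto3 and AWS credentials (not run here)."""
    import boto3  # imported lazily so the sketch loads without boto3
    boto3.client("redshift", region_name=region).create_cluster(**params)
```

Keeping the parameters in a plain dictionary makes the cluster shape easy to review before anything is created and billed.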
Run the following commands in terminal to setup whole infrastructure locally:
git clone https://github.com/iam-mhaseeb/Skytrax-Data-Warehouse
cd Skytrax-Data-Warehouse
docker-compose up
It will take some time to pull the latest images and install everything automatically in Docker. You can follow the AWS guide to run a Redshift cluster.
Make sure the Docker containers are running. Open the Airflow UI at http://localhost:8080 in a browser and set up the required connections.
You should be able to see the skytrax_etl_pipeline DAG, as in the pictures below:
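As an alternative to entering connections in the UI, Airflow also reads connections from environment variables named `AIRFLOW_CONN_<CONN_ID>` whose value is a connection URI. The helper below is a sketch of building such a pair; the connection id `redshift`, the `postgres` scheme, and all credentials shown in the usage note are assumptions, so check which connection ids the DAG actually expects.

```python
from typing import Tuple
from urllib.parse import quote


def airflow_conn_env(conn_id: str, scheme: str, user: str, password: str,
                     host: str, port: int, schema: str) -> Tuple[str, str]:
    """Return the (variable name, URI) pair Airflow reads for a connection."""
    name = f"AIRFLOW_CONN_{conn_id.upper()}"
    # Percent-encode credentials so characters like '@' or '/' stay valid.
    uri = (f"{scheme}://{quote(user, safe='')}:{quote(password, safe='')}"
           f"@{host}:{port}/{schema}")
    return name, uri
```

For example, `airflow_conn_env("redshift", "postgres", "awsuser", "secret", "example-host", 5439, "dev")` yields a name and value that could be placed in the `environment` section of the Airflow service in docker-compose.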
Skytrax Pipeline DAG
You can explore the DAG further in different views, as shown below:
DAG View:
DAG Tree View:
DAG Gantt View:
Make sure the Docker containers are running. Open the Metabase UI at http://localhost:3000 in a browser and set up your Metabase account and database.
After the DAG has run successfully, you should be able to explore the data, as in the dashboards I built, shown in the pictures below:
Dashboard1:
Dashboard2:
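Data increases by 100x, with either reads outnumbering writes (read > write) or writes outnumbering reads (write > read).
Pipelines run at 7 am daily. How would the dashboard be updated? Would it still work?
The data warehouse must be made available to 100+ people.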
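For the daily 7 am scenario, the hourly schedule would presumably be replaced by a daily cron expression. The sketch below, which assumes naive server-local datetimes and uses illustrative names, computes when the next run would happen; dashboards that query the warehouse would then show data as of the most recent completed run.

```python
from datetime import datetime, time, timedelta

# Cron fields: minute hour day-of-month month day-of-week.
DAILY_7AM_CRON = "0 7 * * *"


def next_daily_run(now: datetime, run_at: time = time(7, 0)) -> datetime:
    """Return the first run time strictly after `now` for a daily schedule."""
    candidate = datetime.combine(now.date(), run_at)
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate
```

A cron string such as `DAILY_7AM_CRON` is the form Airflow accepts for a DAG's schedule, so moving from hourly to daily runs would mainly be a one-line change in the DAG definition.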
This project is licensed under the MIT License. See the LICENSE file for details.