Allegro Bigflow

A Python framework for data processing on GCP.

BigFlow

Documentation

  1. What is BigFlow?
  2. Getting started
  3. Installing BigFlow
  4. Help me
  5. BigFlow tutorial
  6. CLI
  7. Configuration
  8. Project structure and build
  9. Deployment
  10. Workflow & Job
  11. Starter
  12. Technologies
  13. Development

Cookbook

What is BigFlow?

BigFlow is a Python framework for data processing pipelines on GCP.

The main features are:

Getting started

Start by installing BigFlow on your local machine. Next, go through the BigFlow tutorial.

Installing BigFlow

Prerequisites. Before you start, make sure you have the following software installed:

  1. Python = 3.8
  2. Google Cloud SDK
  3. Docker Engine
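
To confirm the prerequisites before going further, a short Python check can help. This is only a sketch and not part of BigFlow: the helper name `check_prerequisites` is hypothetical, and it assumes the Google Cloud SDK and Docker Engine expose `gcloud` and `docker` executables on your PATH.

```python
import shutil
import sys


def check_prerequisites():
    """Report the running Python version and whether the CLI tools are on PATH."""
    return {
        # Version of the interpreter running this script, as "major.minor.micro".
        "python": ".".join(str(part) for part in sys.version_info[:3]),
        # shutil.which returns None when the executable is not found on PATH.
        "gcloud": shutil.which("gcloud") is not None,
        "docker": shutil.which("docker") is not None,
    }


if __name__ == "__main__":
    for name, value in check_prerequisites().items():
        print(f"{name}: {value}")
```

The check only looks for the executables; it does not verify that they work or that you are logged in.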

You can install the bigflow package globally, but we recommend installing it locally with venv, in your project's folder:

python -m venv .bigflow_env
source .bigflow_env/bin/activate

Install the bigflow package with pip:

pip install 'bigflow[bigquery,dataflow]'

Test it:

bigflow -h
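
If you also want to confirm the installation from Python, the standard library can report the installed version. This is a minimal sketch: `installed_version` is a hypothetical helper, and it assumes the distribution is registered under the name `bigflow`, as in the pip command above.

```python
from importlib import metadata


def installed_version(package="bigflow"):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        # The package is not installed in the active environment.
        return None


if __name__ == "__main__":
    print(installed_version())
```

Run it inside the activated venv, otherwise it inspects whichever environment the interpreter belongs to.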

Read more about the BigFlow CLI.

To interact with GCP you need to set a default project and log in:

gcloud config set project <your-gcp-project-id>
gcloud auth application-default login
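
To double-check which project became the default, you can ask gcloud from a script. The sketch below is an illustration, not part of BigFlow: `current_gcloud_project` is a hypothetical helper that shells out to `gcloud config get-value project` and assumes the command prints the project ID on standard output. It does not check whether the application-default login succeeded.

```python
import subprocess


def current_gcloud_project(timeout=30):
    """Return the default project reported by gcloud, or None if unavailable."""
    try:
        result = subprocess.run(
            ["gcloud", "config", "get-value", "project"],
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            universal_newlines=True,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        # gcloud is not on PATH, could not be started, or did not answer in time.
        return None
    project = result.stdout.strip()
    # Treat a failed command or empty output as "no default project".
    return project if result.returncode == 0 and project else None


if __name__ == "__main__":
    print(current_gcloud_project())
```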

Finally, check that Docker is running:

docker info
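
The same check can be scripted, for example in a project setup script. This is a rough sketch under one assumption: that `docker info` exits with a non-zero status when the daemon is unreachable, which is the behavior of recent Docker CLI versions. The helper name `docker_is_running` is hypothetical.

```python
import subprocess


def docker_is_running(timeout=30):
    """Return True if `docker info` exits successfully, False otherwise."""
    try:
        result = subprocess.run(
            ["docker", "info"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        # Docker CLI missing from PATH, not executable, or no answer in time.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    print("Docker is running" if docker_is_running() else "Docker is not available")
```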

Help me

You can ask questions on our Gitter channel or on Stack Overflow.

Open Source Agenda is not affiliated with "Allegro Bigflow" Project. README Source: allegro/bigflow