Abhioncbr Docker Airflow

Repository for building a Docker-based Airflow image. Containers support features such as writing logs to a local folder or to S3 and initializing GCP while the container boots. https://abhioncbr.github.io/docker-airflow/

Project README

docker-airflow


This is a repository for building a Docker container image of Apache Airflow (incubating).

Images

  • Docker Hub image: abhioncbr/docker-airflow

Airflow components stack

  • Airflow version: written as XX.YY.ZZ throughout this document.
  • Execution mode: standalone (a simple container for exploration, using SQLite as the Airflow metadata database and SequentialExecutor), prod (single node, using LocalExecutor and MySQL as the Airflow metadata database) or cluster (for distributed, long-running production use cases; the container runs as either a server or a worker).
  • Backend database: standalone - SQLite; prod & cluster - MySQL
  • Executor: standalone - SequentialExecutor; prod - LocalExecutor; cluster - CeleryExecutor
  • Task queue: cluster - Redis
  • Log location: local file system (default) or AWS S3 (through entrypoint-s3.sh)
  • User authentication: password based, with support for multiple users and superuser privileges.
  • Code enhancement: password-based multi-user support with a super-user feature (a super-user can see all DAGs of all owners). Airflow itself is currently working on a password-based multi-user feature.
  • Other features: support for Google Cloud Platform packages in the container.
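
The mode-to-component mapping in the list above can be summarised as a small lookup. This is only an illustrative sketch: describe_mode is a hypothetical helper that is not shipped with the image, and the executor names assume the standard Airflow executor classes (SequentialExecutor, LocalExecutor, CeleryExecutor).

```shell
#!/bin/sh
# Hypothetical helper (not part of the image or its entrypoint): prints the
# executor, metadata database and task queue that the list above associates
# with each execution mode.
describe_mode() {
  case "$1" in
    standalone) echo "executor=SequentialExecutor db=sqlite queue=none" ;;
    prod)       echo "executor=LocalExecutor db=mysql queue=none" ;;
    cluster)    echo "executor=CeleryExecutor db=mysql queue=redis" ;;
    *)          echo "unknown mode: $1" >&2; return 1 ;;
  esac
}

describe_mode cluster
```

Calling the helper with an unrecognised mode prints an error on stderr and returns a non-zero status, which makes it usable as a guard in a wrapper script.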

Airflow ports

  • airflow portal port: 2222
  • airflow celery flower: 5555
  • redis port: 6379
  • log files exchange port: 8793

Airflow services information

  • In the server container: redis, the airflow webserver & the scheduler are running.
  • In the worker container: the airflow worker & the celery flower UI service are running.
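
Combining the port list with the per-container services suggests which ports each container role needs to publish. The sketch below encodes that reading; publish_flags is a hypothetical helper, and the assignment of ports to roles is inferred from the two lists above rather than taken from the image's entrypoint.

```shell
#!/bin/sh
# Hypothetical helper: prints the -p flags for a container role, inferred
# from the port list and the services that run in each container.
publish_flags() {
  case "$1" in
    # server: airflow portal (2222) and redis (6379)
    server) echo "-p 2222:2222 -p 6379:6379" ;;
    # worker: celery flower (5555) and log files exchange (8793)
    worker) echo "-p 5555:5555 -p 8793:8793" ;;
    *)      echo "unknown role: $1" >&2; return 1 ;;
  esac
}

publish_flags server
publish_flags worker
```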

How to build images

  • The DockerFile takes the Airflow version as a build-arg.
  • Build the image yourself if you want to customize it -
       docker build -t abhioncbr/docker-airflow:$IMAGE_VERSION --build-arg AIRFLOW_VERSION=$AIRFLOW_VERSION \
                  --build-arg AIRFLOW_PATCH_VERSION=$AIRFLOW_PATCH_VERSION -f ~/docker-airflow/docker-files/DockerFile .
    
    • IMAGE_VERSION should be the Airflow version, for example 1.10.3 or 1.10.2.
    • AIRFLOW_PATCH_VERSION should be the major release version of Airflow; for example, for 1.10.2 it should be 1.10.
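
The relationship between the build arguments can be scripted. The sketch below derives AIRFLOW_PATCH_VERSION from the full version by stripping the last component, and prints the build command as a dry run instead of executing it. It assumes that AIRFLOW_VERSION is the full XX.YY.ZZ version and that the repository is checked out at ~/docker-airflow; neither is confirmed beyond the example command above.

```shell
#!/bin/sh
# Example version only; replace with the Airflow version you want to build.
AIRFLOW_VERSION="1.10.2"
# Assumption: the image tag is the same as the Airflow version.
IMAGE_VERSION="$AIRFLOW_VERSION"
# Strip the last dot-separated component: 1.10.2 -> 1.10
AIRFLOW_PATCH_VERSION="${AIRFLOW_VERSION%.*}"

# Dry run: print the command instead of running it.
echo docker build -t "abhioncbr/docker-airflow:$IMAGE_VERSION" \
  --build-arg "AIRFLOW_VERSION=$AIRFLOW_VERSION" \
  --build-arg "AIRFLOW_PATCH_VERSION=$AIRFLOW_PATCH_VERSION" \
  -f "$HOME/docker-airflow/docker-files/DockerFile" .
```

Removing the leading echo turns the dry run into a real build.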

How to run using Kitematic

  • The simplest way to explore the image is Kitematic, which runs containers through a simple yet powerful graphical user interface.
    • Search for the abhioncbr/docker-airflow image on Docker Hub.
    • Start a container through the Kitematic UI.

How to run

  • General commands -

    • Starting the airflow image as an airflow-standalone container in standalone mode -

      docker run --net=host -p 2222:2222 --name=airflow-standalone abhioncbr/airflow-XX.YY.ZZ -m=standalone &
      
    • Starting the airflow image as an airflow-server container in cluster mode -

      docker run --net=host -p 2222:2222 -p 6379:6379 --name=airflow-server \
      abhioncbr/airflow-XX.YY.ZZ -m=cluster -t=server -d=mysql://user:password@host:3306/db-name &
      
    • Starting the airflow image as an airflow-worker container in cluster mode -

      docker run --net=host -p 5555:5555 -p 8793:8793 --name=airflow-worker \
      abhioncbr/airflow-XX.YY.ZZ -m=cluster -t=worker -d=mysql://user:password@host:3306/db-name -r=redis://<airflow-server-host>:6379/0 &
      
  • On macOS, using Docker for Mac -

    • Standalone Mode - starting the airflow image in standalone mode & mounting the dags, code-artifacts & logs folders from the host machine -

      docker run -p 2222:2222 --name=airflow-standalone \
      -v ~/airflow-data/code-artifacts:/code-artifacts \
      -v ~/airflow-data/logs:/usr/local/airflow/logs \
      -v ~/airflow-data/dags:/usr/local/airflow/dags \
      abhioncbr/airflow-XX.YY.ZZ -m=standalone &
      
    • Cluster Mode

      • Starting the airflow image as a server container & mounting the dags, code-artifacts & logs folders from the host machine -

        docker run -p 2222:2222 -p 6379:6379 --name=airflow-server \
        -v ~/airflow-data/code-artifacts:/code-artifacts \
        -v ~/airflow-data/logs:/usr/local/airflow/logs \
        -v ~/airflow-data/dags:/usr/local/airflow/dags \
        abhioncbr/airflow-XX.YY.ZZ \
        -m=cluster -t=server -d=mysql://user:password@host.docker.internal:3306/<airflow-db-name> &
        
      • Starting the airflow image as a worker container & mounting the dags, code-artifacts & logs folders from the host machine -

        docker run -p 5555:5555 -p 8793:8793 --name=airflow-worker \
        -v ~/airflow-data/code-artifacts:/code-artifacts \
        -v ~/airflow-data/logs:/usr/local/airflow/logs \
        -v ~/airflow-data/dags:/usr/local/airflow/dags \
        abhioncbr/airflow-XX.YY.ZZ \
        -m=cluster -t=worker -d=mysql://user:password@host.docker.internal:3306/<airflow-db-name> -r=redis://host.docker.internal:6379/0 &
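
The -d and -r values in the commands above follow the patterns mysql://user:password@host:3306/db-name and redis://host:6379/0. The sketch below assembles them from their parts so the host can be changed in one place. mysql_url and redis_url are hypothetical helpers, and the database name airflow is a placeholder, not a value taken from the project.

```shell
#!/bin/sh
# Hypothetical helpers that assemble the -d and -r values from their parts.
mysql_url() {  # usage: mysql_url USER PASSWORD HOST DB_NAME
  printf 'mysql://%s:%s@%s:3306/%s\n' "$1" "$2" "$3" "$4"
}
redis_url() {  # usage: redis_url HOST
  printf 'redis://%s:6379/0\n' "$1"
}

# Placeholder credentials and database name; substitute your own.
DB_URL="$(mysql_url user password host.docker.internal airflow)"
REDIS_URL="$(redis_url host.docker.internal)"

# Dry run: print the worker arguments instead of starting a container.
echo "-m=cluster -t=worker -d=$DB_URL -r=$REDIS_URL"
```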
        


Distributed execution of airflow

  • As mentioned above, the Docker image of Airflow can be used for a fully distributed setup:
    • a single docker-airflow container in server mode, serving the Airflow UI, redis for celery tasks & the scheduler.
    • multiple docker-airflow containers in worker mode, executing tasks with the celery executor.
    • a centralised airflow metadata database.
  • Figure (Distributed-Airflow): the docker-airflow distributed platform.
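
A distributed setup with several workers could be scripted along these lines. This is a dry-run sketch that only prints one docker run command per worker; SERVER_HOST, DB_URL, the worker count and the airflow-worker-N naming scheme are assumptions. Because the commands use host networking with fixed ports, each printed command is meant to be run on a separate host.

```shell
#!/bin/sh
# Assumed values; replace with your server host and metadata database URL.
SERVER_HOST="airflow-server-host"
DB_URL="mysql://user:password@host:3306/db-name"
WORKERS=3

# Prints one `docker run` command per worker (dry run, nothing is started).
worker_commands() {
  i=1
  while [ "$i" -le "$WORKERS" ]; do
    echo "docker run --net=host -p 5555:5555 -p 8793:8793" \
      "--name=airflow-worker-$i" \
      "abhioncbr/airflow-XX.YY.ZZ -m=cluster -t=worker" \
      "-d=$DB_URL -r=redis://$SERVER_HOST:6379/0"
    i=$((i + 1))
  done
}

worker_commands
```

All workers point at the same metadata database and at the redis instance on the server container, which matches the single-server, multiple-worker layout described above.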
README Source: abhioncbr/docker-airflow
