Repository used for Spark Trainings
This repository contains examples, exercises and tutorials for the Spark and Hadoop trainings held by dimajix. The latest version is always available on GitHub at
https://github.com/dimajix/spark-training
The repository contains different types of documents
Some notebooks require test data, which dimajix provides on S3 at s3://dimajix-training/data/.
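As a rough sketch of how a Python notebook might refer to this test data, the helper below builds full URIs below the data directory. The function name `data_path` and the file name `example.txt` are hypothetical and only serve as illustration; the bucket and directory are taken from the location given above. The optional `scheme` argument is an assumption for Spark installations that access S3 through the Hadoop S3A connector and therefore expect `s3a://` instead of `s3://`.

```python
# Location of the training data as stated in this README.
DATA_BUCKET = "dimajix-training"
DATA_PREFIX = "data"


def data_path(relative, scheme="s3"):
    """Return the full URI of a file or directory below the training data root.

    relative: path below the data directory (hypothetical in the example below).
    scheme:   URI scheme, "s3" by default; "s3a" may be needed on plain
              Hadoop/Spark installations (assumption, depends on your setup).
    """
    # Strip leading slashes so the result never contains a double slash.
    return f"{scheme}://{DATA_BUCKET}/{DATA_PREFIX}/{relative.lstrip('/')}"


# "example.txt" is a placeholder, not an actual file name from the repository.
print(data_path("example.txt"))
print(data_path("example.txt", scheme="s3a"))
```

In a notebook with an active SparkSession named `spark`, the resulting string could then be passed to a reader, for example `spark.read.text(data_path("example.txt"))`, provided the cluster has access to the bucket.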
The source code can be built using Maven, simply by running
mvn install
from the root directory.
Most code is provided either as interactive notebooks (Jupyter and/or Zeppelin) or as compilable programs. Programs that build jar files always come with start scripts, which take care of setting the required environment variables and Spark configuration properties.