These materials provide a brief hands-on introduction to Dask, a parallel computing system for Python. They are intended to be delivered in a 90-minute session and cover the following topics.
These topics are far from comprehensive, but have been chosen to give a flavor of what can be done with Dask.
These materials are presented as Jupyter notebooks, which should be available within this directory.
To get started, clone this repository:
git clone https://github.com/mrocklin/dask-workshop
Create a conda environment with the following commands:
conda create -n dask-workshop -c conda-forge python=3 dask distributed jupyter bokeh feather-format python-graphviz matplotlib tornado=4.4
source activate dask-workshop
pip install pandas_datareader
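Before launching Jupyter, it can help to confirm from Python that the packages installed above are importable. The sketch below is one way to do that, not part of the workshop materials. It assumes Python 3.8 or newer (for `importlib.metadata`) and assumes the import names `feather` for feather-format and `graphviz` for python-graphviz. It also conservatively flags tornado 4.5 and bokeh 0.12.5 individually, since those are the releases named in the compatibility note.

```python
from importlib import metadata, util

# Import names for the conda/pip packages above. Assumption: feather-format
# is imported as "feather" and python-graphviz as "graphviz".
REQUIRED_MODULES = [
    "dask", "distributed", "bokeh", "feather", "graphviz",
    "matplotlib", "tornado", "pandas_datareader",
]

# Releases named in the compatibility note; either one is flagged.
KNOWN_BAD = {"tornado": "4.5", "bokeh": "0.12.5"}


def missing_modules(names):
    """Return the module names that cannot be located on sys.path."""
    return [name for name in names if util.find_spec(name) is None]


def flagged_versions(versions):
    """Return package names whose version matches a known-bad release.

    `versions` maps distribution name -> version string. A version matches
    if it equals the bad release or is a patch release of it ("4.5.1").
    """
    return [
        name
        for name, bad in KNOWN_BAD.items()
        if name in versions
        and (versions[name] == bad or versions[name].startswith(bad + "."))
    ]


def installed_versions(dists):
    """Map each installed distribution name to its version string."""
    versions = {}
    for dist in dists:
        try:
            versions[dist] = metadata.version(dist)
        except metadata.PackageNotFoundError:
            pass  # not installed; missing_modules reports it separately
    return versions


if __name__ == "__main__":
    print("missing:", missing_modules(REQUIRED_MODULES))
    print("known-bad versions:", flagged_versions(installed_versions(KNOWN_BAD)))
```

Running the script inside the activated environment should print two empty lists if everything is in place.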
Then start a Jupyter notebook server and begin with the first notebook:
jupyter notebook
Note: feather-format is not available in Python 2 on Windows.
Note: tornado 4.5 and bokeh 0.12.5 have known compatibility issues, which is why the environment above pins tornado=4.4.
This tutorial covered dask.dataframe and dask.delayed for simple tabular computations. This is a common and important case, but it is only one of many applications of Dask. If you are interested in arrays, machine learning, asynchronous computations, and so on, you may wish to explore the documentation further:
If you want to try Dask on a cluster running on Amazon or Google cloud hardware, you might try one of the following projects: