Dask Workshop

These materials provide a brief hands-on introduction to Dask, a parallel computing system. They are intended to be delivered in a 90-minute session and cover the following topics:

  1. Parallelize existing code with dask.delayed
  2. Set up the dask.distributed system on your local laptop
  3. Use dask.dataframe on time series data

These topics are far from comprehensive, but they have been chosen to give a flavor of what can be done with Dask.
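As a taste of the first topic, here is a minimal sketch of parallelizing ordinary Python functions with dask.delayed. The functions `inc` and `add` are hypothetical stand-ins for existing code and are not taken from the workshop notebooks.

```python
from dask import delayed


def inc(x):
    """Hypothetical stand-in for an existing function."""
    return x + 1


def add(x, y):
    """Hypothetical stand-in for a function that combines two results."""
    return x + y


# Wrapping a function in delayed() records the call in a task graph
# instead of running it immediately.
a = delayed(inc)(1)
b = delayed(inc)(2)
total = delayed(add)(a, b)

# Nothing has executed yet. compute() runs the graph; the two inc calls
# do not depend on each other, so the scheduler is free to run them in parallel.
result = total.compute()
print(result)
```

For functions this small the scheduling overhead outweighs any gain; the pattern pays off when each call does a meaningful amount of work.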

These materials are presented as Jupyter notebooks, which should be available within this directory.

To get started, download this repository:

git clone https://github.com/mrocklin/dask-workshop

Create a conda environment with the following commands:

conda create -n dask-workshop -c conda-forge python=3 dask distributed jupyter bokeh feather-format python-graphviz matplotlib tornado=4.4
source activate dask-workshop
pip install pandas_datareader
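Before starting the notebook server, it can help to confirm that the environment contains the packages installed above. The following is a rough sketch, not part of the workshop itself: the helper `find_missing` and the list of import names are assumptions made for illustration (for example, python-graphviz is imported as `graphviz` and feather-format as `feather`).

```python
import importlib.util

# Import names assumed to correspond to the packages installed above.
# Adjust this list if your environment differs.
WORKSHOP_MODULES = [
    "dask",
    "distributed",
    "bokeh",
    "feather",
    "graphviz",
    "matplotlib",
    "tornado",
    "pandas_datareader",
]


def find_missing(modules):
    """Return the names in `modules` that cannot be located for import."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


missing = find_missing(WORKSHOP_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All workshop modules were found.")
```

Run this with the dask-workshop environment activated. It only checks that each module can be located and does not verify versions.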

Then start a Jupyter notebook server and begin with the first notebook:

jupyter notebook

Note: feather-format is not available in Python 2 on Windows.

Note: tornado 4.5 and bokeh 0.12.5 have known compatibility issues.

After Finishing

This tutorial covered dask.dataframe and dask.delayed for simple tabular computations. This is a common and important case, but it is only one of many applications for which Dask is used. If you are interested in arrays, machine learning, asynchronous computations, etc., you may wish to peruse the documentation further:

If you want to try Dask on a cluster on Amazon or Google hardware then you might try one of the following projects:

Open Source Agenda is not affiliated with "Dask Workshop" Project. README Source: mrocklin/dask-workshop