NOTE: This project is currently unmaintained. If you would like to take over maintenance, please let us know.
Jupyter Notebook extension for Apache Spark integration.
Includes a progress indicator for the current Notebook cell if it invokes a Spark job. Queries the Spark UI service on the backend to get the required Spark job information.
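As a rough illustration of the kind of lookup described above, the sketch below reads job records from Spark's monitoring REST API and turns one record into a completion fraction. It is not the extension's actual code: the base URL, the `job_progress` and `running_jobs` names, and the assumption that the API is reachable under `/api/v1` of the Spark UI are all illustrative.

```python
import json
from urllib.request import urlopen

# Assumption: the Spark UI runs locally and serves its REST API under /api/v1.
SPARK_API = "http://localhost:4040/api/v1"


def job_progress(job):
    """Return the completed fraction (0.0 to 1.0) of one Spark job record."""
    total = job.get("numTasks", 0)
    if total <= 0:
        return 0.0
    return min(job.get("numCompletedTasks", 0) / total, 1.0)


def running_jobs(api_url=SPARK_API):
    """Fetch the running jobs of every application the Spark UI knows about."""
    with urlopen(f"{api_url}/applications") as resp:
        apps = json.load(resp)
    jobs = []
    for app in apps:
        jobs_url = f"{api_url}/applications/{app['id']}/jobs?status=running"
        with urlopen(jobs_url) as resp:
            jobs.extend(json.load(resp))
    return jobs
```

With a Spark application running, `[job_progress(j) for j in running_jobs()]` would give one fraction per running job; a progress indicator could be driven from values like these.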
To view all currently running jobs, click the "show running Spark jobs"
button, or press
A proxied version of the Spark UI can be accessed at http://localhost:8888/spark.
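To make the proxying idea concrete, here is a minimal sketch of how a path under `/spark` on the notebook server could be translated to a URL on the Spark UI. The helper name `backend_url` and the default Spark UI address are assumptions for illustration; the extension's real handler may work differently.

```python
# Path under which the notebook server exposes the Spark UI.
PROXY_PREFIX = "/spark"


def backend_url(request_path, spark_url="http://localhost:4040"):
    """Translate a proxied notebook path into the matching Spark UI URL."""
    is_prefix = request_path == PROXY_PREFIX
    if not is_prefix and not request_path.startswith(PROXY_PREFIX + "/"):
        raise ValueError(f"not a proxied Spark path: {request_path!r}")
    # Drop the proxy prefix and keep the rest of the path unchanged.
    remainder = request_path[len(PROXY_PREFIX):]
    return spark_url.rstrip("/") + (remainder or "/")
```

Under these assumptions, a request for `/spark/jobs/` on the notebook server would be forwarded to `/jobs/` on the Spark UI.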
To install, simply run:
pip install jupyter-spark
jupyter serverextension enable --py jupyter_spark
jupyter nbextension install --py jupyter_spark
jupyter nbextension enable --py jupyter_spark
jupyter nbextension enable --py widgetsnbextension
The last step is needed to enable the widgetsnbextension extension that Jupyter-Spark depends on. It may already have been enabled by a different extension.
You may want to append --user to the commands above if you're getting configuration errors upon invoking them.
To double-check that the extension was installed correctly, run:
jupyter nbextension list
jupyter serverextension list
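As a complementary check, the short sketch below tests whether the Python package itself can be found by the interpreter you are running. It only confirms that `jupyter_spark` is importable, not that the extensions are enabled; the helper name `is_importable` is illustrative.

```python
from importlib.util import find_spec


def is_importable(module_name):
    """Return True if the current interpreter can locate module_name."""
    return find_spec(module_name) is not None


# True only when jupyter-spark is installed in the environment running this.
print(is_importable("jupyter_spark"))
```

If this prints False while pip reports the package as installed, Jupyter is probably running from a different Python environment than the one pip installed into.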
Please feel free to install lxml as well, using your favorite package manager, to improve the performance of the server-side communication with Spark, e.g.:
pip install lxml
For development and testing, clone the project and run from a shell in the project's root directory:
pip install -e .
jupyter serverextension enable --py jupyter_spark
jupyter nbextension install --py jupyter_spark
jupyter nbextension enable --py jupyter_spark
To uninstall the extension, run:
jupyter serverextension disable --py jupyter_spark
jupyter nbextension disable --py jupyter_spark
jupyter nbextension uninstall --py jupyter_spark
pip uninstall jupyter-spark
To change the URL of the Spark API that the job metadata is fetched from, set the
Spark.url config value, e.g. on the command line:
jupyter notebook --Spark.url="http://localhost:4040"
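Because command-line options of the form `--Class.trait=value` normally map onto Jupyter's configuration system, the same value can likely be set persistently in a Python config file. The fragment below is a sketch under that assumption; the file location (`jupyter_notebook_config.py` in your .jupyter directory) is the usual Jupyter default rather than something this project documents.

```python
# jupyter_notebook_config.py (sketch, loaded by Jupyter at startup)
c = get_config()  # injected by Jupyter when it reads this file

# Assumption: equivalent to passing --Spark.url on the command line.
c.Spark.url = "http://localhost:4040"
```

A value given on the command line should still take precedence over the one in the config file.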
There is a simple pyspark example included in examples to confirm that your installation is working.
Rewrote proxy to use an async Tornado handler and HTTP client to fetch responses from Spark.
Simplified proxy processing to take Amazon EMR proxying into account.
Extended test suite to cover proxy handler, too.
Removed requests as a dependency.
Refactored to fix a bunch of Python packaging and code quality issues.
Added test suite for Python code.
Set up continuous integration: https://travis-ci.org/mozilla/jupyter-spark
Set up code coverage reports: https://codecov.io/gh/mozilla/jupyter-spark
Added ability to override the Spark API URL via a command line option.
IMPORTANT: Requires a manual step to enable the extension after running pip install (see the installation docs)!
pip uninstall jupyter-spark
jupyter_notebook_config.json (in your .jupyter directory)