Spark Notebook Versions

Interactive and Reactive Data Science using Scala and Spark.

v0.8.3

6 years ago

Like v0.8.2, just with a fix for the Docker and Debian builds.

v0.8.2

6 years ago

Note: for Spark < 2.0, see v0.7.0-pre2

This is (likely) the last release which supports Scala 2.10.

Various fixes and improvements, among others:

  • redesigned UI to be more user-friendly (minimalistic UX, cell context menu, improved sidebar)
  • better Scala 2.11 support (code autocompletion; fixed kernel failures; improved customDeps fetching)
  • use Coursier for faster dependency resolution during the build
  • code cleanup and other stability fixes
  • easier Mesos usage with Spark 2.1: the spark-mesos lib is now included by default (via the new -Dwith.mesos=true build option)

New features:

  • SBT project generation from a notebook (experimental)
  • Notebook edit versioning, and storage on Git (experimental)
  • Viewer-only mode: a build option that makes notebooks non-editable

Removed:

  • removed :dp, :cp, :local-repo, :remote-repo commands (use Edit -> Notebook metadata instead)
  • removed old plotting libs: Rickshawts, TauChart, LinePlot, Bokeh (all superseded by Plotly)

v0.7.0-pre2

7 years ago

THIS IS FOR SPARK PRE 2

Based on v0.7.0, it includes its fixes, optimizations, and most new features, except those specific to Spark 2 (SparkSession, for instance).

v0.7.0

7 years ago

Note: for Spark < 2.0, see v0.7.0-pre2

  • Added Spark 2 support (see the SparkSession sketch after this list)
  • Many fixes for better stability (more lenient handling of user input, fewer kernel crashes)
  • Lots of optimizations for the viz; also replaced most Dimple charts with C3
  • Introducing Plotly.js wrappers
  • Better Debian support
  • Improved "download as Markdown": a zip with charts rendered as PNGs referenced from an images folder
  • Better doc available at all times in the doc folder
  • Cell dirtiness detection based on the variable dependency graph
  • New default port 9001, to avoid conflicts with HDFS
  • Removed Wisp and Highcharts (in favor of plotly.js)
  • Code cleanup
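
Spark 2 introduces SparkSession as the single entry point. A minimal sketch of the plain Spark 2 API (names and values are illustrative; in a notebook the context is typically pre-created for you):

```scala
import org.apache.spark.sql.SparkSession

// Standard Spark 2 entry point, replacing the separate SparkContext/SQLContext.
val spark = SparkSession.builder()
  .appName("spark-notebook-example") // illustrative name
  .master("local[*]")
  .getOrCreate()

val df = spark.range(0, 10).toDF("n")
df.show()
```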

v0.6.4

7 years ago

For Spark <= 1.6, use this release or the stale/spark-1.6-and-older branch.

v0.6.3

8 years ago

Aside from the stabilization work and all the bugs fixed, the new features are:

  • improved PivotChart
  • improved completion, with type arguments and more
  • better sampling for automatic/default plots
  • added tests and Travis CI
  • Spark jobs are tracked per cell; cells now have IDs
  • hardened the observables init
  • improved Scala 2.11 support
  • improved the Flow widget; added a Custom box that takes Scala code directly as its logic
  • the job for a cell can be cancelled
  • read_only mode
  • notebooks are now synced with respect to cell output (including reactive output), but not cell additions/deletions or cell content changes
  • panels have landed:
    • general spark monitor
    • defined variables and types
    • chat room
  • cleaner docker build
  • added TauCharts viz lib support
  • added -Dguava.version to support integration tools like the Cassandra connector from 1.5+

Again, we'd like to thank the community for their work and their support! YOU'RE ALL AWESOME!

v0.6.2

8 years ago
  • build information in the UI
  • better https support for web socket connections
  • use the presentation compiler for completion
  • fixed kernel restart
  • server/Spark logs forwarded to the browser's console
  • charts plot 25 entries by default (extendable using maxPoints); this cap can also be changed via a reactive HTML input
  • spark jobs' monitor/progress bar is now always live (still in progress, needs some UI hardening and enhancements)
  • graph plots are reactive
  • table chart using dynatable
  • HTTP proxy support for dependency management
  • generic Spark version support, on a best-effort basis, for any new Spark version (including nightly builds)
  • nightly build repos can be detected and injected with the spark.resolver.search JVM property set to true
  • presentation mode added, including UI tuning
  • environment variables support in metadata: local repo, VM arguments, and Spark configuration
  • Better DataFrame viz support
  • PivotChart tuning, including viz and state management
  • support %% in the deps definition to take care of the Scala version in use
  • support the current Spark version in the deps definition using _, like "org.apache.spark %% spark-streaming-kafka % _" (see the sketch after this list)
  • added user_custom.css for users' extensions or redefinitions of the CSS
  • report the Spark UI link on the left hand side of the notebook
  • URL Query parameter action=recompute_now to automatically recompute everything at loading time
  • default logging less verbose
  • added the capability to download a DataFrame as CSV (written directly to HDFS using spark-csv)!
  • new C3 based widgets
  • new GeoChart widget -- support for JTS geometries, GeoJSON and String
  • new Flow for visual flow management using boxes and arrows (needs hardening and improvements)
  • UI cleaning (menubars, ...)
  • kernel auto starts can be disabled (useful for view only mode like presentation): autostartOnNotebookOpen in conf
  • UI shows when kernel isn't ready
  • dead kernels are now reported throughout the UI too
  • added manager.notebooks.override to override and merge default values with metadata provided before starting a notebook
  • new examples notebooks:
    • Machine Learning
    • C3
    • Geospatial
    • Flow
  • more documentation (not enough...)
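
For the %% and _ placeholders above, a minimal sketch of custom dependency entries, assuming the string format quoted in the bullet (the customDeps notebook metadata holds such entries; the Seq wrapper and the second coordinate are illustrative):

```scala
// "%%" expands to the Scala binary version in use,
// "_" expands to the current Spark version.
// The metadata stores these as plain strings; shown as a Scala Seq for readability.
val customDeps = Seq(
  "org.apache.spark %% spark-streaming-kafka % _",          // from the bullet above
  "com.datastax.spark %% spark-cassandra-connector % 1.5.0" // illustrative entry
)
```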

Special thanks to @vidma for his amazing work on many new and killer features! :clap: :clap: :clap:

v0.6.1

8 years ago
  • ADD_JARS support (add jars to context)
  • notebook metadata saved on OK
  • fix 2.11 :dp and :cp
  • hide Tachyon UI
  • YARN_CONF_DIR support
  • customArgs in metadata (application.conf, ...) → add JVM arguments to the spawned process for a notebook
  • spark 1.5.0 support
  • tachyon 0.7.1 integration for spark 1.5.0
  • added reactive slider + example in misc
  • old X and Y renaming of tuple field names discarded; back to _1, _2
  • example of cassandra connector (@maasg)
  • reactive widgets.PivotChart support for simpler analysis of Scala data (see the sketch after this list)
  • fixes fixes fixes
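
A hedged sketch of the pivot chart over plain Scala data in a notebook cell; the exact signature of widgets.PivotChart is an assumption based on the bullet above, and the case class and data are illustrative:

```scala
// Assumed usage: PivotChart renders an interactive pivot over a Scala collection.
case class Sale(region: String, product: String, amount: Double)

val sales = Seq(
  Sale("EU", "widgets", 120.0),
  Sale("US", "widgets", 250.0),
  Sale("EU", "gadgets", 80.0)
)

widgets.PivotChart(sales) // assumed entry point, per "widgets.PivotChart" above
```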

v0.6.0

8 years ago
  • a loooooot of fixes \o/
  • a loooooot of documentation, including how to install and run the spark notebook on distros and clusters (YARN, MapR, EMR, ...)
  • support for HADOOP_CONF_DIR and EXTRA_CLASSPATH to include Spark cluster specific classpath entries, like the hadoop conf dir, but also the lzo jar and so on. This updates the classpath of both the notebook server and the notebook processes.
  • the custom repos specified in the metadata or application.conf have a higher priority
  • support for spark 1.4.1
  • Mesos is added to the Docker distro
  • code is now run asynchronously, allowing the introduction of the flame button, which can cancel all running spark jobs
  • added many new notebooks, including @Data-Fellas ML and ADAM examples and anomaly detection by @radek1st
  • LOGO :-D
  • added :markdown, :javascript, :jpeg, :png, :latex, :svg, :html, :plain, :pdf, all supporting interpolation (using Scala variables) — see the sketch after this list
  • clusters can be deleted from the UI
  • spark packages repo is available by default
  • spark package format is now supported: groupId:artifactId:version
  • added with.parquet modifier to include parquet deps
  • spark.app.name uses the name of the notebook by default (easier to track in clusters)
  • Dynamic table renderer for DataFrame
  • Added a users section to the README
  • Tachyon can be disabled by setting manager.tachyon.enabled to false
  • support for printing from the browser (CTRL+P)
  • added :ldp for local dependency definitions (so not added to spark context)
  • Graphs (nodes-edges) can be plotted easily using the Node and Edge types → see viz/Graph Plots.snb
  • Geo data viz added using lat/lon data → see viz/Geo Data (Map).snb
  • Enhanced the twitter stream example to show tweets in a map
  • Enhanced the WISP examples, including Histogram and BoxPlot. Wisp plots can now be built using the lower-level Highcharts API
  • Added the commons lib to the spark context to enable extended viz using spark jobs
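
A hedged sketch of how these contexts are used: a cell starting with the context marker renders its body in that format, interpolating Scala variables. The {count} delimiter shown below is an assumption, not a confirmed syntax:

```scala
// In a Scala cell, compute something to reference later (illustrative data):
val count = Seq(1, 2, 3).size

// A following cell starting with ":markdown" renders as Markdown; the
// {count} interpolation delimiter below is an assumption:
//
//   :markdown
//   # Batch summary
//   We saw {count} events in this batch.
```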

v0.5.0

8 years ago

Main

Besides quite a few fixes, this version brings two major features:

Session

When opening a notebook, a session is created allowing anybody to join it.

Most importantly, the user can now close the tab and get back to the analysis later by reopening the notebook. Very helpful, for instance, when long-running processes have been launched.

Tachyon

Tachyon support has been integrated, with several capabilities:

  • connecting to a Tachyon cluster provided in the configuration
  • starting a local embedded Tachyon cluster if none is available in the config
  • a small UI on the right-hand side of the notebook panel that allows the user to browse the content, hence the persisted computations, or even simply files

In the first two cases, all notebooks are automatically configured (read: the SparkContext) to use the available Tachyon cluster without requiring any action from the user.

Others

Other things worth mentioning:

  • the notebooks directory is no longer under the root folder by default
  • the parquet deps, which could be a pain in previous releases, have been discarded
  • the logs are now per session/notebook, so it's now even easier to track a job
  • the background logger (the yellow box on the left) has been removed, since it didn't bring much info but interacted badly in some cases with the closure serializer...
  • support for https
  • more information is provided when errors occur, especially when the code is incomplete (missing parentheses)
  • execution time is now included in each result block
  • Scala 2.10 now uses SBT to download deps; Scala 2.11 is still using Aether at the moment
  • HADOOP_CONF_DIR has to be used to pass the hadoop conf dir when using YARN