Interactive and Reactive Data Science using Scala and Spark.
Like v0.8.2, just with a fix for the Docker and Debian builds.
Note: for Spark < 2.0, see v0.7.0-pre2.
This is (likely) the last release that supports Scala 2.10.
Various fixes and improvements, among others:
- Scala 2.11 support (code autocompletion; fixed kernel failures; improved `customDeps` fetching)
- `coursier` for faster deps resolving during the build
- `spark-mesos` lib by default (added `-Dwith.mesos=true` build option)

New features:
Removed:
- `:dp`, `:cp`, `:local-repo`, `:remote-repo` commands (use Edit -> Notebook metadata instead)
- `Rickshawts`, `TauChart`, `LinePlot`, `Bokeh` (all superseded by Plotly)

THIS IS FOR SPARK PRE 2
Based on v0.7.0, it includes its fixes, optimizations, and most new features, except those specific to Spark 2 (`SparkSession`, for instance).
Note: for Spark < 2.0, see v0.7.0-pre2.
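For readers wondering what "Spark 2 specific" means above: `SparkSession` replaced the classic `SparkConf`/`SparkContext`/`SQLContext` trio as the entry point in Spark 2. A minimal sketch of the two styles (plain Spark code, nothing spark-notebook specific; in a notebook the context is normally pre-created for you):

```scala
// Spark 2.x entry point -- NOT available in this pre-2 release:
//   import org.apache.spark.sql.SparkSession
//   val spark = SparkSession.builder().appName("demo").getOrCreate()

// Pre-2 code keeps the classic entry points instead:
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val conf = new SparkConf().setAppName("demo").setMaster("local[*]")
val sc   = new SparkContext(conf)
val sqlContext = new SQLContext(sc) // DataFrames in Spark 1.x live here
```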
- `doc` folder
- for Spark <= 1.6, use this release or the `stale/spark-1.6-and-older` branch
Aside from the stabilization and all the bugs fixed, the new features are:
- `-Dguava.version` to support integration tools like the Cassandra connector from 1.5+

Again, we'd like to thank the community for their work and their support! YOU'RE ALL AWESOME!
- `spark.resolver.search` JVM property set to `true`
- `DataFrame` viz support
- `PivotChart` tuning, including viz and state management
- `%%` in the deps definition to take care of the used Scala version, and `_` as the version placeholder for the running Spark version, like `"org.apache.spark %% spark-streaming-kafka % _"` (see the sketch after this list)
- `user_custom.css` for users' extensions or redefinitions of the CSS
- `action=recompute_now` to automatically recompute everything at loading time
- `autostartOnNotebookOpen` in conf
- `manager.notebooks.override` to override and merge default values with metadata provided before starting a notebook

Special thanks to @vidma for his amazing work on many new and killer features! :clap: :clap: :clap:
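To illustrate the `%%` and `_` notation mentioned above (the expansion shown is an assumption for a Scala 2.10 / Spark 1.6.0 setup; the actual versions come from your environment):

```scala
// customDeps entries are plain strings (Edit -> Notebook metadata);
// `%%` appends the Scala binary version to the artifact name, and `_`
// in the version position picks the Spark version the notebook runs on.
val portable = "org.apache.spark %% spark-streaming-kafka % _"

// On Scala 2.10 with Spark 1.6.0 this would resolve like:
val expanded = "org.apache.spark % spark-streaming-kafka_2.10 % 1.6.0"
```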
- misc widgets
- `PivotChart` support for simpler analysis of Scala data (see the sketch after this list)
- `groupId:artifactId:version` notation
- `with.parquet` modifier to include parquet deps
- `spark.app.name` uses the name of the notebook by default (easier to track in clusters)
- `DataFrame`
- `manager.tachyon.enabled` to `false`
- `:ldp` for local dependency definitions (so not added to the Spark context)
- `Node` and `Edge` types → see viz/Graph Plots.snb
- viz/Geo Data (Map).snb
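As a sketch of the `DataFrame` viz and `PivotChart` items above (the data and column names are made up, and the exact `PivotChart` signature is an assumption based on the release notes):

```scala
// In a notebook cell: the last expression is rendered by the matching
// widget, so simply ending a cell with a DataFrame displays it.
case class Sale(region: String, product: String, amount: Double)

val sales = sqlContext.createDataFrame(Seq(
  Sale("EU", "books", 120.0),
  Sale("US", "books",  80.0),
  Sale("EU", "music",  60.0)
))
sales // rendered with the DataFrame viz

// PivotChart over plain Scala data (name from the notes, signature assumed):
// PivotChart(Seq(("EU", 120.0), ("US", 80.0), ("EU", 60.0)))
```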
Besides quite a few fixes, this version brings two major features:
When opening a notebook, a session is created allowing anybody to join it.
But mostly, the user can now close the tab and get back to the analysis later by reopening the notebook. Very helpful, for instance, when long-running processes are launched.
The Tachyon support has been integrated with several functionalities:

In the first and second points, all notebooks will be automatically configured (read: the `SparkContext`) to use the available Tachyon cluster without requiring any action from the user.
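A hedged illustration of that auto-configuration: the Tachyon settings end up in the `SparkContext` conf, so they can be inspected from a cell. `spark.tachyonStore.url` is the standard Spark 1.x key; whether spark-notebook uses exactly this key is an assumption:

```scala
// Inspect the conf the notebook server prepared; the binding name of
// the context (`sparkContext` vs `sc`) depends on the version.
sparkContext.getConf.getOption("spark.tachyonStore.url") match {
  case Some(url) => println(s"Off-heap store configured at $url")
  case None      => println("No Tachyon store URL found in the conf")
}
```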
There are other things worth mentioning:

- https
- `HADOOP_CONF_DIR` has to be used to pass the Hadoop conf dir when using YARN