Spark Notebook Versions

Interactive and Reactive Data Science using Scala and Spark.

v0.8.3

6 years ago

Like v0.8.2, just with a fix for the Docker and Debian builds.

v0.8.2

6 years ago

Note: for Spark < 2.0, see v0.7.0-pre2

This is (likely) the last release which supports Scala 2.10.

Various fixes and improvements, among others:

  • redesigned UI to be more user-friendly (minimalistic UX, cell context menu, improved sidebar)
  • better Scala 2.11 support (code autocompletion; fixed kernel failures; improved customDeps fetching)
  • use Coursier for faster dependency resolution during the build
  • code cleanup and other stability fixes
  • easier Mesos usage with Spark 2.1: the spark-mesos lib is now included by default (via the new -Dwith.mesos=true build option)

New features:

  • SBT project generation from a notebook (experimental)
  • Notebook edit versioning, and storage on Git (experimental)
  • Viewer-only mode: a build option that makes notebooks non-editable

Removed:

  • removed :dp, :cp, :local-repo, :remote-repo commands (use Edit -> Notebook metadata instead)
  • removed old plotting libs: Rickshawts, TauChart, LinePlot, Bokeh (all superseded by Plotly)

v0.7.0-pre2

7 years ago

THIS IS FOR SPARK PRE 2

Based on v0.7.0, it includes its fixes, optimizations, and most new features, except those specific to Spark 2 (SparkSession, for instance).

v0.7.0

7 years ago

Note: for Spark < 2.0, see v0.7.0-pre2

  • Added Spark 2 support (see the SparkSession sketch after this list)
  • Many fixes for better stability (more lenient handling of user input, fewer kernel crashes)
  • Lots of optimizations for the viz; also replaced most Dimple charts with C3
  • Introducing Plotly.js wrappers
  • Better Debian support
  • Improved "download as Markdown": a zip with charts rendered as PNGs referenced from an images folder
  • Better doc available at all times in the doc folder
  • Cell dirtiness detection based on the variable dependency graph
  • New default port 9001, to avoid conflicts with HDFS
  • Removed Wisp and Highcharts (in favor of plotly.js)
  • Code cleanup
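
Spark 2 introduces SparkSession as the single entry point. A minimal sketch of the plain Spark 2 API (names and values are illustrative; in a notebook the context is typically pre-created for you):

```scala
import org.apache.spark.sql.SparkSession

// Standard Spark 2 entry point, replacing the separate SparkContext/SQLContext.
val spark = SparkSession.builder()
  .appName("spark-notebook-example") // illustrative name
  .master("local[*]")
  .getOrCreate()

val df = spark.range(0, 10).toDF("n")
df.show()
```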

v0.6.4

7 years ago

For Spark <= 1.6, use this release or the stale/spark-1.6-and-older branch.

v0.6.3

8 years ago

Aside from the stabilization work and all the bugs fixed, the new features are:

  • improved PivotChart
  • improved completion, with type arguments and more
  • better sampling for automatic/default plots
  • added tests and Travis CI
  • Spark jobs are tracked per cell; cells now have IDs
  • hardened the observables init
  • improved Scala 2.11 support
  • improved the Flow widget; added a Custom box that takes Scala code directly as its logic
  • the job for a cell can be cancelled
  • read_only mode
  • notebooks are now synced with respect to cell output (including reactive output), but not cell additions/deletions or cell content changes
  • panels have landed:
    • general spark monitor
    • defined variables and types
    • chat room
  • cleaner docker build
  • added TauCharts viz lib support
  • added -Dguava.version to support integration tools like the Cassandra connector from 1.5+

Again, we'd like to thank the community for their work and their support! YOU'RE ALL AWESOME!

v0.6.2

8 years ago
  • build information in the UI
  • better https support for web socket connections
  • use the presentation compiler for completion
  • fixed kernel restart
  • server/Spark logs forwarded to the browser's console
  • charts plot 25 entries by default (extendable using maxPoints); this cap can also be changed via a reactive HTML input
  • spark jobs' monitor/progress bar is now always live (still in progress, needs some UI hardening and enhancements)
  • graph plots are reactive
  • table chart using dynatable
  • HTTP proxy support for dependency management
  • generic Spark version support, on a best-effort basis, for any new Spark version (including nightly builds)
  • nightly build repos can be detected and injected with the spark.resolver.search JVM property set to true
  • presentation mode added, including UI tuning
  • environment variables support in metadata: local repo, VM arguments, and Spark configuration
  • Better DataFrame viz support
  • PivotChart tuning, including viz and state management
  • support %% in the deps definition to take care of the Scala version in use
  • support the current Spark version in the deps definition using _, like "org.apache.spark %% spark-streaming-kafka % _" (see the sketch after this list)
  • added user_custom.css for users' extensions or redefinitions of the CSS
  • report the Spark UI link on the left hand side of the notebook
  • URL Query parameter action=recompute_now to automatically recompute everything at loading time
  • default logging less verbose
  • added the capability to download a DataFrame as CSV (written directly to HDFS using spark-csv)!
  • new C3 based widgets
  • new GeoChart widget -- support for JTS geometries, GeoJSON and String
  • new Flow for visual flow management using boxes and arrows (needs hardening and improvements)
  • UI cleaning (menubars, ...)
  • kernel auto starts can be disabled (useful for view only mode like presentation): autostartOnNotebookOpen in conf
  • UI shows when kernel isn't ready
  • dead kernels are now reported throughout the UI too
  • added manager.notebooks.override to override and merge default values with metadata provided before starting a notebook
  • new examples notebooks:
    • Machine Learning
    • C3
    • Geospatial
    • Flow
  • more documentation (not enough...)
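
For the %% and _ placeholders above, a minimal sketch of custom dependency entries, assuming the string format quoted in the bullet (the customDeps notebook metadata holds such entries; the Seq wrapper and the second coordinate are illustrative):

```scala
// "%%" expands to the Scala binary version in use,
// "_" expands to the current Spark version.
// The metadata stores these as plain strings; shown as a Scala Seq for readability.
val customDeps = Seq(
  "org.apache.spark %% spark-streaming-kafka % _",          // from the bullet above
  "com.datastax.spark %% spark-cassandra-connector % 1.5.0" // illustrative entry
)
```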

Special thanks to @vidma for his amazing work on many new and killer features! :clap: :clap: :clap:

v0.6.1

8 years ago
  • ADD_JARS support (add jars to context)
  • notebook metadata saved on OK
  • fix 2.11 :dp and :cp
  • hide Tachyon UI
  • YARN_CONF_DIR support
  • customArgs in metadata (application.conf, ...) → add JVM arguments to the spawned process for a notebook
  • spark 1.5.0 support
  • tachyon 0.7.1 integration for spark 1.5.0
  • added reactive slider + example in misc
  • old X and Y renaming of tuple field names discarded; back to _1, _2
  • example of cassandra connector (@maasg)
  • reactive widgets.PivotChart support for simpler analysis of Scala data (see the sketch after this list)
  • fixes fixes fixes
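
A hedged sketch of the pivot chart over plain Scala data in a notebook cell; the exact signature of widgets.PivotChart is an assumption based on the bullet above, and the case class and data are illustrative:

```scala
// Assumed usage: PivotChart renders an interactive pivot over a Scala collection.
case class Sale(region: String, product: String, amount: Double)

val sales = Seq(
  Sale("EU", "widgets", 120.0),
  Sale("US", "widgets", 250.0),
  Sale("EU", "gadgets", 80.0)
)

widgets.PivotChart(sales) // assumed entry point, per "widgets.PivotChart" above
```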

v0.6.0

8 years ago
  • a loooooot of fixes \o/
  • a loooooot of documentation, including how to install and run the spark notebook on distros and clusters (YARN, MapR, EMR, ...)
  • support for HADOOP_CONF_DIR and EXTRA_CLASSPATH to include Spark cluster specific classpath entries, like the hadoop conf dir, but also the lzo jar and so on. This updates the classpath of both the notebook server and the notebook processes.
  • the custom repos specified in the metadata or application.conf have a higher priority
  • support for spark 1.4.1
  • Mesos is added to the Docker distro
  • code is now run asynchronously, allowing the introduction of the flame button, which can cancel all running spark jobs
  • added many new notebooks, including @Data-Fellas ML and ADAM examples and anomaly detection by @radek1st
  • LOGO :-D
  • added :markdown, :javascript, :jpeg, :png, :latex, :svg, :html, :plain, :pdf, all supporting interpolation (using Scala variables) — see the sketch after this list
  • clusters can be deleted from the UI
  • spark packages repo is available by default
  • spark package format is now supported: groupId:artifactId:version
  • added with.parquet modifier to include parquet deps
  • spark.app.name uses the name of the notebook by default (easier to track in clusters)
  • Dynamic table renderer for DataFrame
  • Added a users section to the README
  • Tachyon can be disabled by setting manager.tachyon.enabled to false
  • support for printing from the browser (CTRL+P)
  • added :ldp for local dependency definitions (so not added to spark context)
  • Graphs (nodes-edges) can be plotted easily using the Node and Edge types → see viz/Graph Plots.snb
  • Geo data viz added using lat/lon data → see viz/Geo Data (Map).snb
  • Enhanced the twitter stream example to show tweets in a map
  • Enhanced the WISP examples, including Histogram and BoxPlot. Wisp plots can now be built using the lower-level Highcharts API
  • Added the commons lib to the spark context to enable extended viz using spark jobs
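
A hedged sketch of how these contexts are used: a cell starting with the context marker renders its body in that format, interpolating Scala variables. The {count} delimiter shown below is an assumption, not a confirmed syntax:

```scala
// In a Scala cell, compute something to reference later (illustrative data):
val count = Seq(1, 2, 3).size

// A following cell starting with ":markdown" renders as Markdown; the
// {count} interpolation delimiter below is an assumption:
//
//   :markdown
//   # Batch summary
//   We saw {count} events in this batch.
```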

v0.5.0

8 years ago

Main

Besides quite a few fixes, this version brings two major features:

Session

When opening a notebook, a session is created allowing anybody to join it.

Most importantly, the user can now close the tab and get back to the analysis later by reopening the notebook. Very helpful, for instance, when long-running processes have been launched.

Tachyon

Tachyon support has been integrated, with several capabilities:

  • connecting to a Tachyon cluster provided in the configuration
  • starting a local embedded Tachyon cluster if none is available in the config
  • a small UI on the right-hand side of the notebook panel that allows the user to browse the content, hence the persisted computations, or even simply files

In the first two cases, all notebooks are automatically configured (read: the SparkContext) to use the available Tachyon cluster without requiring any action from the user.

Others

Other things worth mentioning:

  • the notebooks directory is no longer under the root folder by default
  • the parquet deps, which could be a pain in previous releases, have been discarded
  • the logs are now per session/notebook, so it's now even easier to track a job
  • the background logger (the yellow box on the left) has been removed, since it didn't bring much info but interacted badly in some cases with the closure serializer...
  • support for https
  • more information is provided when errors occur, especially when the code is incomplete (missing parentheses)
  • execution time is now included in each result block
  • Scala 2.10 now uses SBT to download deps; Scala 2.11 is still using Aether at the moment
  • HADOOP_CONF_DIR has to be used to pass the hadoop conf dir when using YARN