⛈️ RumbleDB 1.21.0 "Hawthorn blossom" 🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more
NEW! The jar for Spark 3.5 was added and is available for download.
Use RumbleDB to query data with JSONiq, even data that does not fit in DataFrames.
Try-it-out sandbox: https://colab.research.google.com/github/RumbleDB/rumble/blob/master/RumbleSandbox.ipynb
Instructions to get started: https://rumble.readthedocs.io/en/latest/Getting%20started/
Spark 3.0 and 3.1 are no longer supported as of RumbleDB 1.21, as they are no longer supported officially by the Spark team. Spark 3.4 is newly supported.
RumbleDB comes in 4 jars that you can pick from depending on your needs:
rumbledb-1.21.0-standalone.jar contains Spark already and can simply be run "out of the box" with java -jar rumbledb-1.21.0-standalone.jar with Java 8 or 11.
rumbledb-1.21.0-for-spark-3.X.jar (3.2, 3.3, 3.4) is smaller in size, does not contain Spark, and can be run in a corresponding, existing Spark environment, either local (so you need to download and install Spark) or on a cluster (EMR with just a few clicks, etc.), with spark-submit rumbledb-1.21.0-for-spark-3.X.jar
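The choice between the jars described above can be sketched as a small shell helper. This is only an illustration: rumble_cmd is a hypothetical function, not something shipped with RumbleDB, and it prints the launch command instead of running it, so it works even when no jar has been downloaded. It covers only the Spark versions listed above (3.2, 3.3, 3.4) and assumes the jar file names follow exactly the pattern shown in these notes.

```shell
#!/bin/sh
# Hypothetical helper (not part of RumbleDB): prints the launch command for a
# given RumbleDB version and target, following the jar naming used above.
rumble_cmd() {
  version="$1"   # e.g. 1.21.0
  target="$2"    # "standalone" or a Spark minor version such as 3.4
  case "$target" in
    standalone)
      # Bundles Spark; only Java 8 or 11 is needed.
      echo "java -jar rumbledb-${version}-standalone.jar"
      ;;
    3.2|3.3|3.4)
      # Needs an existing Spark installation of the matching version.
      echo "spark-submit rumbledb-${version}-for-spark-${target}.jar"
      ;;
    *)
      # Anything else (for example Spark 3.0 or 3.1) is rejected.
      echo "unsupported target: ${target}" >&2
      return 1
      ;;
  esac
}

# Dry run: print the two kinds of command line.
rumble_cmd 1.21.0 standalone
rumble_cmd 1.21.0 3.4
```

The printed command lines can be pasted into a terminal once the corresponding jar is in the current directory.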
Improvements
Bugfixes
Spark 3.0 and 3.1 are no longer supported as of RumbleDB 1.20, as they are no longer supported officially by the Spark team.
RumbleDB comes in 4 jars that you can pick from depending on your needs:
rumbledb-1.20.0-standalone.jar contains Spark already and can simply be run "out of the box" with java -jar rumbledb-1.20.0-standalone.jar with Java 8 or 11.
rumbledb-1.20.0-for-spark-3.X.jar (3.2, 3.3) is smaller in size, does not contain Spark, and can be run in a corresponding, existing Spark environment, either local (so you need to download and install Spark) or on a cluster (EMR with just a few clicks, etc.), with spark-submit rumbledb-1.20.0-for-spark-3.X.jar
New features:
Bugfixes:
RumbleDB allows you to use JSONiq to query data that does not fit in DataFrames.
RumbleDB comes in 4 jars that you can pick from depending on your needs:
Release notes:
java -jar rumbledb-1.17.0-standalone.jar run -q '1+1'
rather than with spark-submit. Feedback is welcome! This is just experimental at this point and we will take it from there.
Interim release.
Interim release.
Note that Spark 2.4.x is no longer maintained. We provide rumbledb-1.15.0-for-spark-2.jar only for legacy purposes for a smooth transition, and recommend instead using Spark 3.0.x or 3.1.x with the rumbledb-1.15.0.jar package.
Note that Spark 2.4.x is no longer maintained. We provide rumbledb-1.14.0-for-spark-2.jar only for legacy purposes for a smooth transition, and recommend instead using Spark 3.0.x or 3.1.x with the rumbledb-1.14.0.jar package.