Releases · RumbleDB/rumble

24 Oct 14:18

ghislainfourny

v1.22.0

90a7faa

RumbleDB 1.22.0 "Pyrenean oak" beta Latest

Use RumbleDB to query data with JSONiq, even data that does not fit in DataFrames.

Try-it-out sandbox: https://colab.research.google.com/github/RumbleDB/rumble/blob/master/RumbleSandbox.ipynb

Instructions to get started: https://rumble.readthedocs.io/en/latest/Getting%20started/

Supported Java versions

The jars are compatible with Java 11. Support for Java 8 is dropped.

Supported Spark versions

Spark 3.2 and 3.3 are no longer supported as of RumbleDB 1.22, as they are no longer officially supported by the Spark team. Spark 3.4 and 3.5 are supported. Spark 4 is in preview and not yet supported by RumbleDB, but we are trying it out in order to support it in future releases.

Jars

RumbleDB comes in 3 jars that you can pick from depending on your needs:

rumbledb-1.22.0-standalone.jar contains Spark already and can simply be run "out of the box" with java -jar rumbledb-1.22.0-standalone.jar with Java 11.

rumbledb-1.22.0-for-spark-3.4-scala-2-12.jar, rumbledb-1.22.0-for-spark-3.5-scala-2-12.jar, and rumbledb-1.22.0-for-spark-3.5-scala-2-13.jar are smaller in size, do not contain Spark, and can be run in a corresponding, existing Spark environment either local (so you need to download and install Spark) or on a cluster (EMR with just a few clicks, etc) with spark-submit rumbledb-....jar -q '1+1'
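
As a rough illustration of how the jar choice maps to a launch command, the sketch below picks a jar name from the list above and builds an argument list. It is written in Python purely for illustration: rumbledb_command is a hypothetical helper, not part of RumbleDB, and the standalone invocation with run -q is an assumption carried over from the 1.17.0 notes.

```python
from typing import List, Optional

# Jar names exactly as listed in the 1.22.0 release notes,
# keyed by (Spark version, Scala version).
SPARK_JARS = {
    ("3.4", "2.12"): "rumbledb-1.22.0-for-spark-3.4-scala-2-12.jar",
    ("3.5", "2.12"): "rumbledb-1.22.0-for-spark-3.5-scala-2-12.jar",
    ("3.5", "2.13"): "rumbledb-1.22.0-for-spark-3.5-scala-2-13.jar",
}
STANDALONE_JAR = "rumbledb-1.22.0-standalone.jar"


def rumbledb_command(query: str, spark: Optional[str] = None,
                     scala: str = "2.12") -> List[str]:
    """Hypothetical helper: build the argument list for a one-off query."""
    if spark is None:
        # The standalone jar embeds Spark. The "run -q" form is an
        # assumption taken from the 1.17.0 notes, not from the 1.22.0 notes.
        return ["java", "-jar", STANDALONE_JAR, "run", "-q", query]
    try:
        jar = SPARK_JARS[(spark, scala)]
    except KeyError:
        raise ValueError(
            f"no RumbleDB 1.22.0 jar listed for Spark {spark}, Scala {scala}"
        ) from None
    # Same shape as the spark-submit example in the notes.
    return ["spark-submit", jar, "-q", query]


if __name__ == "__main__":
    print(" ".join(rumbledb_command("1+1", spark="3.5", scala="2.13")))
```

The returned list can be passed directly to subprocess.run, which avoids shell quoting issues with queries that contain quotes or spaces.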

Improvements

Support for the W3C-standardized copy-modify-return expression as a more convenient way to transform JSON objects and arrays with the update syntax (insertion, deletion, replacement, renaming)
Support for the persistence of updates on objects and arrays read from the DeltaLake (with the same update syntax)
Support for scripting: variable assignments, while loops, applying updates in the middle of the execution with visible side effects (under snapshot semantics), statements, block statements, continue, break, exit returning.
Many performance improvements
Many bugfixes
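
To make the copy-modify-return idea concrete, here is a small Python model of the semantics described above: the source value is copied, updates requested in the modify step are collected rather than applied immediately, and they take effect on the copy only once the modify step ends. This is neither RumbleDB code nor JSONiq syntax; PendingUpdates and copy_modify_return are hypothetical names, and applying the updates in the order they were recorded is a simplifying assumption.

```python
import copy
from typing import Any, Callable, Dict, List, Tuple


class PendingUpdates:
    """Records updates against an object; nothing changes until apply_to()."""

    def __init__(self) -> None:
        self._ops: List[Tuple[str, str, Any]] = []

    def insert(self, key: str, value: Any) -> None:
        self._ops.append(("insert", key, value))

    def delete(self, key: str) -> None:
        self._ops.append(("delete", key, None))

    def replace(self, key: str, value: Any) -> None:
        self._ops.append(("replace", key, value))

    def rename(self, old_key: str, new_key: str) -> None:
        self._ops.append(("rename", old_key, new_key))

    def apply_to(self, obj: Dict[str, Any]) -> None:
        # Simplifying assumption: apply in recording order.
        for kind, key, arg in self._ops:
            if kind == "insert":
                if key in obj:
                    raise KeyError(f"key already present: {key}")
                obj[key] = arg
            elif kind == "delete":
                del obj[key]
            elif kind == "replace":
                if key not in obj:
                    raise KeyError(f"key not found: {key}")
                obj[key] = arg
            elif kind == "rename":
                obj[arg] = obj.pop(key)


def copy_modify_return(
    source: Dict[str, Any],
    modify: Callable[[Dict[str, Any], PendingUpdates], None],
) -> Dict[str, Any]:
    working = copy.deepcopy(source)  # the original is never touched
    pending = PendingUpdates()
    modify(working, pending)         # the modify step still sees the snapshot
    pending.apply_to(working)        # updates become visible only now
    return working
```

A call such as copy_modify_return(tree, modify), where modify records an insert, a rename, and a delete, returns the transformed copy and leaves tree unchanged.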

16 May 13:09

ghislainfourny

v1.21.0

53f4df0

RumbleDB 1.21.0 "Hawthorn blossom" beta

NEW! The jar for Spark 3.5 was added and is available for download.

Use RumbleDB to query data with JSONiq, even data that does not fit in DataFrames.

Try-it-out sandbox: https://colab.research.google.com/github/RumbleDB/rumble/blob/master/RumbleSandbox.ipynb

Instructions to get started: https://rumble.readthedocs.io/en/latest/Getting%20started/

Spark 3.0 and 3.1 are no longer supported as of RumbleDB 1.21, as they are no longer supported officially by the Spark team. Spark 3.4 is newly supported.

RumbleDB comes in 4 jars that you can pick from depending on your needs:

rumbledb-1.21.0-standalone.jar contains Spark already and can simply be run "out of the box" with java -jar rumbledb-1.21.0-standalone.jar with Java 8 or 11.
rumbledb-1.21.0-for-spark-3.X.jar (3.2, 3.3, 3.4) is smaller in size, does not contain Spark, and can be run in a corresponding, existing Spark environment either local (so you need to download and install Spark) or on a cluster (EMR with just a few clicks, etc) with spark-submit rumbledb-1.21.0-for-spark-3.X.jar

Improvements

Automatically parallelizes range expressions with more than a million items with no need to call parallelize() any more.
Some simple map expressions on homogeneous input are now faster (native SQL behind the scenes).
General comparisons on equality are now considerably faster.
reverse() is now more efficient and faster on homogeneous sequences.
Fixed bug on equijoin involving homogeneous sequences
Added two functions, jn:cosh and jn:sinh
Automatic optimization of general comparisons to value comparisons when it is detected that the sequences have at most one item (can be deactivated with --optimize-general-comparison-to-value-comparison on)
Better static type detection
It is now possible to force a sequential execution (without Spark) with --parallel-execution no. This also works with queries containing calls to parallelize() (which will be ineffective), json-doc(), and json-file() (which will simply stream-read from the disk). Other I/O functions (such as csv-file(), etc) will still involve Spark for reading, but immediately materialize for the rest of the execution.
It is now possible to deactivate Native Spark SQL execution (forcing a fallback to the use of UDFs by RumbleDB) with --native-execution no.
annotate expression (similar syntax to validate expression) allows directly annotating an item without checking for validity.
More static types are detected
Non-recursive functions are now automatically inlined for faster execution. This can be deactivated with --function-inlining no (reverting to behavior in previous versions)
TypeSwitch expressions now support DataFrame execution
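
The three switches mentioned above (--parallel-execution no, --native-execution no, --function-inlining no) can be combined on one command line. The sketch below, in Python for illustration only, appends them to an existing argument list. with_execution_flags is a hypothetical helper, and the base command used in the example (standalone jar with run -q) is an assumption based on the invocation shown in the 1.17.0 notes.

```python
from typing import List


def with_execution_flags(base_command: List[str],
                         parallel: bool = True,
                         native_sql: bool = True,
                         inline_functions: bool = True) -> List[str]:
    """Hypothetical helper: append the 1.21.0 opt-out flags to a command."""
    command = list(base_command)  # do not mutate the caller's list
    if not parallel:
        # Forces sequential execution without Spark.
        command += ["--parallel-execution", "no"]
    if not native_sql:
        # Falls back to UDF-based execution instead of native Spark SQL.
        command += ["--native-execution", "no"]
    if not inline_functions:
        # Reverts to the pre-1.21 behavior for non-recursive functions.
        command += ["--function-inlining", "no"]
    return command


if __name__ == "__main__":
    # Assumed base invocation; adjust to the jar and launcher you use.
    base = ["java", "-jar", "rumbledb-1.21.0-standalone.jar", "run", "-q", "1+1"]
    print(" ".join(with_execution_flags(base, parallel=False)))
```

Leaving all three keyword arguments at their defaults returns the command unchanged, which matches the default behavior described in the notes.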

Bugfixes

Fixed bug when reading longs from DataFrames
Fixed an issue with projection pushdowns in join queries
Fixed a few bugs with queries that navigate JSON in for clauses; they are compiled to native SQL whenever possible, but some chains were throwing errors (e.g., an array unboxing followed by object lookup)
Fixed a bug in which calling count() on a grouping variable did not return 1 when native SQL execution is activated
hexBinary and base64Binary values can now be used in order by clauses with parallel execution

07 Nov 12:57

ghislainfourny

v1.20.0

38e07ca

RumbleDB 1.20.0 "Honeylocust"

Use RumbleDB to query data with JSONiq, even data that does not fit in DataFrames.

Try-it-out sandbox: https://colab.research.google.com/github/RumbleDB/rumble/blob/master/RumbleSandbox.ipynb

Instructions to get started: https://rumble.readthedocs.io/en/latest/Getting%20started/

Spark 3.0 and 3.1 are no longer supported as of RumbleDB 1.20, as they are no longer supported officially by the Spark team.

RumbleDB comes in 4 jars that you can pick from depending on your needs:

rumbledb-1.20.0-standalone.jar contains Spark already and can simply be run "out of the box" with java -jar rumbledb-1.20.0-standalone.jar with Java 8 or 11.
rumbledb-1.20.0-for-spark-3.X.jar (3.2, 3.3) is smaller in size, does not contain Spark, and can be run in a corresponding, existing Spark environment either local (so you need to download and install Spark) or on a cluster (EMR with just a few clicks, etc) with spark-submit rumbledb-1.20.0-for-spark-3.X.jar

New features:

Open and query YAML files (also with multiple documents) with yaml-doc()
Serialize the output of your queries to YAML with --output-format yaml
General comparisons (existential quantification on large sequences) now work with very big sequences and are automatically pushed down to Spark.
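
For readers unfamiliar with general comparisons, the following Python sketch models the existential semantics: a general comparison is true as soon as one pair of items, one from each side, satisfies the comparison. It is a local, in-memory model only and says nothing about how RumbleDB pushes the work down to Spark; general_compare is a hypothetical name.

```python
import operator
from typing import Any, Callable, Iterable


def general_compare(left: Iterable[Any], right: Iterable[Any],
                    op: Callable[[Any, Any], bool] = operator.eq) -> bool:
    """True if some item on the left and some item on the right satisfy op."""
    # Materialize the right side so it can be scanned once per left item.
    right_items = list(right)
    return any(op(l, r) for l in left for r in right_items)


if __name__ == "__main__":
    print(general_compare([1, 2, 3], [3, 4]))          # one matching pair is enough
    print(general_compare([1, 2], [2, 3], operator.ne))
    print(general_compare([], [1]))                     # empty side: no pair exists
```

One consequence of these semantics is that both (1, 2) = (2, 3) and (1, 2) != (2, 3) hold at the same time, and a comparison against an empty sequence is always false.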

Bugfixes:

Fixed an issue preventing reading Decimal types from Parquet with some precisions and ranges
Fixed a few bugs in static typing
Fixed a bug that didn't throw an error when using the concatenation operator || on sequences with more than one item

14 Jun 13:17

ghislainfourny

v1.19.0

cd6684b

RumbleDB 1.19.0 "Tipuana Tipu"

RumbleDB allows you to query data that does not fit in DataFrames with JSONiq.

Try-it-out sandbox: https://colab.research.google.com/github/RumbleDB/rumble/blob/master/RumbleSandbox.ipynb

Instructions to get started: https://rumble.readthedocs.io/en/latest/Getting%20started/

RumbleDB comes in 4 jars that you can pick from depending on your needs:

rumbledb-1.19.0-standalone.jar contains Spark already and can simply be run "out of the box" with java -jar rumbledb-1.19.0-standalone.jar with Java 8 or 11.
rumbledb-1.19.0-for-spark-3.X.jar (3.0, 3.1, 3.2, 3.3) is smaller in size, does not contain Spark, and can be run in a corresponding, existing Spark environment either local (so you need to download and install Spark) or on a cluster (EMR with just a few clicks, etc) with spark-submit rumbledb-1.19.0-for-spark-3.X.jar

Release notes:

Fixed the bug with spaces in paths
Various fixes and enhancements
New functions repartition#2 to change the number of physical partitions, and binary-classification-metrics#3, binary-classification-metrics#4 for preparing ROC curves and PR curves to evaluate the output of ML pipelines.

12 Apr 14:55

ghislainfourny

v1.18.0

52a3424

RumbleDB 1.18.0 "Scarlet Ixora" beta

Instructions to get started: https://rumble.readthedocs.io/en/latest/Getting%20started/

RumbleDB comes in 4 jars that you can pick from depending on your needs:

rumbledb-1.18.0-standalone.jar contains Spark already and can simply be run "out of the box" with java -jar rumbledb-1.18.0-standalone.jar with Java 8 or 11.
rumbledb-1.18.0-for-spark-3.X.jar (3.0, 3.1, 3.2) is smaller in size, does not contain Spark, and can be run in a corresponding, existing Spark environment either local (so you need to download and install Spark) or on a cluster (EMR with just a few clicks, etc) with spark-submit rumbledb-1.18.0-for-spark-3.X.jar

Release notes:

FLWOR expressions starting with a series of let are now better optimized and faster.
A warning with advice is issued in the command window if a group by is used in a FLWOR expression that starts with a let clause.
The shell will no longer exit when an error is thrown.
When a query cannot be executed in parallel, a more informative error message is output inviting the user to rewrite their query, instead of the raw Spark error.
When launching in shell or server mode, instructions are printed on the screen for next steps
Fixed crash in the execution of some where clauses when a join was not successfully detected and it falls back to linear execution
Support for context item declarations and passing an external context item value on the command line
By default, the date type no longer supports timezones (which are rarely used for this type, although supported by ISO 8601). This enables more optimizations (e.g., internal conversion to DataFrame DateType columns and export of datasets with dates to Parquet). Timezones on dates can be activated for those users who need them with a simple CLI argument (--dates-with-timezone yes).
Ctrl+C now elegantly exits the shell.

02 Feb 10:41

ghislainfourny

v1.17.0

02b7b3b

RumbleDB 1.17.0 "Cacao tree" beta

Instructions to get started: https://rumble.readthedocs.io/en/latest/Getting%20started/

The CLI was extended with verbs (run, serve, repl) and single-dash shortcuts (-f for --output-format, etc). This is backward compatible.
Automatic internal conversion to DataFrames for FLWOR expressions executed in parallel when the statically inferred type is DataFrame-compatible.
Fixed bug that prevented calling a variable $type or looking up a field called "type" without quotes.
Fixed bug for projecting a sequence internally stored as a DataFrame to dynamically defined keys.
Fix some bugs with post-grouping count optimizations on let variables
Support for Spark 2.4, which is no longer maintained by the Spark team, is now dropped, but available on request. RumbleDB 1.17 supports Spark 3.0, 3.1 and 3.2.
Plenty of smaller bug fixes
[Experimental] We also provide a jar that embeds Spark and does not require its installation (rumbledb-1.17.0-standalone.jar). It is for use on a local machine only (not a cluster) and works with java -jar rumbledb-1.17.0-standalone.jar run -q '1+1' rather than with spark-submit. Feedback is welcome! This is just experimental at this point and we will take it from there.

09 Dec 10:14

ghislainfourny

v1.16.2

171fe57

RumbleDB 1.16.2 "Shagbark Hickory" beta Pre-release

Interim release.

Fix recursive view "input" issue.
Nicer message for out of memory errors and hint to use CLI parameters.
Reverted to Kryo 4 for Spark 3.2, which depends on Twitter Chill 0.10.0; Chill uses this version of Kryo in a way that is incompatible with Kryo 5.

06 Dec 08:46

ghislainfourny

v1.16.1

b703258

Rumble 1.16.1 "Shagbark Hickory" beta Pre-release

Interim release.

Fixed race condition issue with min() and max() called multiple times that led to possibly incorrect output.
The sum() and count() functions are now able to stream locally on very large (non parallelized) sequences.
Range expressions now support 64 bit integers as well (before this, an overflow happened)
The arrow syntax works for dynamic function calls, too, so in Rumble ML pipelines can also be called with a pipelining syntax: $training-set=>$my-transformer($params)=>my-estimator($params)
substring() was fixed to follow standard behavior even with exotic parameters (mostly returning an empty string in these cases)
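
As a reference point for what "standard behavior" means here, the sketch below models the rule from the XPath and XQuery Functions and Operators specification: the result contains the characters at the 1-based positions p with round(start) <= p < round(start) + round(length). It is a Python model for illustration, not RumbleDB's implementation; xpath_round and substring are hypothetical names, and strings are treated as sequences of Python characters.

```python
import math
from typing import Optional


def xpath_round(x: float) -> float:
    """Round to the nearest integer, with ties going toward positive infinity."""
    if math.isnan(x) or math.isinf(x):
        return x
    return math.floor(x + 0.5)


def substring(s: str, start: float, length: Optional[float] = None) -> str:
    """Model of fn:substring with 1-based positions and rounded arguments."""
    first = xpath_round(start)
    if length is None:
        # Two-argument form: everything from the rounded start position on.
        return "".join(ch for p, ch in enumerate(s, start=1) if p >= first)
    # -INF + INF gives NaN, and every comparison with NaN is false,
    # so such calls naturally yield the empty string.
    end = first + xpath_round(length)
    return "".join(ch for p, ch in enumerate(s, start=1) if first <= p < end)


if __name__ == "__main__":
    print(repr(substring("12345", 1.5, 2.6)))              # '234'
    print(repr(substring("12345", 0, 3)))                  # '12'
    print(repr(substring("12345", 5, -3)))                 # ''
    print(repr(substring("12345", float("nan"), 3)))       # ''
    print(repr(substring("12345", -42, float("inf"))))     # '12345'
```

Expressing the rule as a position filter, rather than as Python slicing, keeps the edge cases (negative or zero start, negative length, NaN, infinities) consistent with the specification without special-casing each one.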

02 Nov 11:07

ghislainfourny

v1.16.0

aaae4ee

RumbleDB 1.16.0 "Shagbark Hickory" beta Pre-release

new --query parameter for directly passing a query rather than a query path.
fixed a bug occurring with group by clauses on native DataFrames with complex aggregations
new --shell-filter parameters for modifying the way the output is shown in shell mode (e.g. --shell-filter 'jq . -S -C' for pretty printing)
new output formats: json (top-level strings will be quoted), tyson and possibility to indent with --output-format-options:indent yes
new JSound validator page at localhost:/jsound-validator.html
support for user-defined atomic types with JSound verbose syntax
fn:concat is now correctly in the fn namespace
When the materialization cap is reached and the count is unknown, it is no longer shown as the max long value.

13 Sep 13:50

ghislainfourny

v1.15.0

f483b45

RumbleDB 1.15.0 "Ivory Palm" Pre-release

Fixed jn:intersect#1 to always be run locally
General performance improvements for many expressions and iterators that return at most one item
New builtin functions supported: fn:min#2, fn:max#2, fn:unordered#1, fn:distinct-values#2, fn:index-of#3, fn:deep-equal#3, fn:string#0, fn:string#1, fn:substring-before#3, fn:substring-after#3, fn:string-length#0, fn:resolve-uri#1, fn:resolve-uri#2, fn:ends-with#3, fn:starts-with#3, fn:contains#3, fn:normalize-space#0, fn:default-collation#0, fn:number#0, fn:implicit-timezone#0, fn:not#0, fn:static-base-uri#1, fn:dateTime#2, fn:false#0, fn:true#0
All JSONiq builtin types are now supported: newly supported are byte, dateTimeStamp, gDay, gMonth, gYear, gYearMonth, gMonthDay, int, long, negativeInteger, nonNegativeInteger, positiveInteger, nonPositiveInteger, unsignedInt, unsignedLong, unsignedByte, unsignedShort, short.
ceiling, floor, round, abs, round-half-to-even are now correctly in the fn namespace (not math) and all accept numeric values (instead of converting everything to doubles) and a few bugs have been fixed
support for open object types via the JSound verbose syntax (they are, of course, not implemented as DataFrames, but this makes no difference at the syntactic level except they cannot be used with ML estimators and transformers)
support for user-defined array types via the JSound verbose syntax, including subtypes
validation of atomic values is now correctly done by casting the lexical value (not the typed value) to the expected type.
Fixed serialization of NaN, double/float infinity, dates, etc (the quotes are now correctly included to make them JSON strings)
positive and negative zero (for double, float) now compare as equals in value/general comparison

Note that Spark 2.4.x is no longer maintained. We provide rumbledb-1.15.0-for-spark-2.jar only for legacy purposes for a smooth transition, and recommend instead using Spark 3.0.x or 3.1.x with the rumbledb-1.15.0.jar package.
