05 Jul 12:36

1dfacb8

RumbleDB 1.14.0 "Acacia" beta Pre-release

Rumble now outputs error messages that display the faulty line of code and point to the location of the error.
Machine Learning estimators and models can now run at scale (in parallel) on very large amounts of data. This is automatically detected.
Many stability improvements in the Machine Learning library
Machine Learning Pipelines are now supported with stages given as function items
Static typing is now always performed and is used for further optimization
Initial (experimental) support for user-defined types with the JSound Compact syntax. Types can be used everywhere builtin types can be used (instance of, treat as, type annotations for variables...).
New validate type expression to validate against user-defined types and (if the type is DF-compatible) to create object* instances as optimized dataframes.
Features must be assembled with the VectorAssembler transformer prior to being used with an estimator or transformer (for example, at the start of a pipeline). featuresCol and inputCol must specify the name (as a string) of the assembled feature vector field. This is now fully consistent with the Spark ML framework.
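
As a rough illustration of what "assembling" features means, the following Python sketch gathers several numeric fields into a single vector field. This is not Spark or RumbleDB code: the function assemble_features, the field names, and the sample rows are hypothetical and only model the idea that an estimator reads one named vector field rather than separate columns.

```python
def assemble_features(rows, input_cols, output_col="features"):
    """Return copies of the rows with input_cols gathered into one vector field."""
    assembled = []
    for row in rows:
        new_row = dict(row)  # leave the original row untouched
        new_row[output_col] = [float(row[col]) for col in input_cols]
        assembled.append(new_row)
    return assembled


# Hypothetical sample data: two numeric features and a label.
rows = [
    {"age": 30, "weight": 70.5, "label": 1},
    {"age": 45, "weight": 82.0, "label": 0},
]

assembled = assemble_features(rows, ["age", "weight"])
print(assembled[0]["features"])
```

An estimator would then be pointed at the name of the assembled field ("features" in this sketch) rather than at the individual columns.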

Note that Spark 2.4.x is no longer maintained. We provide rumbledb-1.14.0-for-spark-2.jar only for legacy purposes for a smooth transition, and recommend instead using Spark 3.0.x or 3.1.x with the rumbledb-1.14.0.jar package.

04 May 12:44

ghislainfourny

v1.12.0

e0c216f

Rumble 1.12.0 "Ashoka Tree" beta Pre-release

Fixed performance issue when a big for clause follows other small clauses
Fixed grouping and ordering of floats
Fixed a bug that prevented grouping with keys of incompatible types when hashcodes collided.
Experimental (and incomplete) support for XQuery 3.1 syntax (prefix queries with xquery version "3.1"; to activate)
project() calls are pushed down if the argument is structured (e.g., coming from parquet-file(), etc).
Performance improvements for round() and abs()
Variable references ($x) are resolved quicker
Support for general function types (including their signature) and type checking (including statically)
When iterating on schema-based data (Parquet, Avro, structured-json-file()...) in a FLWOR expression, some let, for, where, group-by and order-by clauses will be automatically faster if they only involve literals, variable references, object/array lookups, and value comparison (native mapping to Spark SQL)
Fixed several bugs in switch expressions
Switch expressions and conditional expressions can handle/forward structured data faster (underlying DataFrames)

03 Mar 12:32

ghislainfourny

v1.11.0

a317363

Rumble 1.11.0 "Banyan Tree" beta Pre-release

experimental support for static typing (--static-typing yes) following the W3C standard.
performance improvements in arithmetics, logics, comparison
spaces are now supported in paths to json-file()
HTTP URLs are now supported by unparsed-text() and unparsed-text-lines()
yearMonthDuration, dayTimeDuration, hexBinary and base64Binary values can now be compared for inequality in addition to equality
performance improvements for comparison
the effective boolean value is now correctly taken in quantified expressions
quantified expressions now work in parallel as well (they leverage the FLWOR iterators)
support for floats
sum(), avg() are now pushed down and work on large homogeneous as well as heterogeneous sequences
stability improvements and improved conformance for comparison, arithmetics and casts
dayTimeDuration and yearMonthDuration can now be compared
all constructors are now available (semantics identical to cast as)
switch and index-of no longer throw an error for incompatible types, which now follows the standard
empty function bodies are now allowed (in which case it is considered to return the empty sequence)
variable names $null, $array, $object are now allowed
annotate() can now automatically cast whenever it makes sense, and is thus more flexible
the Item hierarchy is now flat, with a public Item interface available in the Rumble Java API, and individual classes providing the implementation, which should lead to a small performance boost with lighter method calls.
fixed an issue (null pointer exception) when an ordering key is always the empty sequence
constant predicate lookups with small numbers (<= materialization cap) are pushed down, e.g., json-file("...")[1]
general support at the parser level of any type QName. prefixes like xs: and js: are now accepted but remain optional (e.g., xs:integer, js:null).
an error is appropriately thrown if an order by expression evaluates to more than one item or to a non-atomic item
builtin functions can now be called with fn:, jn: and math: prefixes as well (depending on their namespace). It is still, however, possible to refer to them without prefix, i.e., this is backward compatible.
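
One way to picture the pushdown of constant positional lookups such as json-file("...")[1] is that only the items up to the requested position need to be produced, instead of materializing the whole sequence first. The Python sketch below models this with a lazy generator. The names read_items and lookup are hypothetical, and the sketch does not reflect how RumbleDB or Spark implement the optimization internally.

```python
from itertools import islice


def read_items(log):
    """Hypothetical lazy source standing in for a large input."""
    for i in range(1, 1_000_001):
        log.append(i)  # record that item i was actually produced
        yield {"id": i}


def lookup(source, position):
    """Return the item at a 1-based position (or None), consuming only what is needed."""
    return next(islice(source, position - 1, position), None)


produced = []
first = lookup(read_items(produced), 1)
print(first, len(produced))
```

Only one item is produced for position 1, even though the source could yield a million.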

The main jar is for Spark 3, but there is another jar for Spark 2.

04 Jan 11:16

ghislainfourny

v1.10.0

3b74ef5

Rumble 1.10.0 "Buttonwood" beta Pre-release

Fixed navigation issue with structured datasets when objects are nested in arrays.
Fixed a bug that prevented calling a user-defined function repeatedly in a FLWOR expression in some cases
Verbose messages are now printed to stderr instead of stdout, for those who want to pipe the output in bash
Bugfixes in unary expressions (an error is now thrown for more than one item, and multiple unary signs, which the spec allows, are handled correctly)
Big integers can now be cast from strings
string() now returns serialized numbers consistent with JSON output
typeswitch now correctly matches the empty sequence type
Improved stability for user-defined function calls consuming DataFrame parameters. Seamless materialization for ? and 1 arities.
max() and min() are now pushed down to Spark and work on big sequences
+INF and INF (doubles) are now serialized to strings correctly
Fixed the division by 0 on doubles, to correctly produce +INF and -INF, and mod by 0 to produce NaN. idiv raises an error as per the spec.
It is now possible to build INF, -INF, and NaN doubles by casting from a string literal.
Fixed a bug in the object lookup expression leading to a crash when the field to look up depends on a variable and the sequence of objects being looked up is partitioned on Spark. Same fix for array lookup expressions.
Fixed a crash happening in a FLWOR expression in a group-by clause executed in parallel, when none of the variables before and including this group clause is used anywhere in the remainder of the FLWOR expression.
Performance improvements in the processing of items.
Performance improvement for distinct-values call on heterogeneous sequences.
support for W3C-standard functions unparsed-text, unparsed-text-lines (in parallel) and parse-json (all with arity 1 for now)
Fixed a bug occasionally happening with JsonIter streaming by switching to another JSON parser (gson).
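
The division and modulo behavior on doubles described above can be modeled in a few lines of Python. This is only a sketch of the semantics stated in the notes (div by 0 gives +INF or -INF, mod by 0 gives NaN, idiv by 0 raises an error), not RumbleDB's implementation. The function names are hypothetical, and FOAR0001 is the W3C error code for division by zero.

```python
import math


def double_div(a, b):
    """Division on doubles: x div 0 yields +INF or -INF, and 0 div 0 yields NaN."""
    a, b = float(a), float(b)
    if b == 0.0:
        if a == 0.0 or math.isnan(a):
            return math.nan
        # The result is negative when exactly one operand has a negative sign.
        negative = (math.copysign(1.0, a) < 0) != (math.copysign(1.0, b) < 0)
        return -math.inf if negative else math.inf
    return a / b


def double_mod(a, b):
    """Modulo on doubles: x mod 0 yields NaN instead of raising an error."""
    a, b = float(a), float(b)
    if b == 0.0 or math.isinf(a) or math.isnan(a) or math.isnan(b):
        return math.nan
    return math.fmod(a, b)  # the result takes the sign of the dividend


def idiv(a, b):
    """Integer division: a zero divisor is an error, other results truncate toward zero."""
    if b == 0:
        raise ZeroDivisionError("FOAR0001: division by zero")
    return math.trunc(a / b)


print(double_div(1, 0), double_div(-1, 0), double_mod(5, 0))

# Special double values can also be built from their string forms.
print(float("INF"), float("-INF"), float("NaN"))
```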

18 Nov 11:00

ghislainfourny

v1.9.1

f08749f

Rumble 1.9.1 "Ficus Bonsai" beta Pre-release

Interim release with the following fixes and improvements:

There is a new CLI parameter --deactivate-jsoniter-streaming to set to yes if there is any error regarding the JsonIter dependency, the library we use to parse JSON (the error in question being "com.jsoniter.spi.JsonException: javassist.CannotCompileException: by java.lang.ClassFormatError: class com.jsoniter.IterImpl cannot access its superclass com.jsoniter.IterImplForStreaming"). This flag deactivates streaming (i.e., avoids dynamic code generation by JsonIter) and avoids the error. This is a known issue with the Rumble docker but it never happened on our own machines. We are actively investigating why the Rumble docker has this issue. If you deactivate JsonIter streaming, though, this makes json-doc() unavailable after using json-file() in the same Rumble application (which is why we activate JsonIter streaming by default).
The public Rumble API (also accessible via the Rumble Maven dependency) now allows passing any list of items as an external variable. You can thus gather the results of a query as a list of items and feed it back as the input of another query, with Java as the host language.

28 Oct 15:41

ghislainfourny

v1.9.0

d13ef7c

Rumble 1.9.0 "Ficus Bonsai" beta Pre-release

Left-outer equi-joins with let clauses: if you have two large tabular datasets, Rumble can nest one into the other with just a few lines of code, and fast.
Inner equi-joins and generic joins with where clauses are detected.
Renamed --result-size to --materialization-size to avoid confusion, and added more hints about --output-path for getting the complete output from a parallel query.
New CLI options --output-format and --output-format-option:* for outputting structured output to formats other than JSON (Parquet, CSV...).
New CLI option --number-of-output-partitions to repartition the output as desired
New function local-text-file() to read a file as a sequence of string items, but without Spark parallelism (streaming instead). This makes Rumble faster for smaller files
Performance improvements for FLWOR queries on structured data (Avro, Parquet, structured JSON, CSV...)
Performance improvement for when parallelism is not used at all
Stability improvement for json-doc(), which will now also work after json-file() has been used.
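
To make the "nesting" performed by a left-outer equi-join concrete, here is a minimal Python sketch. It models the result shape only: every item of the first dataset is kept, and its matching items from the second dataset are attached as a nested list, possibly empty. The function nest_left_outer and the sample data are hypothetical, and RumbleDB delegates the actual join to Spark.

```python
from collections import defaultdict


def nest_left_outer(left, right, left_key, right_key, nested_field):
    """Attach to each left item the list of right items sharing its key."""
    index = defaultdict(list)
    for item in right:
        index[item[right_key]].append(item)
    # index.get() does not create entries, so unmatched items get an empty list.
    return [{**item, nested_field: index.get(item[left_key], [])} for item in left]


# Hypothetical sample data.
customers = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Bob"}]
orders = [
    {"customer": 1, "product": "book"},
    {"customer": 1, "product": "pen"},
]

nested = nest_left_outer(customers, orders, "id", "customer", "orders")
for customer in nested:
    print(customer["name"], [order["product"] for order in customer["orders"]])
```

Bob has no orders but still appears in the output with an empty list, which is what makes the join "left-outer".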

21 Sep 08:55

ghislainfourny

v1.8.1

a08919e

Rumble 1.8.1 "Scots Pine" beta Pre-release

Interim release with small fixes

Improve performance of joins whenever possible (quadratic -> linear)
fixed a bug with non-exact averages with avg()
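
The difference between a quadratic and a linear join can be sketched in Python as follows. The release note does not say which strategy is used internally, so this is only an illustration of the complexity gap, assuming an equality join on a single key. Both function names are hypothetical.

```python
def nested_loop_join(left, right, key):
    """Quadratic: compares every left item with every right item."""
    return [(l, r) for l in left for r in right if l[key] == r[key]]


def hash_join(left, right, key):
    """Roughly linear: indexes the right side once, then probes it per left item."""
    index = {}
    for r in right:
        index.setdefault(r[key], []).append(r)
    return [(l, r) for l in left for r in index.get(l[key], [])]


# Hypothetical sample data.
left = [{"k": i % 5, "side": "left", "n": i} for i in range(20)]
right = [{"k": i % 7, "side": "right", "n": i} for i in range(20)]

slow = nested_loop_join(left, right, "k")
fast = hash_join(left, right, "k")
print(len(slow), slow == fast)
```

Both functions return the same pairs in the same order. The hash-based version avoids comparing every pair, so its cost grows with the input and output sizes rather than with their product.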

Note that Rumble is in beta. Use at your own risk.

04 Sep 13:02

ghislainfourny

v1.8.0

dce4f52

Rumble 1.8.0 "Scots pine" Pre-release

New features

Support for joining two large datasets: a join is detected automatically if the expression of a for clause is a predicate expression whose left-hand side can be evaluated independently of the preceding clauses. The right-hand side (the predicate) is the joining criterion. Left outer joins are also supported in parallel (allowing empty).
outer joins ("allowing empty" in a for clause) are now supported both locally and in parallel.
support for empty sequence order least/greatest prolog setter (for order by clauses)
positional variables in for clauses are now supported both locally and in parallel (except for large-scale joins).
arbitrarily large integer literals are now supported (an error was thrown before beyond 32 bits)
json-file() and json-doc() can both read over HTTP
you can store your JSONiq modules on the Web and import them with an HTTP URL
you can store your queries on the Web and execute them via the Rumble command line with their URL
an error with the appropriate code is now thrown if a collation is specified that is not supported (the W3C standard requires support for at least the Unicode codepoint collation, which Rumble recognizes and supports).
It is now possible to specify a hostname in the server mode (--host), and to filter for specific URI prefixes for security reasons (--allowed-uri-prefixes)
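
The effect of "allowing empty" in a for clause can be modeled with a short Python sketch: a regular for clause produces no binding for an empty sequence, whereas with "allowing empty" it produces exactly one binding to the empty sequence, which is what yields outer-join behavior. The function for_clause and the sample data are hypothetical and only model the semantics, with lists standing in for sequences.

```python
def for_clause(sequence, allowing_empty=False):
    """Yield one binding per item, or a single empty binding if allowed."""
    items = list(sequence)
    if not items and allowing_empty:
        yield []  # one binding to the empty sequence
        return
    for item in items:
        yield [item]


# Hypothetical sample data.
customers = [
    {"name": "Ada", "orders": ["book", "pen"]},
    {"name": "Bob", "orders": []},
]


def report(allowing_empty):
    """Pair each customer with each binding of the inner for clause."""
    rows = []
    for customer in customers:
        for binding in for_clause(customer["orders"], allowing_empty):
            rows.append((customer["name"], binding))
    return rows


inner = report(False)
outer = report(True)
print(len(inner), len(outer))
```

Without "allowing empty", Bob disappears from the result. With it, Bob is kept and paired with an empty binding.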

Bugfixes

big integers are now seamlessly supported: no more overflows, and arbitrarily large integer literals are accepted in JSONiq code
fixed display bugs in debug mode (--print-iterator-tree yes)
fixed an error with local group-by queries nested inside local FLWORs
fixed an error when counting items in a variable that was not a post-grouping variable, in parallelized FLWORs.
fixed a bug encountered when a local iteration followed by a parallel for clause produced, and unioned, several Spark jobs internally.

Important: The jar for Spark 3.0.0 does not have Laurelin (ROOT parser) support. We are waiting for a 3.0.0-compatible Laurelin release. If you need to query ROOT files, please use Spark 2.4.6.

07 Jul 12:03

ghislainfourny

v1.7.0

06979d2

Rumble 1.7.0 "Phoenix Atlantica" Pre-release

New milestone in our feature coverage with the following changes prioritized based on user requests.

New features

Rumble is available for Spark 2.4.x as well as for Spark 3.0.0 (pick the right jar). The version for Spark 3.0.0 cannot read ROOT files yet, as we are waiting for the corresponding Laurelin release.
library modules are now supported, in order to share and import functions and global variables. Like main modules, library modules can be stored on any file system including S3 or HDFS, which also enables sharing code within the institution (local HDFS system) or even worldwide (S3 or even HTTP).
support for the W3C-standard trace function, for outputting intermediate values to the log.
support for try-catch expressions to catch and handle dynamic errors
support (read-only) for HTTP scheme for reading query files, data, importing modules, etc.

Bugfixes

fixed a bug in position semantics in predicate expressions, so that it also works if the position is not a constant.
query files are now tested for EOF, and errors will now be thrown if there are extra characters after the complete JSONiq query.
it is now possible to define functions and variables in the local namespace, following the W3C standard
[BREAKING CHANGE] relative paths passed to input functions are now resolved correctly in a query if it is read from a file, i.e., according to the absolute query file location. In previous releases, relative paths were resolved against the working directory. If you pass paths via external variables on the command line and (rightfully) expect them to be resolved against the working directory, declare the external variable with an "as anyURI" type annotation so Rumble knows your intent.
improvements in error messages when reading from and writing to file systems. Path resolution was also consolidated to provide the same experience everywhere.
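
The breaking change in relative path resolution can be illustrated with a small Python sketch that compares the two resolution bases. The paths and the helper resolve are hypothetical. The sketch only shows how the same relative path leads to different files depending on whether it is resolved against the query file location (new behavior) or the working directory (previous behavior).

```python
import posixpath


def resolve(relative_path, base_dir):
    """Resolve a relative path against a base directory."""
    return posixpath.normpath(posixpath.join(base_dir, relative_path))


# Hypothetical locations.
query_file = "/home/alice/queries/analysis.jq"
working_dir = "/home/alice/work"
relative_path = "data/input.json"

# New behavior: relative to the directory containing the query file.
against_query = resolve(relative_path, posixpath.dirname(query_file))

# Previous behavior: relative to the working directory.
against_cwd = resolve(relative_path, working_dir)

print(against_query)
print(against_cwd)
```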

02 Jun 15:07

ghislainfourny

v1.6.4

0ae68e7

Rumble 1.6.4 "Yucca" Pre-release

Interim release with bugfixes.

Support for DivisionByZero error code (div, mod).
Fixed a bug that sometimes led the Rumble shell to keep throwing the same error for subsequent queries
More informative error message when a range expression is not supplied with integers
Fixed a bug that prevented conditional expressions from being executed in parallel
New functions normalize-unicode and encode-for-uri
Support for running typeswitch in parallel

Releases: RumbleDB/rumble
