Merge pull request #1090 from RumbleDB/Release15-2
Release15 2
ghislainfourny authored Sep 13, 2021
2 parents da24864 + 8e16436 commit f483b45
Showing 11 changed files with 80 additions and 46 deletions.
2 changes: 1 addition & 1 deletion docs/Function library.md
@@ -1,6 +1,6 @@
# Function library

We list here the functions supported by RumbleDB, and introduce them by means of examples. Highly detailed specifications can be found in the [underlying W3C standard](https://www.w3.org/TR/xpath-functions-30/#func-floor), unless the function is marked as specific to JSON or RumbleDB, in which case it can be found [here](http://www.jsoniq.org/docs/JSONiq/html-single/index.html#idm34604304).
We list here the most important functions supported by RumbleDB, and introduce them by means of examples. Highly detailed specifications can be found in the [underlying W3C standard](https://www.w3.org/TR/xpath-functions-30/#func-floor), unless the function is marked as specific to JSON or RumbleDB, in which case it can be found [here](http://www.jsoniq.org/docs/JSONiq/html-single/index.html#idm34604304). JSONiq and RumbleDB intentionally do not support builtin functions on XML nodes, NOTATION, or QName values. RumbleDB supports almost all other W3C-standardized functions; please contact us if one you need is still missing.

For the sake of ease of use, all W3C standard builtin functions and JSONiq builtin functions are in the
RumbleDB namespace, which is the default function namespace and does not require any prefix in front of function names.
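
For example, W3C builtins such as floor and concat and JSONiq builtins such as size and keys can all be called without a prefix (a quick sketch that can be pasted into the RumbleDB shell):

```
floor(3.7),
concat("foo", "bar"),
size([ 1, 2, 3 ]),
keys({ "a" : 1, "b" : 2 })
```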
8 changes: 4 additions & 4 deletions docs/Getting started.md
@@ -10,7 +10,7 @@ Users who love the command line can install Spark with a package management syst

However, it is also straightforward to directly [download it](https://spark.apache.org/downloads.html), unpack it, and add the subdirectory "bin" within the unpacked directory to the PATH variable, as well as the location of the unpacked directory to SPARK_HOME.
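
For example, on Linux or macOS this could look as follows (a minimal sketch; the directory name is a placeholder and depends on the Spark build you downloaded):

```
# Placeholder path: adjust to wherever you unpacked Spark
export SPARK_HOME=~/spark-3.1.2-bin-hadoop3.2
export PATH="$SPARK_HOME/bin:$PATH"
```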

We recommend installing either Spark 2.4.7, or Spark 3.0.2 (we provide a RumbleDB jar for each one of these, the default is Spark 3).
We recommend installing either Spark 3.0.3 or Spark 3.1.2 (we also provide a RumbleDB jar for Spark 2 for legacy purposes; however, it is not recommended for new projects).

You can test that Spark was correctly installed with:

@@ -23,7 +23,7 @@ Another important comment: if you use Spark 2.4.x, you need to make sure that yo

### Download RumbleDB

RumbleDB is just a download with no installation. In order to run RumbleDB, you simply need to download the .jar file from the [download page](https://github.com/RumbleDB/rumble/releases) and put it in a directory of your choice (for example, right besides your data). If you use Spark 3.0.2, you can use the default jar. If you use Spark 2.4.x, make sure to use the corresponding jar (for-spark-2) and to replace the jar name accordingly in all our instructions.
RumbleDB is just a download with no installation. In order to run RumbleDB, you simply need to download the .jar file from the [download page](https://github.com/RumbleDB/rumble/releases) and put it in a directory of your choice (for example, right beside your data). If you use Spark 3, you can use the default jar. If you use Spark 2, make sure to use the corresponding jar (for-spark-2) and to replace the jar name accordingly in all our instructions.

### Create some data set

@@ -43,14 +43,14 @@ Create, in the same directory as RumbleDB to keep it simple, a file data.json an

In a shell, from the directory where the RumbleDB .jar lies, type, all on one line:

spark-submit rumbledb-1.14.0.jar --shell yes
spark-submit rumbledb-1.15.0.jar --shell yes
The RumbleDB shell appears:

____ __ __ ____ ____
/ __ \__ ______ ___ / /_ / /__ / __ \/ __ )
/ /_/ / / / / __ `__ \/ __ \/ / _ \/ / / / __ | The distributed JSONiq engine
/ _, _/ /_/ / / / / / / /_/ / / __/ /_/ / /_/ / 1.14.0 "Acacia" beta
/ _, _/ /_/ / / / / / / /_/ / / __/ /_/ / /_/ / 1.15.0 "Ivory Palm" beta
/_/ |_|\__,_/_/ /_/ /_/_.___/_/\___/_____/_____/

Master: local[*]
8 changes: 4 additions & 4 deletions docs/HTTPServer.md
@@ -4,7 +4,7 @@

RumbleDB can be run as an HTTP server that listens for queries. In order to do so, you can use the --server and --port parameters:

spark-submit rumbledb-1.14.0.jar --server yes --port 8001
spark-submit rumbledb-1.15.0.jar --server yes --port 8001

This command will not return until you force it to (Ctrl+C on Linux and Mac). This is because the server has to run permanently to listen to incoming requests.
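
Once the server is running, queries can be submitted over HTTP. A minimal sketch with curl, assuming the JSONiq query is simply passed as the POST body to the /jsoniq endpoint mentioned further below:

```
curl -X POST --data '1 + 1' http://localhost:8001/jsoniq
```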

@@ -94,19 +94,19 @@ Then there are two options
- Connect to the master with SSH with an extra parameter for securely tunneling the HTTP connection (for example `-L 8001:localhost:8001` or any port of your choosing)
- Download the RumbleDB jar to the master node

wget https://github.com/RumbleDB/rumble/releases/download/v1.12.0/rumbledb-1.14.0.jar
wget https://github.com/RumbleDB/rumble/releases/download/v1.15.0/rumbledb-1.15.0.jar

- Launch the HTTP server on the master node (it will be accessible under `http://localhost:8001/jsoniq`).

spark-submit rumbledb-1.14.0.jar --server yes --port 8001
spark-submit rumbledb-1.15.0.jar --server yes --port 8001

- And then use Jupyter notebooks in the same way you would do it locally (it magically works because of the tunneling)

### With the EC2 hostname

There is also a way that does not need any tunneling: you can specify the hostname of your EC2 machine (copied over from the EC2 dashboard) with the --host parameter. For example, with the placeholder <ec2-hostname>:

spark-submit rumbledb-1.14.0.jar --server yes --port 8001 --host <ec2-hostname>
spark-submit rumbledb-1.15.0.jar --server yes --port 8001 --host <ec2-hostname>

You also need to make sure in your EMR security group that the chosen port (e.g., 8001) is accessible from the machine in which you run your Jupyter notebook. Then, you can point your Jupyter notebook on this machine to `http://<ec2-hostname>:8001/jsoniq`.

51 changes: 28 additions & 23 deletions docs/JSONiq.md
@@ -4,7 +4,7 @@ RumbleDB relies on the JSONiq language.

## JSONiq reference

The complete specification can be found [here](http://www.jsoniq.org/docs/JSONiq/webhelp/index.html) on the [JSONiq.org](http://www.jsoniq.org) website. Note that it is not fully implemented yet (see below).
The complete specification can be found [here](http://www.jsoniq.org/docs/JSONiq/webhelp/index.html) on the [JSONiq.org](http://www.jsoniq.org) website. The implementation is now at a very advanced stage, and only a few core JSONiq features remain unsupported (see below).

## JSONiq tutorial

@@ -65,7 +65,9 @@ When an expression does not support pushdown, it will materialize automatically.

## External global variables

Prologs with user-defined functions and global variables are now fully supported. Global external variables with string values are supported (use "--variable:foo bar" on the command line to assign values to them).
Prologs with user-defined functions and global variables are now fully supported. Global external variables are supported (use "--variable:foo bar" on the command line to assign values to them). If the declared type is not string, then the literal supplied on the command line is cast to that type. If the declared type is anyURI, the path supplied on the command line is also resolved against the working directory to an absolute URI. Thus, anyURI should be used to supply paths dynamically through an external variable.
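
For example, the following query (a minimal sketch; query.jq, the variable names and the values are placeholders) declares an integer and an anyURI external variable:

```
declare variable $age as integer external;
declare variable $path as anyURI external;

{ "age" : $age, "resolved-path" : $path }
```

It could then be invoked like so, with the literal 42 being cast to integer and data.json being resolved to an absolute URI:

    spark-submit rumbledb-1.15.0.jar --query-path query.jq --variable:age 42 --variable:path data.json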


## Library modules
Expand Down Expand Up @@ -105,42 +107,44 @@ try { 1 div 0 } catch FOAR0001 { "Division by zero!" }

### Supported types

The type system is not quite complete yet, although a lot of progress was made. Below is a complete list of JSONiq types and their support status. All builtin types are in the default type namespace, so that no prefix is needed.
The JSONiq type system is fully supported. Below is a complete list of JSONiq types and their support status. All builtin types are in the default type namespace, so that no prefix is needed. These types are defined in the XML Schema standard. Note that some types specific to XML (e.g., NOTATION, NMTOKENS, NMTOKEN, ID, IDREF, ENTITY, etc.) are not part of the JSONiq standard and are not supported by RumbleDB.

| Type | Status |
|-------|--------|
| atomic | supported |
| anyURI | supported |
| base64Binary | supported |
| boolean | supported |
| byte | not supported |
| byte | supported |
| date | supported |
| dateTime | supported |
| dateTimeStamp | not supported |
| dateTimeStamp | supported |
| dayTimeDuration | supported |
| decimal | supported |
| double | supported |
| duration | supported |
| float | supported |
| gDay | not supported |
| gMonth | not supported |
| gYear | not supported |
| gYearMonth | not supported |
| gDay | supported |
| gMonth | supported |
| gYear | supported |
| gYearMonth | supported |
| hexBinary | supported |
| int | not supported |
| int | supported |
| integer | supported |
| long | not supported |
| negativeInteger | not supported |
| nonPositiveInteger | not supported |
| nonNegativeInteger | not supported |
| positiveInteger | not supported |
| short | not supported |
| long | supported |
| negativeInteger | supported |
| nonPositiveInteger | supported |
| nonNegativeInteger | supported |
| positiveInteger | supported |
| short | supported |
| string | supported |
| time | supported |
| unsignedByte | not supported |
| unsignedInt | not supported |
| unsignedLong | not supported |
| unsignedShort | not supported |
| unsignedByte | supported |
| unsignedInt | supported |
| unsignedLong | supported |
| unsignedShort | supported |
| yearMonthDuration | supported |
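
A few of these types in action (a quick sketch that can be run in the shell):

```
"255" cast as unsignedByte,
"2021-09-13" cast as date,
"2021-09-13T12:00:00Z" cast as dateTimeStamp,
3.14 instance of decimal,
12 cast as byte
```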

## Unsupported/Unimplemented features (beta release)
@@ -161,15 +165,16 @@ Window clauses are not supported, because they are not compatible with the Spark

### Function types

Function type syntax is not supported yet, but is planned. Function coercion is thus also not implemented yet, but is planned.
Function type syntax is supported.
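
A small sketch of the syntax, assuming the usual unprefixed builtin type names inside the function test:

```
declare function local:apply($f as function(integer) as integer, $x as integer) as integer {
  $f($x)
};
local:apply(function($y as integer) as integer { $y * 2 }, 21)
```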

Function annotations are not supported (%public, %private...), but this is planned.

### Builtin functions

A large number of JSONiq functions in the library are now supported (see function documentation), and the remaining ones (typically, the fancier versions with an extra tuning parameter) get added continuously. Please take a look at the function library documentation to know which functions are available.
Most JSONiq and XQuery builtin functions are now supported (see the function library documentation), except XML-specific functions. A few are still missing; do not hesitate to reach out if you need them.

Constructors for atomic types are not implemented yet. Please use the "cast as" expression instead, which are equivalent.
Constructors for atomic types are fully supported.
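
For example (a quick sketch; each constructor call is equivalent to a cast to the corresponding type):

```
integer("42"),
decimal("3.14"),
date("2021-09-13"),
dayTimeDuration("PT5M")
```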

Builtin functions cannot yet be used with named function reference expressions (example: concat#2).
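
As a workaround, a builtin function can be wrapped in an inline function expression, for example:

```
let $concat := function($a, $b) { concat($a, $b) }
return $concat("foo", "bar")
```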

12 changes: 6 additions & 6 deletions docs/Run on a cluster.md
@@ -5,21 +5,21 @@ simply by modifying the command line parameters as documented [here for spark-su

If the Spark cluster is running on yarn, then the --master option can be changed from local[\*] (as used in the getting started guide) to yarn. Most of the time, though (e.g., on Amazon EMR), it need not be specified, as this is already set up in the environment.

spark-submit rumbledb-1.14.0.jar --shell yes
spark-submit rumbledb-1.15.0.jar --shell yes
or explicitly:

spark-submit --master yarn --deploy-mode client rumbledb-1.14.0.jar --shell yes
spark-submit --master yarn --deploy-mode client rumbledb-1.15.0.jar --shell yes

You can also adapt the number of executors, etc.

spark-submit --num-executors 30 --executor-cores 3 --executor-memory 10g
rumbledb-1.14.0.jar --shell yes
rumbledb-1.15.0.jar --shell yes

The size limit for materialization can also be made higher with --materialization-cap (the default is 200). This affects the number of items displayed on the shell as an answer to a query, as well as any materializations happening within the query when push-down is not supported. Warnings are issued if the cap is reached.

spark-submit --num-executors 30 --executor-cores 3 --executor-memory 10g
rumbledb-1.14.0.jar
rumbledb-1.15.0.jar
--shell yes --materialization-cap 10000

## Creation functions
@@ -59,15 +59,15 @@ Note that by default only the first 1000 items in the output will be displayed o
RumbleDB also supports executing a single query from the command line, reading from HDFS and outputting the results to HDFS, with the query file being either local or on HDFS. For this, use the --query-path, --output-path and --log-path parameters.

spark-submit --num-executors 30 --executor-cores 3 --executor-memory 10g
rumbledb-1.14.0.jar
rumbledb-1.15.0.jar
--query-path "hdfs:///user/me/query.jq"
--output-path "hdfs:///user/me/results/output"
--log-path "hdfs:///user/me/logging/mylog"

The query path, output path and log path can be any of the supported schemes (HDFS, file, S3, WASB...) and can be relative or absolute.

spark-submit --num-executors 30 --executor-cores 3 --executor-memory 10g
rumbledb-1.14.0.jar
rumbledb-1.15.0.jar
--query-path "/home/me/my-local-machine/query.jq"
--output-path "/user/me/results/output"
--log-path "hdfs:///user/me/logging/mylog"
33 changes: 31 additions & 2 deletions docs/Types.md
@@ -1,12 +1,12 @@
# User-defined types

RumbleDB now supports user-defined types in a limited fashion.
RumbleDB now supports user-defined array and object types, with both the JSound compact syntax and the JSound verbose syntax.

## JSound Schema Compact syntax

RumbleDB user-defined types can be defined with the JSound syntax. A tutorial for the JSound syntax can be found [here](https://github.com/ghislainfourny/jsound-tutorial).

For now, RumbleDB only allows the definition of user-defined types for objects and has initial, experimemental, limited support of JSound. Also, the @ (primary key) and ? (nullable) characters are not supported at this point. The implementation is still experimental and bugs are still expected, which we will appreciate to be informed of.
For now, RumbleDB only allows the definition of user-defined types for objects and arrays. User-defined atomic types and union types will follow soon. Also, the @ (primary key) and ? (nullable) characters are not supported at this point. The implementation is still experimental and bugs are to be expected; we would appreciate being informed of any you encounter.
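
As a quick illustration with the compact syntax (a minimal sketch; the type declaration syntax is covered in detail in the sections below):

```
declare type local:point as {
  "x" : "double",
  "y" : "double"
};
validate type local:point* {
  { "x" : 1.0, "y" : 2.5 }
}
```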

## Type declaration

@@ -247,6 +247,35 @@ In fact, RumbleDB will internally convert the sequence of objects to a Spark Dat

In other words, the JSound Compact Schema Syntax is perfect for defining DataFrame schemas!

## Verbose syntax

For advanced JSound features, such as open object types or subtypes, the verbose syntax must be used, like so:

```
declare type local:x as jsound verbose {
"kind" : "object",
"baseType" : "object",
"content" : [
{ "name" : "foo", "type" : "integer" }
],
"closed" : false
};
declare type local:y as jsound verbose {
"kind" : "object",
"baseType" : "local:x",
"content" : [
{ "name" : "bar", "type" : "date" }
],
"closed" : true
};
```
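
An instance of local:y could then be validated against it, for example (a sketch assuming the JSONiq validate expression):

```
validate type local:y* {
  { "foo" : 3, "bar" : date("2021-09-13") }
}
```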

The JSound type system, as its name indicates, is sound: you can only make subtypes more restrictive than their supertype. The complete specification of both syntaxes is available on the [JSound website](https://www.jsound-spec.org/).

In the future, RumbleDB will support user-defined atomic types and union types via the verbose syntax.

## What's next?

Once you have validated your data as a DataFrame with a user-defined type, you are all set to use the RumbleDB Machine Learning library and feed it through ML pipelines!
4 changes: 2 additions & 2 deletions docs/install.md
@@ -64,7 +64,7 @@ After successful completion, you can check the `target` directory, which should

The most straightforward way to test whether the above steps were successful is to run the RumbleDB shell locally, like so:

$ spark-submit target/rumbledb-1.14.0.jar --shell yes
$ spark-submit target/rumbledb-1.15.0.jar --shell yes

The RumbleDB shell should start:

@@ -73,7 +73,7 @@ The RumbleDB shell should start:
____ __ __ ____ ____
/ __ \__ ______ ___ / /_ / /__ / __ \/ __ )
/ /_/ / / / / __ `__ \/ __ \/ / _ \/ / / / __ | The distributed JSONiq engine
/ _, _/ /_/ / / / / / / /_/ / / __/ /_/ / /_/ / 1.14.0 "Acacia" beta
/ _, _/ /_/ / / / / / / /_/ / / __/ /_/ / /_/ / 1.15.0 "Ivory Palm" beta
/_/ |_|\__,_/_/ /_/ /_/_.___/_/\___/_____/_____/

Master: local[2]
2 changes: 1 addition & 1 deletion mkdocs.yml
@@ -1,4 +1,4 @@
site_name: RumbleDB 1.14 "Acacia" beta
site_name: RumbleDB 1.15 "Ivory Palm" beta
pages:
- '1. Documentation home': 'index.md'
- '2. Getting started': 'Getting started.md'
2 changes: 1 addition & 1 deletion pom.xml
@@ -26,7 +26,7 @@

<groupId>com.github.rumbledb</groupId>
<artifactId>rumbledb</artifactId>
<version>1.14.0</version>
<version>1.15.0</version>
<packaging>jar</packaging>
<name>RumbleDB</name>
<description>A JSONiq engine to query large-scale JSON datasets stored on HDFS. Spark under the hood.</description>
2 changes: 1 addition & 1 deletion src/main/resources/assets/banner.txt
@@ -1,6 +1,6 @@
____ __ __ ____ ____
/ __ \__ ______ ___ / /_ / /__ / __ \/ __ )
/ /_/ / / / / __ `__ \/ __ \/ / _ \/ / / / __ | The distributed JSONiq engine
/ _, _/ /_/ / / / / / / /_/ / / __/ /_/ / /_/ / 1.14.0 "Acacia" beta
/ _, _/ /_/ / / / / / / /_/ / / __/ /_/ / /_/ / 1.15.0 "Ivory Palm" beta
/_/ |_|\__,_/_/ /_/ /_/_.___/_/\___/_____/_____/

2 changes: 1 addition & 1 deletion src/main/resources/assets/public.html
@@ -38,6 +38,6 @@ <h1>JSONiq query</h1>
<h1>Results</h1>
<textarea id="result" placeholder="Results" rows="20" cols="80">
</textarea>
<button id="submit">Evaluate</button>
<button id="submit">Evaluate with RumbleDB 1.15.0</button>
</body>
</html>
