New rnote file format, better compression and serialization, atomic file saving #1177

Draft
wants to merge 33 commits into
base: main

Conversation

anesthetice
Contributor

@anesthetice anesthetice commented Aug 14, 2024

I've extended this pull request with #1150, to share common code, and with #1170, as they were in conflict.

The overall goal of this PR is to improve the handling of Rnote save files. This is accomplished mainly in three ways:

  1. A more flexible, backwards-compatible Rnote file format
    (fixes New file format (stabilization) #1173)
  2. Better default compression and serialization (zstd + bitcode instead of gzip + json)
  3. Atomic saves
    (fixes More resilient file saving by a two-stage file save process #1128)

1 - the Rnote file format

rnotefileformat

With regard to what was discussed in #1173:

  • backwards compatible

  • this format supports (and can be extended to support)

    • multiple compression methods
    • multiple serialization methods
    • additional metadata
  • the file's version and metadata are kept separate from the (potentially) compressed data

  • still unsure how well this new format would support partial file loading, but I believe it could be made possible by adding extra header info (such as an array containing the start of each sub-block in the body of the file)
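
To make the points above concrete, here is a minimal sketch of a container that keeps a version and uncompressed metadata in front of an opaque body. It is not the layout this PR implements: the magic bytes, the field widths, and the names DemoFile, write_file and read_file are all assumptions made for the example.

```rust
use std::io::{self, Read, Write};

/// Hypothetical magic bytes, chosen only for this example.
const MAGIC: [u8; 4] = *b"demo";

#[derive(Debug, PartialEq)]
struct DemoFile {
    version: u16,
    /// Uncompressed metadata, e.g. the names of the methods used for the body.
    header: String,
    /// Serialized and (potentially) compressed document data, kept opaque here.
    body: Vec<u8>,
}

fn invalid(msg: &str) -> io::Error {
    io::Error::new(io::ErrorKind::InvalidData, msg.to_string())
}

/// Layout: magic | version (u16 LE) | header length (u32 LE) | header | body.
fn write_file<W: Write>(mut out: W, file: &DemoFile) -> io::Result<()> {
    let header = file.header.as_bytes();
    if header.len() > u32::MAX as usize {
        return Err(invalid("header too large"));
    }
    out.write_all(&MAGIC)?;
    out.write_all(&file.version.to_le_bytes())?;
    out.write_all(&(header.len() as u32).to_le_bytes())?;
    out.write_all(header)?;
    out.write_all(&file.body)
}

/// Reads the version and header without needing to understand the body.
fn read_file<R: Read>(mut input: R) -> io::Result<DemoFile> {
    let mut magic = [0u8; 4];
    input.read_exact(&mut magic)?;
    if magic != MAGIC {
        return Err(invalid("bad magic bytes"));
    }
    let mut version = [0u8; 2];
    input.read_exact(&mut version)?;
    let mut len = [0u8; 4];
    input.read_exact(&mut len)?;
    let mut header = vec![0u8; u32::from_le_bytes(len) as usize];
    input.read_exact(&mut header)?;
    let header = String::from_utf8(header).map_err(|_| invalid("header is not UTF-8"))?;
    let mut body = Vec::new();
    input.read_to_end(&mut body)?;
    Ok(DemoFile { version: u16::from_le_bytes(version), header, body })
}
```

Because the header length is stored up front, a reader can stop after the header and decide how to treat the body. A table of sub-block offsets for partial loading would presumably go into that header as well.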

2 - compression and serialization

See https://github.com/anesthetice/rnote-compression-benchmarking

In addition to the default compression and serialization method changes, users can also select their desired compression level, from "Very Low" to "Very High", in the "Document" section of the settings. Rnote re-uses the previously selected compression level for new files; existing files keep their compression level unless the user explicitly changes it.

For debugging purposes, the serialization and compression methods can be set to non-standard variants, and can be maintained even after re-saving the file by setting method_lock to true.

These features are achieved by using save preferences (EngineSnapshot <-> Engine <-> EngineConfig)
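
A rough sketch of how such save preferences could behave on re-save. Only method_lock and the zstd + bitcode defaults come from the description above; the type names, the intermediate level names, and the rule that an unlocked file falls back to the default methods are assumptions for illustration, not the PR's actual types.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Compression {
    None,
    Gzip,
    Zstd,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Serialization {
    Json,
    Bitcode,
}

/// Only "Very Low" and "Very High" are named in the text; the middle
/// variants are assumed.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Level {
    VeryLow,
    Low,
    Medium,
    High,
    VeryHigh,
}

#[derive(Debug, Clone, Copy, PartialEq)]
struct SavePrefs {
    compression: Compression,
    serialization: Serialization,
    level: Level,
    /// When true, non-standard methods survive a re-save.
    method_lock: bool,
}

impl SavePrefs {
    /// Preferences to use when an already-loaded file is saved again.
    /// The compression level is always kept; the methods are kept only
    /// when `method_lock` is set (assumed behaviour).
    fn for_resave(self) -> SavePrefs {
        if self.method_lock {
            self
        } else {
            SavePrefs {
                compression: Compression::Zstd,
                serialization: Serialization::Bitcode,
                ..self
            }
        }
    }
}
```

With this shape, a file saved as uncompressed JSON with the lock set would stay that way, while the same file without the lock would be rewritten with the defaults at its existing level.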

3 - atomic saves

Current:

  • the save file is simply overwritten

Proposal:

  • write to temporary file
  • check temporary file
  • replace save file with temporary one
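
The three steps can be sketched with the standard library alone. This is a simplified illustration, not the code in this PR: the function name atomic_save, the ".tmp" suffix, and the choice of a read-back comparison as the "check" are assumptions.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

fn atomic_save(path: &Path, bytes: &[u8]) -> io::Result<()> {
    // Keep the temporary file next to the target so the final rename
    // stays on the same filesystem.
    let mut tmp_name = path
        .file_name()
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "path has no file name"))?
        .to_os_string();
    tmp_name.push(".tmp");
    let tmp = path.with_file_name(tmp_name);

    // 1. Write to the temporary file and flush it to disk.
    let mut file = File::create(&tmp)?;
    file.write_all(bytes)?;
    file.sync_all()?;
    drop(file);

    // 2. Check the temporary file (here: read it back and compare).
    if fs::read(&tmp)? != bytes {
        let _ = fs::remove_file(&tmp);
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "temporary file does not match the data to save",
        ));
    }

    // 3. Replace the save file with the temporary one.
    fs::rename(&tmp, path)
}
```

If anything fails before the rename, the existing save file is left untouched, which is the point of the two-stage approach.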

4 - other cool stuff

rnote-cli mutate

Ever wanted to further compress Rnote files you don't often use but still wish to have direct access to?
rnote-cli mutate --compression-method zstd --compression-level 19 --serialization-method bitcode $(find . -name "*.rnote" 2>/dev/null)

Want an uncompressed Rnote file encoded in json?
rnote-cli mutate --compression-method none --serialization-method json --lock file.rnote

5 - pinboard

  • json:
    • uncompressed: 46 MB
    • compressed: 5.5 MB
  • cbor:
    • uncompressed: 39 MB
    • compressed: 6.8 MB
  • bincode:
    • uncompressed: 20.9 MB
    • compressed: 6.4 MB
  • postcard:
    • uncompressed: 18.6 MB
    • compressed: 6.3 MB
  • toml:
    • no
  • bitcode (using serialize and not encode):
    • uncompressed: 17.4 MB
    • compressed: 4.9 MB
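
For reading the list above, a small sketch that derives the compression ratio for each entry and picks out the smallest compressed result. The sizes are copied from the list; the helper names are made up for this example.

```rust
/// (name, uncompressed MB, compressed MB), copied from the list above.
const RESULTS: [(&str, f64, f64); 5] = [
    ("json", 46.0, 5.5),
    ("cbor", 39.0, 6.8),
    ("bincode", 20.9, 6.4),
    ("postcard", 18.6, 6.3),
    ("bitcode", 17.4, 4.9),
];

/// How many times smaller the compressed output is than the input.
fn compression_ratio(uncompressed_mb: f64, compressed_mb: f64) -> f64 {
    uncompressed_mb / compressed_mb
}

/// Format whose compressed output is smallest in absolute terms.
fn smallest_compressed() -> &'static str {
    RESULTS
        .iter()
        .min_by(|a, b| a.2.partial_cmp(&b.2).unwrap())
        .map(|r| r.0)
        .unwrap()
}

/// Format with the highest uncompressed-to-compressed ratio.
fn best_ratio() -> &'static str {
    RESULTS
        .iter()
        .max_by(|a, b| {
            compression_ratio(a.1, a.2)
                .partial_cmp(&compression_ratio(b.1, b.2))
                .unwrap()
        })
        .map(|r| r.0)
        .unwrap()
}
```

json compresses by the largest factor because it starts out so large, but the smallest file on disk in this list is the bitcode one.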

Deserialized from Json in 0.822; Serialized to 46.201438 MB, in 0.518
Deserialized from Cbor in 0.229; Serialized to 39.910991 MB, in 0.073
Deserialized from Bincode in 0.027; Serialized to 20.993607 MB, in 0.016
Deserialized from Toml in 5.255; Serialized to 112.392787 MB, in 7.203
Deserialized from Bitcode in 0.020; Serialized to 17.432074 MB, in 0.028
Deserialized from Pot in 0.120; Serialized to 28.007348 MB, in 0.049
Deserialized from Postcard in 0.017; Serialized to 18.668699 MB, in 0.017

@anesthetice anesthetice marked this pull request as ready for review August 20, 2024 19:45
@anesthetice anesthetice changed the title from "More versatile rnote file format, issue 1173" to "New rnote file format, better compression and serialization, atomic file saving" Aug 25, 2024
@anesthetice
Contributor Author

demo 1 - backwards compatibility, size ratio, persistent compression level

rnote_demo_1.mp4

demo 2 - rnote-cli mutate (atomic file saving is also noticeable)

rnote_demo_2.mp4

@anesthetice
Contributor Author

I went through all of my code yesterday. I'd say this is ready for review; still open to suggestions or improvements, of course.

@anesthetice
Contributor Author

Getting some weird issues with bitcode relating to the packing that aren't present with json, might have to reconsider using bitcode sadly..

@anesthetice anesthetice marked this pull request as draft October 9, 2024 11:02
@RayJW
Contributor

RayJW commented Oct 10, 2024

Getting some weird issues with bitcode relating to the packing that aren't present with json, might have to reconsider using bitcode sadly..

What exactly are those issues, if I may ask? I see you're using bitcode with serde which seems to have issues for the time being. Not sure if that is related, though. It might be that some serde_json types cause problems according to the wiki.

@anesthetice
Contributor Author

What exactly are those issues, if I may ask? I see you're using bitcode with serde which seems to have issues for the time being. Not sure if that is related, though. It might be that some serde_json types cause problems according to the wiki.

I merged this PR alongside my "Styled Lines" PR (#1210) in a branch of Rnote I personally use. However, opening previous "versions" of my files serialized using bitcode yielded an "Error: invalid packing". I tried debugging this with a forked version of bitcode that uses https://crates.io/crates/serde_path_to_error (forgot to save the output, sorry), but didn't find a "fix".

The bigger problem is that from what I can tell, bitcode is sadly but understandably not as forward-compatible as JSON. I still plan on keeping bitcode as a serialization option in this PR, for users willing to use rnote-cli and the method "lock", but JSON will probably go back to being the default.

@RayJW
Contributor

RayJW commented Oct 15, 2024

I merged this PR alongside my "Styled Lines" PR: #1210 in a branch of Rnote I personally use, however opening previous "versions" of my files serialized using bitcode yielded a "Error: invalid packing", I tried debugging this with a forked version of bitcode that uses crates.io/crates/serde_path_to_error, (forgot to save the output sorry) but didn't find a "fix".

The bigger problem is that from what I can tell, bitcode is sadly but understandably not as forward-compatible as JSON. I still plan on keeping bitcode as a serialization option in this PR, for users willing to use rnote-cli and the method "lock", but JSON will probably go back to being the default.

That's disappointing but also very understandable. Thank you for working on this either way! What about the other options you explored, like Bincode or Postcard? I personally would prefer those over JSON because I don't really care about disk space at all, but the speed improvements are very interesting for the daily workflow. Because breaking forward compatibility would kinda suck.

@anesthetice
Contributor Author

That's disappointing but also very understandable. Thank you for working on this either way! What about the other options you explored, like Bincode or Postcard? I personally would prefer those over JSON because I don't really care about disk space at all, but the speed improvements are very interesting for the daily workflow. Because breaking forward compatibility would kinda suck.

That's a good point. I personally want the absolute smallest file size since everything is backed up to a git repository, but people who don't have such constraints would probably prefer whatever is fastest. I'll try to come up with a way of benchmarking and visualizing this (serialized + compressed ratio versus speed). Having more than one possible serialization method for users to choose from does complicate things: should multiple serialization methods, including non-forward-compatible ones, be exposed to everyday users? Or do we keep some accessible only via method_lock and rnote-cli? I would also appreciate hearing your thoughts on this, @flxzt and @Doublonmousse.

@finnbear

using bitcode with serde which seems to have issues for the time being.

Just so you know, the linked issue would allow you to call bitcode::{encode, decode} on a struct where one field only implements serde traits.

This does not affect what you're using, bitcode::{serialize, deserialize}, at all :)

The bigger problem is that from what I can tell, bitcode is sadly but understandably not as forward-compatible as JSON

bitcode is not self-describing, which has forwards-compatibility implications depending on your schema. It also doesn't support all serde features.
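
A toy illustration of that point, using hand-written encoders rather than bitcode or JSON themselves. The two-field and three-field "schemas", the default width of 1, and all function names are assumptions made up for the example.

```rust
use std::collections::HashMap;

/// Old schema (x, y) written positionally: values back to back,
/// with no field names or type tags in the output.
fn encode_v1_positional(x: u32, y: u32) -> Vec<u8> {
    let mut out = Vec::new();
    out.extend_from_slice(&x.to_le_bytes());
    out.extend_from_slice(&y.to_le_bytes());
    out
}

/// New schema (x, y, width) read positionally. The reader has no way to
/// tell that `width` is absent from old data, so decoding simply fails.
fn decode_v2_positional(bytes: &[u8]) -> Option<(u32, u32, u32)> {
    if bytes.len() != 12 {
        return None;
    }
    let field = |i: usize| {
        let mut b = [0u8; 4];
        b.copy_from_slice(&bytes[i * 4..i * 4 + 4]);
        u32::from_le_bytes(b)
    };
    Some((field(0), field(1), field(2)))
}

/// Old schema written with field names (self-describing).
fn encode_v1_keyed(x: u32, y: u32) -> String {
    format!("x={};y={}", x, y)
}

/// New schema read from keyed data: a missing `width` can be detected
/// and replaced with a default.
fn decode_v2_keyed(text: &str) -> Option<(u32, u32, u32)> {
    let mut map = HashMap::new();
    for pair in text.split(';') {
        let mut kv = pair.splitn(2, '=');
        let key = kv.next()?;
        let value = kv.next()?.parse::<u32>().ok()?;
        map.insert(key, value);
    }
    let x = *map.get("x")?;
    let y = *map.get("y")?;
    let width = map.get("width").copied().unwrap_or(1);
    Some((x, y, width))
}
```

The keyed reader recovers old data after a field is added, while the positional reader cannot, which is the kind of trade-off being described: compact output in exchange for less room to evolve the schema.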
