AWS S3 Inventory-based backup tool with efficient incremental & versionId support

dandi/s3invsync

Project Status: WIP – Initial development is in progress, but there has not yet been a stable, usable release suitable for the public.

License: MIT

GitHub | Issues | Changelog

s3invsync is a Rust program for creating & syncing backups of an AWS S3 bucket (including old versions of objects) by making use of the bucket's Amazon S3 Inventory files.

Warning: This is an in-development program. There may be bugs, and some planned features have not been implemented yet.

Building & Running

  1. Install Rust and Cargo.

  2. Clone this repository and cd into it.

  3. Run cargo build --release to build the binary. The intermediate build artifacts will be cached in target/ in order to speed up subsequent builds.

  4. Run with cargo run --release -- <arguments ...>.

  5. If necessary, the actual binary can be found in target/release/s3invsync. It should run on any system with the same OS and architecture as it was built on.

Usage

cargo run --release -- [<options>] <inventory-base> <outdir>

s3invsync downloads the contents of an S3 bucket, including old versions of objects, to the directory <outdir> using S3 Inventory files located at <inventory-base>.

<inventory-base> must be of the form s3://{bucket}/{prefix}/, where {bucket} is the destination bucket in which the inventory files are stored and {prefix}/ is the key prefix under which the inventory manifest files are located in that bucket (i.e., appending a string of the form YYYY-MM-DDTHH-MMZ/manifest.json to {prefix}/ should yield the key of a manifest file).
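To make the shape of <inventory-base> concrete, here is a minimal Rust sketch of how such a string could be split into a bucket and a prefix, and how a manifest key is derived from the prefix. The names InventoryBase, parse_inventory_base, and manifest_key are hypothetical and are not taken from s3invsync's source; rejecting an empty prefix is likewise an assumption of this sketch.

```rust
/// Hypothetical helper type for illustration; not s3invsync's actual code.
#[derive(Debug, PartialEq, Eq)]
pub struct InventoryBase {
    /// Bucket in which the inventory files are stored.
    pub bucket: String,
    /// Key prefix of the manifests; always ends in '/'.
    pub prefix: String,
}

/// Parse a string of the form `s3://{bucket}/{prefix}/`.
/// Returns `None` if the scheme is missing, the bucket or prefix is empty,
/// or the prefix lacks its trailing slash (assumptions of this sketch).
pub fn parse_inventory_base(s: &str) -> Option<InventoryBase> {
    let rest = s.strip_prefix("s3://")?;
    let (bucket, prefix) = rest.split_once('/')?;
    if bucket.is_empty() || prefix.is_empty() || !prefix.ends_with('/') {
        return None;
    }
    Some(InventoryBase {
        bucket: bucket.to_owned(),
        prefix: prefix.to_owned(),
    })
}

impl InventoryBase {
    /// Key of the manifest for the inventory with the given timestamp,
    /// where `timestamp` has the form `YYYY-MM-DDTHH-MMZ`.
    pub fn manifest_key(&self, timestamp: &str) -> String {
        format!("{}{}/manifest.json", self.prefix, timestamp)
    }
}
```

For example, with the made-up base s3://example-bucket/inventory/daily/ and the timestamp 2024-11-05T01-00Z, manifest_key yields inventory/daily/2024-11-05T01-00Z/manifest.json.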

s3invsync honors AWS credentials stored in the standard locations (e.g., the environment variables AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_REGION or the default credentials files ~/.aws/config and ~/.aws/credentials). For public buckets, no credentials need to be provided.

When downloading a given key from S3, the latest version (if not deleted) is stored at {outdir}/{key}, and the versionIds and etags of all latest object versions in a given directory are stored in .s3invsync.versions.json in that directory. Each non-latest, non-deleted version of a given key is stored at {outdir}/{key}.old.{versionId}.{etag}.
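The on-disk layout described above can be sketched as a few path computations in Rust. The function names below are hypothetical and only illustrate the naming scheme; they are not s3invsync's actual implementation. The sketch assumes keys are relative (no leading slash) and does no sanitizing of key components.

```rust
use std::path::{Path, PathBuf};

/// Where the latest version of `key` is stored: `{outdir}/{key}`.
pub fn latest_path(outdir: &Path, key: &str) -> PathBuf {
    outdir.join(key)
}

/// Where a non-latest, non-deleted version of `key` is stored:
/// `{outdir}/{key}.old.{versionId}.{etag}`.
pub fn old_version_path(outdir: &Path, key: &str, version_id: &str, etag: &str) -> PathBuf {
    outdir.join(format!("{key}.old.{version_id}.{etag}"))
}

/// The `.s3invsync.versions.json` file that sits in the same directory as
/// the latest version of `key`.
pub fn versions_file_path(outdir: &Path, key: &str) -> PathBuf {
    // Replace the final path component (the object's file name) with the
    // name of the per-directory versions file.
    latest_path(outdir, key).with_file_name(".s3invsync.versions.json")
}
```

With an output directory of backup and a key of data/file.txt, these give backup/data/file.txt, backup/data/file.txt.old.{versionId}.{etag}, and backup/data/.s3invsync.versions.json respectively.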

Options

  • -d <DATE>, --date <DATE> — Download objects from the inventory created at the given date.

    By default, the most recent inventory is downloaded.

    The date must be in the format YYYY-MM-DD (in which case the latest inventory for the given date is used) or in the format YYYY-MM-DDTHH-MMZ (to specify a specific inventory).

  • -I <INT>, --inventory-jobs <INT> — Specify the maximum number of inventory list files to download & process at once [default: 20]

  • -l <level>, --log-level <level> — Set the log level to the given value. Possible values are "ERROR", "WARN", "INFO", "DEBUG", and "TRACE" (all case-insensitive). [default value: DEBUG]

  • -O <INT>, --object-jobs <INT> — Specify the maximum number of inventory entries to download & process at once [default: 20]

  • --path-filter <REGEX> — Only download objects whose keys match the given regular expression

  • --trace-progress — Emit download progress information at the TRACE level. This is off by default because it can make for some very noisy logs.
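As an illustration of the two formats accepted by --date, the following Rust sketch tells a plain date (YYYY-MM-DD) apart from a full inventory timestamp (YYYY-MM-DDTHH-MMZ). DateSpec and parse_date_spec are hypothetical names, and this is not how s3invsync itself necessarily parses the option. The sketch only range-checks the fields and does not validate the number of days per month.

```rust
/// Hypothetical representation of a `--date` argument.
#[derive(Debug, PartialEq, Eq)]
pub enum DateSpec {
    /// `YYYY-MM-DD`: use the latest inventory for this date.
    Date { year: u32, month: u32, day: u32 },
    /// `YYYY-MM-DDTHH-MMZ`: use this specific inventory.
    Timestamp { year: u32, month: u32, day: u32, hour: u32, minute: u32 },
}

/// Parse a field that must consist of exactly `len` ASCII digits.
fn field(s: &str, len: usize) -> Option<u32> {
    if s.len() == len && s.bytes().all(|b| b.is_ascii_digit()) {
        s.parse().ok()
    } else {
        None
    }
}

pub fn parse_date_spec(s: &str) -> Option<DateSpec> {
    // Split off the optional time part after 'T'.
    let (date, time) = match s.split_once('T') {
        Some((d, t)) => (d, Some(t)),
        None => (s, None),
    };
    let mut parts = date.split('-');
    let (y, m, d) = (parts.next()?, parts.next()?, parts.next()?);
    if parts.next().is_some() {
        return None;
    }
    let (year, month, day) = (field(y, 4)?, field(m, 2)?, field(d, 2)?);
    if !(1..=12).contains(&month) || !(1..=31).contains(&day) {
        return None;
    }
    match time {
        None => Some(DateSpec::Date { year, month, day }),
        Some(t) => {
            // The time part is `HH-MMZ`: hyphen-separated with a trailing 'Z'.
            let (h, min) = t.strip_suffix('Z')?.split_once('-')?;
            let (hour, minute) = (field(h, 2)?, field(min, 2)?);
            if hour > 23 || minute > 59 {
                return None;
            }
            Some(DateSpec::Timestamp { year, month, day, hour, minute })
        }
    }
}
```

Note that the time part uses a hyphen rather than a colon, matching the manifest directory names, so a value such as 2024-11-05T01:00Z is rejected by this sketch.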