
v0.24.0: Improved Reproducibility, Bug Fixes, and Other Small Improvements

@muellerzr released this 24 Oct 17:37

Improved Reproducibility

One critical issue with Accelerate was that training runs differed when using an iterable dataset, no matter what seeds were set. v0.24.0 adds a dataloader.set_epoch() method to all Accelerate DataLoaders: if the underlying dataset (or sampler) supports setting the epoch for reproducibility, the call is forwarded to it. This is similar to the existing implementation in transformers. To use:

dataloader = accelerator.prepare(dataloader)
# Say we want to resume at epoch/iteration 2
dataloader.set_epoch(2)

For more information see this PR. We will update the docs in a subsequent release with more information on this API.
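
To make the forwarding behaviour concrete, here is a minimal, dependency-free sketch of the idea. It is not Accelerate's actual implementation: EpochShuffledDataset, EpochAwareLoader, and run_epoch are hypothetical names invented for illustration, and the shuffle-by-(seed + epoch) scheme is an assumption about how a dataset might use the epoch.

```python
import random


class EpochShuffledDataset:
    """Hypothetical iterable dataset whose order depends on (seed, epoch)."""

    def __init__(self, items, seed=0):
        self.items = list(items)
        self.seed = seed
        self.epoch = 0

    def set_epoch(self, epoch):
        # Remember the epoch so the next iteration shuffles deterministically.
        self.epoch = epoch

    def __iter__(self):
        # Assumption: derive the shuffle order from seed + epoch.
        rng = random.Random(self.seed + self.epoch)
        order = self.items[:]
        rng.shuffle(order)
        return iter(order)


class EpochAwareLoader:
    """Hypothetical stand-in for a prepared dataloader (not Accelerate's class)."""

    def __init__(self, dataset, sampler=None):
        self.dataset = dataset
        self.sampler = sampler

    def set_epoch(self, epoch):
        # Forward the epoch only to objects that know what to do with it.
        for target in (self.sampler, self.dataset):
            if target is not None and hasattr(target, "set_epoch"):
                target.set_epoch(epoch)

    def __iter__(self):
        return iter(self.dataset)


def run_epoch(epoch, seed=42):
    """Build a fresh loader, set the epoch, and return the order it yields."""
    loader = EpochAwareLoader(EpochShuffledDataset(range(10), seed=seed))
    loader.set_epoch(epoch)
    return list(loader)
```

In this sketch, two independent calls to run_epoch(2) yield the same order, which is the property needed when resuming a run at a given epoch. A dataset without a set_epoch method is simply left untouched.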

Documentation

  • The quick tour docs have gotten a complete makeover thanks to @MKhalusova. Take a look here
  • We also now have documentation on how to perform multinode training; see the launch docs

Internal structure

  • Shared file systems are now supported under save and save_state via the ProjectConfiguration dataclass. See #1953 for more info.
  • FSDP can now be used for bfloat16 mixed precision via torch.autocast
  • all_gather_into_tensor is now used as the main gather operation, reducing memory usage for large tensors
  • Specifying drop_last=True will now properly have the desired effect when calling Accelerator().gather_for_metrics()
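
As a rough illustration of the drop_last item above, the sketch below computes how many samples a metric would be expected to receive after gathering. It only encodes the usual drop_last semantics (an incomplete final batch is discarded). The helper name samples_seen_by_metrics is hypothetical, and treating total_batch_size as the per-process batch size multiplied by the number of processes is an assumption, not something stated in these notes.

```python
def samples_seen_by_metrics(dataset_len, total_batch_size, drop_last):
    """Expected number of samples reaching the metric (illustrative only).

    total_batch_size is assumed to be per-process batch size * num processes.
    """
    full_batches = dataset_len // total_batch_size
    if drop_last:
        # The incomplete final batch is never yielded, so only full batches count.
        return full_batches * total_batch_size
    # Without drop_last, every sample should be counted exactly once.
    return dataset_len
```

For example, with 10 samples and a total batch size of 4, the helper returns 8 when drop_last=True and 10 otherwise.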

What's Changed

New Contributors

Full Changelog: v0.23.0...v0.24.0