Release v0.32.0: Profilers, new hooks, speedups, and more! · huggingface/accelerate

Core

Utilize shard saving from the huggingface_hub rather than our own implementation (#2795)
Refactor logging to use logger in dispatch_model (#2855)
The Accelerator.step number is now restored when using save_state and load_state (#2765)
A new profiler has been added, allowing users to collect performance metrics during model training and inference, including detailed analysis of execution time and memory consumption. The results can then be exported as a trace and viewed in Chrome's tracing tool. See the profiler guide in the documentation for details (#2883)
Reduced the time taken by import accelerate and other major core imports by 68%; it should now be only slightly slower than import torch (#2845)
Fixed a bug in get_backend and added a clear_device_cache utility (#2857)

Distributed Data Parallelism

Introduce DDP communication hooks to have more flexibility in how gradients are communicated across workers, overriding the standard allreduce. (#2841)
Make log_line_prefix_template optional in the notebook_launcher (#2888)

FSDP

If the output directory doesn't exist when using accelerate merge-weights, one will be automatically created (#2854)
When merging weights, the default is now .safetensors (#2853)

XPU

Migrated to PyTorch's native XPU backend on torch>=2.4 (#2825)
Added a @require_triton test decorator and enabled test_dynamo to work on XPU (#2878)
Fixed load_state_dict not working on XPU and refined the XPU safetensors version check (#2879)

XLA

Added support for XLA Dynamo backends for both training and inference (#2892)

Examples

Added a new multi-cpu SLURM example using accelerate launch (#2902)

Full Changelog

Use shard saving from huggingface_hub by @SunMarc in #2795
doc: fix link by @imba-tjd in #2844
Revert "Slight rename" by @SunMarc in #2850
remove warning hook added during dispatch_model by @SunMarc in #2843
Remove underlines between badges by @novialriptide in #2851
Auto create dir when merging FSDP weights by @helloworld1 in #2854
Add DDP Communication Hooks by @yhna940 in #2841
Refactor logging to use logger in dispatch_model by @panjd123 in #2855
xpu: support xpu backend from stock pytorch (>=2.4) by @dvrogozh in #2825
Drop torch re-imports in npu and mlu paths by @dvrogozh in #2856
Default FSDP weights merge to safetensors by @helloworld1 in #2853
[tests] fix bug in test_tracking.ClearMLTest by @faaany in #2863
[tests] use torch_device instead of 0 for device check by @faaany in #2861
[tests] skip bnb-related tests instead of failing on xpu by @faaany in #2860
Potentially fix tests by @muellerzr in #2862
[tests] enable XPU backend for test_zero3_integration by @faaany in #2864
Support saving and loading of step while saving and loading state by @bipinKrishnan in #2765
Add Profiler Support for Performance Analysis by @yhna940 in #2883
Speed up imports and add a CI by @muellerzr in #2845
Make log_line_prefix_template Optional in Elastic Launcher for Backward Compatibility by @yhna940 in #2888
Add XLA Dynamo backends for training and inference by @johnsutor in #2892
Added a MultiCPU SLURM example using Accelerate Launch and MPIRun by @okhleif-IL in #2902
make more cuda-only tests device-agnostic by @faaany in #2876
fix mlu device longTensor bugs by @huismiling in #2887
add require_triton and enable test_dynamo work on xpu by @faaany in #2878
fix load_state_dict for xpu and refine xpu safetensor version check by @faaany in #2879
Fix get_backend bug and add clear_device_cache function by @NurmaU in #2857

New Contributors

@McPatate made their first contribution in #2836
@imba-tjd made their first contribution in #2844
@novialriptide made their first contribution in #2851
@panjd123 made their first contribution in #2855
@dvrogozh made their first contribution in #2825
@johnsutor made their first contribution in #2892
@okhleif-IL made their first contribution in #2902
@NurmaU made their first contribution in #2857

Full Changelog: v0.31.0...v0.32.0
