v0.4.0 Experimental DeepSpeed and multi-node CPU support
v0.4.0 Experimental DeepSpeed support
This release adds support for DeepSpeed. While the basics are there to support ZeRO-2, ZeRO-3, as well as CPU and NVMe offload, the API might evolve a little as we polish it in the near future.
It also adds support for multi-node CPU. In both cases, filling in the questionnaire presented by accelerate config and then launching your script with accelerate launch is enough; there are no changes to the main API.
DeepSpeed support
Multinode CPU support
Various fixes
- Fix batch_sampler error for IterableDataset #62 (@ddkalamk)
- Honor namedtuples in inputs/outputs #67 (@sgugger)
- Fix examples README #70 (@cccntu)
- TPU not available in kaggle #73 (@yuangan)
- Pass args in notebook_launcher for multi-GPU #78 (@sgugger)
- Fix accelerate test with no config file #79 (@cccntu)
- Use optimizer for consistency #81 (@kumapo)
- Update README.md #87 (@Separius)
- Add unscale_gradients method. #88 (@sgugger)
- Add Accelerator.free_memory #89 (@sgugger)
- [Feature] Add context manager to allow main process first. #98 (@Guillem96)
- Pass along kwargs to backward #104 (@sgugger)
- Add course banner #107 (@sgugger)
- added closure argument to optimizer.step() #105 (@pmelchior)
- Fix import error for torch 1.4.0 #108 (@sgugger)
- Unwrap optimizer before unscaling #115 (@sgugger)
- Fix DataLoader length when split_batches=True #121 (@sgugger)
- Fix OptimWrapper init #127 (@sgugger)
- Fix fp16 by converting outputs back to FP32 #134 (@sgugger)
- Add caveat on weight-tying on TPUs #138 (@sgugger)
- Add optimizer not stepped property #139 (@sgugger)
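To illustrate the idea behind the "main process first" context manager from #98, here is a simplified sketch of the synchronization pattern. It is not Accelerate's implementation: threads stand in for processes, and the names NUM_PROCESSES, barrier, and worker are hypothetical. Non-main ranks wait at a barrier before running the body, and the main rank reaches that barrier only after finishing the body.

```python
import threading
from contextlib import contextmanager

NUM_PROCESSES = 3  # hypothetical world size; threads simulate processes
barrier = threading.Barrier(NUM_PROCESSES)


@contextmanager
def main_process_first(rank):
    """Let rank 0 run the body first; the other ranks wait until it is done."""
    if rank != 0:
        barrier.wait()  # non-main ranks block until rank 0 reaches the barrier
    yield
    if rank == 0:
        barrier.wait()  # rank 0 finished the body, so it releases the others


order = []
lock = threading.Lock()


def worker(rank):
    with main_process_first(rank):
        with lock:
            order.append(rank)  # record which rank ran the body, in order


threads = [threading.Thread(target=worker, args=(r,)) for r in range(NUM_PROCESSES)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(order[0])  # rank 0 always runs the body first
```

This pattern is useful for work such as downloading or preprocessing a dataset once on the main process while the other processes wait and then reuse the cached result.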