27 Sep 15:05

sgugger

19ec4a7

v0.5.1: Patch release

Fix the two following bugs:

convert_to_fp32 returned booleans instead of tensors #173
wrong dataloader length when dispatch_batches=True #175

23 Sep 14:38

sgugger

v0.5.0

56d8760

v0.5.0 Dispatch batches from main DataLoader

This release introduces support for iterating through a DataLoader only on the main process, that then dispatches the batches to all processes.

Dispatch batches from main DataLoader

The motivation behind this comes from dataset streaming, which introduces two difficulties:

Some elements of the dataset might time out, and the timeouts might differ in each launched process, so it is impossible to guarantee that the data is iterated through in the same way on each process.
When using an IterableDataset, each process goes through the whole dataset and thus applies the preprocessing to all elements. This preprocessing can slow training down.

This new feature is activated by default for all IterableDataset.

Central dataloader #164 (@sgugger)
Dynamic default for dispatch_batches #168 (@sgugger)
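
To make the idea concrete, here is a minimal pure-Python sketch of dispatching: a single reader iterates over the data and hands each process an equal slice of every batch. It is an illustration only, not the library's implementation; the names dispatch_batches (used here as a plain function), stream, and num_processes are assumptions made for this example, and no inter-process communication is modeled.

```python
from typing import Iterable, Iterator, List, Sequence


def dispatch_batches(
    batches: Iterable[Sequence[int]], num_processes: int
) -> Iterator[List[List[int]]]:
    """Yield, for each global batch, one equally sized slice per process.

    Only this generator reads the underlying iterable, which mirrors the
    idea that only the main process iterates through the dataset.
    """
    for batch in batches:
        if len(batch) % num_processes != 0:
            raise ValueError("batch size must be divisible by num_processes")
        per_process = len(batch) // num_processes
        yield [
            list(batch[rank * per_process : (rank + 1) * per_process])
            for rank in range(num_processes)
        ]


def stream() -> Iterator[List[int]]:
    """Hypothetical streamed dataset that produces batches of four items."""
    for start in range(0, 8, 4):
        yield list(range(start, start + 4))


# Each entry of `shards` holds the slices that two processes would receive.
shards = list(dispatch_batches(stream(), num_processes=2))
```

Because the stream is consumed once, any timeout or preprocessing cost is paid by a single reader, and every process sees slices of the same batches.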

Various fixes

fix fp16 covert back to fp32 for issue: unsupported operand type(s) for /: 'dict' and 'int' #149 (@Doragd)
[Docs] Machine config is yaml not json #151 (@patrickvonplaten)
Fix gather for 0d tensor #152 (@sgugger)
[DeepSpeed] allow untested optimizers deepspeed #150 (@patrickvonplaten)
Raise errors instead of warnings with better tests #170 (@sgugger)

Contributors

patrickvonplaten, Doragd, and sgugger

10 Aug 09:46

sgugger

v0.4.0

95c4676

v0.4.0 Experimental DeepSpeed and multi-node CPU support

v0.4.0 Experimental DeepSpeed support

This release adds support for DeepSpeed. While the basics are there to support ZeRO-2, ZeRO-3, as well as CPU and NVMe offload, the API might evolve a little as we polish it in the near future.

It also adds support for multi-node CPU. In both cases, filling in the questionnaire presented by accelerate config and then launching your script with accelerate launch is enough; there are no changes in the main API.

DeepSpeed support

Add DeepSpeed support #82 (@vasudevgupta7)
DeepSpeed documentation #140 (@sgugger)

Multinode CPU support

Add distributed multi-node cpu only support (MULTI_CPU) #63 (@ddkalamk)

Various fixes

Fix batch_sampler error for IterableDataset #62 (@ddkalamk)
Honor namedtuples in inputs/outputs #67 (@sgugger)
Fix examples README #70 (@cccntu)
TPU not available in kaggle #73 (@yuangan)
Pass args in notebook_launcher for multi-GPU #78 (@sgugger)
Fix accelerate test with no config file #79 (@cccntu)
Use optimizer for consistency #81 (@kumapo)
Update README.md #87 (@Separius)
Add unscale_gradients method. #88 (@sgugger)
Add Accelerator.free_memory #89 (@sgugger)
[Feature] Add context manager to allow main process first. #98 (@Guillem96)
Pass along kwargs to backward #104 (@sgugger)
Add course banner #107 (@sgugger)
added closure argument to optimizer.step() #105 (@pmelchior)
Fix import error for torch 1.4.0 #108 (@sgugger)
Unwrap optimizer before unscaling #115 (@sgugger)
Fix DataLoader length when split_batches=True #121 (@sgugger)
Fix OptimWrapper init #127 (@sgugger)
Fix fp16 by converting outputs back to FP32 #134 (@sgugger)
Add caveat on weight-tying on TPUs #138 (@sgugger)
Add optimizer not stepped property #139 (@sgugger)

Contributors

kumapo, Separius, and 7 other contributors

29 Apr 15:45

sgugger

v0.3.0

dd9f7aa

v0.3.0 Notebook launcher and multi-node training

Notebook launcher

After doing all the data preprocessing in your notebook, you can launch your training loop using the new notebook_launcher functionality. This is especially useful for Colab or Kaggle with TPUs! Here is an example on Colab (don't forget to select a TPU runtime).

This launcher also works if you have multiple GPUs on your machine. You just have to pass along num_processes=your_number_of_gpus in the call to notebook_launcher.

Notebook launcher #44 (@sgugger)
Add notebook/colab example #52 (@sgugger)
Support for multi-GPU in notebook_launcher #56 (@sgugger)

Multi-node training

Our multi-node training test setup was flawed, and the previous releases of 🤗 Accelerate were not working for multi-node distributed training. This is all fixed now, and we have added more robust tests!

fix cluster.py indent error #35 (@JTT94)
Set all defaults from config in launcher #38 (@sgugger)
Fix port in config creation #50 (@sgugger)

Various bug fixes

Fix typos in examples README #28 (@arjunchandra)
Fix load from config #31 (@sgugger)
docs: minor spelling tweaks #33 (@brettkoonce)
Add set_to_none to AcceleratedOptimizer.zero_grad #43 (@sgugger)
fix #53 #54 (@Guitaricet)
update launch.py #58 (@Jesse1eung)

19 Apr 17:32

sgugger

v0.2.1

a70ba9d

v0.2.1: Patch release

Fix a bug that prevented a config from being loaded with accelerate launch

15 Apr 16:00

sgugger

v0.2.0

499a5e5

v0.2.0 SageMaker launcher

SageMaker launcher

It's now possible to launch your training script on AWS instances using SageMaker via accelerate launch.

Launch script on SageMaker #26 (@philschmid)
Add defaults for compute_environmnent #23 (@sgugger)
Add Configuration setup for SageMaker #17 (@philschmid)

Kwargs handlers

To customize how the different objects used for mixed precision or distributed training are instantiated, a new API called KwargsHandler is added. This allows the user to pass along the kwargs that will be passed to those objects if used (and it is ignored if those are not used in the current setup, so the script can still run on any kind of setup).

Add KwargsHandlers #15 (@sgugger)
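
The pattern described above can be sketched with plain dataclasses: a handler stores options, reports only the ones that differ from their defaults, and is ignored when the object it targets is not created. This is a rough illustration under stated assumptions, not the library's code; SketchKwargsHandler, SketchDDPKwargs, and build_wrapper_kwargs are hypothetical names, and the two option fields are chosen only for illustration.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, List


@dataclass
class SketchKwargsHandler:
    """Hypothetical base class: reports only the fields changed from defaults."""

    def to_kwargs(self) -> Dict[str, Any]:
        default = self.__class__()
        return {
            f.name: getattr(self, f.name)
            for f in fields(self)
            if getattr(self, f.name) != getattr(default, f.name)
        }


@dataclass
class SketchDDPKwargs(SketchKwargsHandler):
    """Hypothetical handler holding options for a distributed model wrapper."""

    find_unused_parameters: bool = False
    bucket_cap_mb: int = 25


def build_wrapper_kwargs(
    handlers: List[SketchKwargsHandler], distributed: bool
) -> Dict[str, Any]:
    """Collect kwargs only when the matching object will actually be created."""
    kwargs: Dict[str, Any] = {}
    if distributed:
        for handler in handlers:
            kwargs.update(handler.to_kwargs())
    return kwargs


handler = SketchDDPKwargs(find_unused_parameters=True)
used = build_wrapper_kwargs([handler], distributed=True)
ignored = build_wrapper_kwargs([handler], distributed=False)
```

Returning only the non-default values keeps the wrapped object's own defaults in charge of everything the user did not set explicitly.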

Pad across processes

Trying to gather tensors that are not of the same size across processes resulted in a process hang. A new method, Accelerator.pad_across_processes, has been added to help with that.

Add utility to pad tensor across processes to max length #19 (@sgugger)
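
As a rough illustration of why padding helps, the sketch below uses plain Python lists to stand in for the tensor each process holds and pads them all to the longest length, so that a later gather would see identical sizes. The function pad_to_max_length and its parameters are hypothetical names for this example, and the real method works on tensors across actual processes.

```python
from typing import List


def pad_to_max_length(
    per_process: List[List[int]], pad_index: int = 0, pad_first: bool = False
) -> List[List[int]]:
    """Pad every sequence to the length of the longest one.

    `per_process` holds one sequence per simulated process. Padding goes at
    the end unless `pad_first` is True.
    """
    max_len = max(len(seq) for seq in per_process)
    padded = []
    for seq in per_process:
        padding = [pad_index] * (max_len - len(seq))
        padded.append(padding + seq if pad_first else seq + padding)
    return padded


# Three simulated processes holding sequences of different lengths.
local_tensors = [[1, 2, 3], [4], [5, 6]]
padded = pad_to_max_length(local_tensors, pad_index=-100)
```

A pad value that cannot be mistaken for real data (here -100) makes it easy to strip the padding again after gathering.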

Various bug fixes

added thumbnail #25 (@philschmid)
Cleaner diffs in README and index #22 (@sgugger)
Use proper size #21 (@sgugger)
Alternate diff #20 (@sgugger)
Add YAML config support #16 (@sgugger)
Don't error on non-Tensors objects in move to device #13 (@sgugger)
Add CV example #10 (@sgugger)
Readme clean-up #9 (@thomwolf)
More flexible RNG synchronization #8 (@sgugger)
Fix typos and tighten grammar in README #7 (@lewtun)
Update README.md #6 (@voidful)
Fix TPU training in example #4 (@thomwolf)
Fix example name in README #3 (@LysandreJik)

05 Mar 21:59

sgugger

v0.1.0

0fbbbc5

v0.1.0 Initial release

Initial release of 🤗 Accelerate. Check out the main README or the docs to learn more about it!

Releases: huggingface/accelerate
