Batch & Gradient Checkpoint Setting? Also epochs? #879
animeking527 started this conversation in Ideas
Replies: 2 comments 1 reply
-
200 training steps per image is just a reference to get an idea about total steps.
-
It may be worth adding an option to change the batch & gradient accumulation.
I've been doing some tests, and bs=10 with gradient_accumulation_steps=6 gives 11.6 s/it.
Compare that to the default of bs=1 and gradient accumulation of 1, which gives 1.3 s/it.
It's a considerable speed increase, especially when aiming for larger total step counts (10k steps = 120 mins vs 217 mins).
Even with 704 resolution I can do bs=4, gc=5 and get 10s/it.
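One rough way to compare these settings is wall-clock time per sample rather than per iteration. This is just a sketch, assuming the s/it figures are wall time per iteration and that each iteration processes batch_size × grad_accum images (the helper name is hypothetical, not part of the tool):

```python
# Hypothetical helper: convert seconds/iteration into seconds/sample,
# assuming each iteration consumes batch_size * grad_accum images.

def per_sample_seconds(sec_per_it, batch_size, grad_accum):
    """Wall-clock seconds spent per training sample."""
    return sec_per_it / (batch_size * grad_accum)

# Figures quoted above:
default = per_sample_seconds(1.3, batch_size=1, grad_accum=1)    # 1.3 s/sample
batched = per_sample_seconds(11.6, batch_size=10, grad_accum=6)  # ~0.193 s/sample
print(f"default: {default:.3f} s/sample, batched: {batched:.3f} s/sample")
print(f"speedup: {default / batched:.1f}x")  # ~6.7x per sample
```

By this per-sample measure the batched run is several times faster, though whether each sample contributes equally to training progress under accumulation is a separate question.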
I can't really say if the results are better or worse, just different. I'm sure there are more optimal settings, but so far I haven't seen horrendous issues with the quality of the outputs.
All this has made me wonder, though: should we be using epochs (meaning one pass over every image in the dataset) when discussing training? Saying 100 training steps per image sounds like 100 epochs (each image processed 100 times), but that's not the case, correct?
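The steps-vs-epochs relationship can be sketched as follows, assuming "step" means one optimizer update and each update consumes batch_size × grad_accum images (the helper and the 20-image dataset size are hypothetical):

```python
# Hypothetical sketch: how many full passes over the dataset a given
# number of optimizer steps implies, assuming each step consumes
# batch_size * grad_accum images.

def epochs_completed(steps, batch_size, grad_accum, dataset_size):
    """Full dataset passes implied by a given step count."""
    images_seen = steps * batch_size * grad_accum
    return images_seen / dataset_size

# With bs=1, ga=1 and a 20-image dataset, "100 steps per image"
# (2000 steps total) does work out to 100 epochs:
print(epochs_completed(steps=2000, batch_size=1, grad_accum=1, dataset_size=20))

# At bs=10, ga=6 the same 2000 steps cover the dataset far more often:
print(epochs_completed(steps=2000, batch_size=10, grad_accum=6, dataset_size=20))
```

Under these assumptions, "N steps per image" and "N epochs" coincide only at bs=1 with no accumulation; once batching or accumulation is used, the two diverge, which is why step counts alone are ambiguous.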