Batch & Gradient Checkpoint Setting? Also epochs? #879
animeking527 started this conversation in Ideas
Replies: 2 comments 1 reply
-
200 training steps per image is just a reference to get an idea about total steps.
-
It may be worth adding an option to change the batch & gradient accumulation.
I've been doing some tests, and bs=10 with gradient_accumulation_steps=6 gives 11.6 s/it.
Compare that to the default of bs=1 and gradient accumulation of 1, which gives 1.3 s/it.
It's a considerable speed increase, especially when aiming for larger total step counts (10k steps = 120 mins vs 217 mins).
Even with 704 resolution I can do bs=4, gc=5 and get 10s/it.
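One rough way to compare these settings is wall-clock time per sample rather than per iteration. This is just a sketch, assuming the s/it figures are wall time per iteration and that each iteration processes batch_size × grad_accum images (the helper name is hypothetical, not part of the tool):

```python
# Hypothetical helper: convert seconds/iteration into seconds/sample,
# assuming each iteration consumes batch_size * grad_accum images.

def per_sample_seconds(sec_per_it, batch_size, grad_accum):
    """Wall-clock seconds spent per training sample."""
    return sec_per_it / (batch_size * grad_accum)

# Figures quoted above:
default = per_sample_seconds(1.3, batch_size=1, grad_accum=1)    # 1.3 s/sample
batched = per_sample_seconds(11.6, batch_size=10, grad_accum=6)  # ~0.193 s/sample
print(f"default: {default:.3f} s/sample, batched: {batched:.3f} s/sample")
print(f"speedup: {default / batched:.1f}x")  # ~6.7x per sample
```

By this per-sample measure the batched run is several times faster, though whether each sample contributes equally to training progress under accumulation is a separate question.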
I can't really say if the results are better or worse, just different. I'm sure there are more optimal settings, but so far I haven't seen horrendous issues with the quality of the outputs.
All this has made me wonder, though: should we be using epochs (meaning one pass over every image in the dataset) when discussing training? Saying 100 training steps per image sounds like 100 epochs (each image processed 100 times), but that's not the case, correct?
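The steps-vs-epochs relationship can be sketched as follows, assuming "step" means one optimizer update and each update consumes batch_size × grad_accum images (the helper and the 20-image dataset size are hypothetical):

```python
# Hypothetical sketch: how many full passes over the dataset a given
# number of optimizer steps implies, assuming each step consumes
# batch_size * grad_accum images.

def epochs_completed(steps, batch_size, grad_accum, dataset_size):
    """Full dataset passes implied by a given step count."""
    images_seen = steps * batch_size * grad_accum
    return images_seen / dataset_size

# With bs=1, ga=1 and a 20-image dataset, "100 steps per image"
# (2000 steps total) does work out to 100 epochs:
print(epochs_completed(steps=2000, batch_size=1, grad_accum=1, dataset_size=20))

# At bs=10, ga=6 the same 2000 steps cover the dataset far more often:
print(epochs_completed(steps=2000, batch_size=10, grad_accum=6, dataset_size=20))
```

Under these assumptions, "N steps per image" and "N epochs" coincide only at bs=1 with no accumulation; once batching or accumulation is used, the two diverge, which is why step counts alone are ambiguous.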