Fine-tuning a VITS model is very slow on an A100 #4053
Hello everyone, I'm trying to fine-tune a VITS model with a dataset of approximately 450 audio samples, totaling about 30 minutes of voice data (in LJSpeech format, each clip between 3 and 10 seconds). I am using an A100 GPU on Google Colab Pro.

**Issue / Observations**

- Batch size: 128
- The full configuration is below.

**Attempts to Fix the Issue**

**What I'm Looking For**

An optimal configuration for `batch_size`, learning rate, or any other parameters to take full advantage of an A100.
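One thing worth sanity-checking before tuning anything else: with only 450 clips, a batch size of 128 gives very few optimizer steps per epoch, so per-epoch overhead (dataloader startup, evaluation, checkpointing) can dominate wall-clock time rather than GPU compute. A minimal sketch of the arithmetic, using the numbers from the question (the `drop_last` behavior is an assumption about the dataloader, not taken from the VITS trainer):

```python
def steps_per_epoch(num_samples: int, batch_size: int, drop_last: bool = False) -> int:
    """Number of optimizer steps per epoch for a given batch size."""
    if drop_last:
        # partial final batch is discarded
        return num_samples // batch_size
    # partial final batch is kept: ceiling division
    return -(-num_samples // batch_size)

# With 450 samples:
print(steps_per_epoch(450, 128))  # 4 steps per epoch
print(steps_per_epoch(450, 32))   # 15 steps per epoch
```

In other words, at batch size 128 each "epoch" is only about 4 gradient updates, so epochs fly by in terms of learning while still paying fixed per-epoch costs every time.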
Replies: 1 comment 2 replies
Note that in the linked discussion they are talking about steps, not epochs, so it's not really comparable. For convergence you need to listen to the audio rather than counting steps or watching losses.