Subtitle timing and synchronization issue #396
Comments
Thanks for uploading the sample! I'll test it and try to find out what the problem is and what could be improved. +) The first hallucination part is at 18:27 ~ 19:21.
I will try both v2 and v3 and get back to you, thanks.
Sorry, but the result is still disappointing; it may be necessary to use different settings. It works well for short 5-6 minute content, but it is not suitable for movies or documentaries right now. I am attaching both files: the synchronization problem continues with both v2 and v3, and it gets really ridiculous toward the end :(
That's a very different result from mine. Would you copy and paste this into default_parameters.yaml and try again?
whisperx
I will try. You may want to consider whisperx integration. Last night I saw that creating subtitles with whisperx was much more successful.
Hi @jhj0517, I tried with your settings but the result is the same. I tried with large-v2 and large-v3, and I also tried other settings (remove background music, voice detection filter, advanced settings, etc.), but there was no satisfactory improvement. For example, the first subtitle appears right away, but the first conversation actually starts at around 10 seconds. The sentence is also too long. It is like this in many places. I made a sample project with whisperx, and it really gave better results.
Both whisperX and Whisper-WebUI use [...]. The reason whisperX gets better results is probably that it uses a different implementation of the VAD. So I have noted it here for now.
Hello, I am experiencing some issues while generating subtitles for the video attached below. Despite trying various values in the Advanced Parameters and Voice Detection sections, I am not able to achieve the desired results.
For example, I keep testing, but the text either appears before or after the audio, or the words are too long. Sometimes, very simple two-word subtitles stay on the screen for 30 seconds. Occasionally, there are 2 or 3 different languages in the uploaded file, and in such cases, the behavior changes as well.
I have enabled background music removal, activated VAD, and tested with the large v2 and v3 versions. I increased the Best of and Beam Size values up to 30. I tried many parameters with the sample file I provided, but I still didn’t get the exact results I wanted. What parameters should I use? There is a link to the sample file and subtitles. Are there any settings you would recommend?
https://easyupload.io/d1w4fi (file)
https://easyupload.io/0p558m (srt)
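The symptoms described above (two-word subtitles staying on screen for 30 seconds, or overly long lines) can be checked mechanically on the generated `.srt` file, which makes it easier to compare parameter sets. The following is a minimal sketch using only the Python standard library; it is not part of Whisper-WebUI or whisperx. The function names (`parse_srt`, `suspicious_cues`) and all thresholds are hypothetical choices for illustration and would need tuning per language and content.

```python
import re
from dataclasses import dataclass

# Matches an SRT timing line such as "00:00:13,000 --> 00:00:43,000".
TIME_RE = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


@dataclass
class Cue:
    index: int    # 1-based position in the file (not the number written in the file)
    start: float  # seconds
    end: float    # seconds
    text: str


def _seconds(h, m, s, ms):
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_srt(content):
    """Parse SRT text into a list of Cue objects, skipping malformed blocks."""
    cues = []
    normalized = content.replace("\r\n", "\n").strip()
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", normalized):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        for i, line in enumerate(lines):
            match = TIME_RE.search(line)
            if match:
                groups = match.groups()
                text = " ".join(lines[i + 1:])
                cues.append(
                    Cue(len(cues) + 1, _seconds(*groups[:4]), _seconds(*groups[4:]), text)
                )
                break
    return cues


def suspicious_cues(cues, max_seconds_per_word=2.0, max_duration=10.0, max_chars=84):
    """Return (cue, reasons) pairs for cues that look mistimed or too long.

    All three thresholds are assumptions for illustration only.
    """
    flagged = []
    for cue in cues:
        duration = cue.end - cue.start
        words = len(cue.text.split())
        reasons = []
        if words and duration / words > max_seconds_per_word:
            reasons.append("few words for the time on screen")
        if duration > max_duration:
            reasons.append("on screen too long")
        if len(cue.text) > max_chars:
            reasons.append("text too long")
        if reasons:
            flagged.append((cue, reasons))
    return flagged


# Made-up sample data, not taken from the uploaded files.
SAMPLE = """1
00:00:10,000 --> 00:00:12,500
Hello there, how are you?

2
00:00:13,000 --> 00:00:43,000
Thank you.
"""

if __name__ == "__main__":
    for cue, reasons in suspicious_cues(parse_srt(SAMPLE)):
        print(f"cue {cue.index} [{cue.start:.1f}s - {cue.end:.1f}s] {cue.text!r}: {', '.join(reasons)}")
```

Running this on the output of each parameter set and comparing the number of flagged cues gives a rough signal of whether a change helped. It does not check whether the text is actually aligned with the audio.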