
Precise Use of Actual Subtitles #323

Open
iodides opened this issue Oct 7, 2024 · 5 comments
Labels
enhancement New feature or request

Comments

@iodides

iodides commented Oct 7, 2024

First of all, I want to say thanks, because I've been getting a lot of good use out of this project.

In general, for recorded videos, movies, or music there is often a fully accurate script already available. However, when Whisper WebUI converts the speech to text, it often doesn't recognize certain words and sentences perfectly, so manual correction is required.

It's difficult, even for AI or humans, to understand dialogue completely just by listening. Therefore, when an original script exists, it would be great if we could upload the script file (without timestamps) alongside the audio, and the AI could align the original text to the audio and produce the correct timings.

iodides added the enhancement label on Oct 7, 2024
@jhj0517
Owner

jhj0517 commented Oct 7, 2024

Hi. If I understand correctly, you want the transcription to only fill in the "timestamps" while keeping your own text?
I'm still considering whether I should implement this, and how.

And if hallucination is the problem, you can consider using the VAD (Voice Activity Detection) and BGM Separation filters in the WebUI.

They feed cleaner audio to Whisper, and most of the hallucinations disappear just by removing the noise from the audio.
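
For reference, a minimal sketch of the same VAD idea outside the WebUI, assuming the faster-whisper package (one of the backends the WebUI can use) and a placeholder input file:

```python
from faster_whisper import WhisperModel

# Hedged sketch: Silero VAD drops non-speech spans before decoding, so
# Whisper never sees the silence that tends to trigger hallucinations.
model = WhisperModel("large-v3")
segments, info = model.transcribe(
    "audio.mp3",                                     # placeholder file
    vad_filter=True,                                 # enable VAD pre-filtering
    vad_parameters={"min_silence_duration_ms": 500},
)
for segment in segments:
    print(f"[{segment.start:.2f} -> {segment.end:.2f}] {segment.text}")
```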

@iodides
Author

iodides commented Oct 8, 2024

Yes, I have the original script for the video. The recognition results from Whisper are excellent, of course, but they are not 100% accurate.

For example, if the original script is:
Lost in the maze of broken streets,
Twelve paths ahead, where will I meet,

the result from WebUI comes out as:

🎵 Lost in a maze of broken strings 🎵
🎵 To a path ahead, where will I meet? 🎵

So, I have to compare line by line and correct the text.
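
A rough sketch of automating that line-by-line comparison with only Python's standard library; the file names are placeholders, and the matching is a naive closest-line heuristic rather than real forced alignment:

```python
import difflib
import re

def load_script_lines(path):
    # Original script: one lyric/dialogue line per row, no timestamps.
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]

def load_srt_blocks(path):
    # Parse the WebUI's SRT into (index, timing, text) tuples.
    with open(path, encoding="utf-8") as f:
        blocks = re.split(r"\n\s*\n", f.read().strip())
    parsed = []
    for block in blocks:
        lines = block.splitlines()
        parsed.append((lines[0], lines[1], " ".join(lines[2:])))
    return parsed

def correct_srt(script_path, srt_path, out_path):
    # Keep the WebUI's timestamps, but swap each subtitle's text for the
    # script line that is most similar to what Whisper heard.
    script = load_script_lines(script_path)
    with open(out_path, "w", encoding="utf-8") as out:
        for idx, timing, text in load_srt_blocks(srt_path):
            match = difflib.get_close_matches(text, script, n=1, cutoff=0.0)
            out.write(f"{idx}\n{timing}\n{match[0] if match else text}\n\n")

correct_srt("original_script.txt", "webui_output.srt", "corrected.srt")
```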

@iodides
Author

iodides commented Oct 8, 2024

Here is another sample.
Original script:
Even if fate decides to blind,
I’ll walk the path, leave doubt behind.
With every turn, I feel you near,
Summer’s light will reappear.

WebUI result:
Even if fate decides to bind
I'll walk the path leaped out behind
With every turn, I fear you're near
Sunrise light will reappear

In my case, it's music.

@jhj0517
Owner

jhj0517 commented Oct 9, 2024

Transcribing music is a really good use case for the Background Music Remover filter in the WebUI.
If you haven't tried it yet, I recommend using it.
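
For reference, a comparable separation step outside the WebUI can be sketched with the Demucs CLI (an assumption here; the WebUI ships its own separation models), with a placeholder file name:

```python
import subprocess

# Hedged sketch: split the track into vocals + accompaniment, then
# transcribe only the vocals stem instead of the full mix.
subprocess.run(["demucs", "--two-stems=vocals", "audio.flac"], check=True)
# Demucs writes the stems under ./separated/<model>/audio/, and the
# vocals.wav file there is what you would feed to Whisper.
```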

Original script: leave doubt behind.
WebUI result: leaped out behind

This kind of case seems to be a difficult one. You might consider using a higher beam_size (which is in the "Advanced Parameters" tab), like 10. A higher beam_size slows down the transcription, but makes it more accurate.
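
For reference, a minimal sketch of raising beam_size outside the WebUI, again assuming the faster-whisper backend and a placeholder file name:

```python
from faster_whisper import WhisperModel

# Hedged sketch: a larger beam keeps more candidate decodings per step,
# trading transcription speed for accuracy.
model = WhisperModel("large-v3")
segments, info = model.transcribe("audio.flac", beam_size=10)
for segment in segments:
    print(f"[{segment.start:.2f} -> {segment.end:.2f}] {segment.text}")
```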

As for the feature itself, I see it as a very specific one; I will implement it if others want it as well!

@iodides
Author

iodides commented Oct 11, 2024

Sample Music.zip

  • Goodbye.flac : Sample music
  • Goodbye.txt : Original lyrics
  • Goodbye_webui.srt : SRT created by the WebUI
  • Goodbye_manual.srt : SRT I created manually
