STT Extension #4847

ClashSAN · 2022-11-19T06:01:39Z

Collaborator

I wanted to share an extension that lets you prompt by speaking out loud.
(It's not mine; it's by @152334H.)

https://github.com/152334H/sd-webui-whisper

It's working great, though I wasn't able to switch to a larger model (for better quality).
According to @152334H it has many bugs, though I haven't run into any. You may need more than 4 GB of VRAM if you want to run the Whisper model accelerated on your GPU.
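To make the VRAM remark concrete: the openai/whisper README lists approximate VRAM needs per model size (tiny and base about 1 GB, small about 2 GB, medium about 5 GB, large about 10 GB), so a 4 GB card tops out around `small`. Below is a minimal sketch of picking a size from available memory. The helper name is hypothetical, and treating the README's rough figures as hard limits is my assumption, not something the extension does.

```python
from typing import Optional

# Approximate VRAM requirements (GB) from the openai/whisper README,
# ordered from smallest to largest model. Treated here as hard limits (assumption).
WHISPER_VRAM_GB = [
    ("tiny", 1.0),
    ("base", 1.0),
    ("small", 2.0),
    ("medium", 5.0),
    ("large", 10.0),
]


def pick_whisper_model(free_vram_gb: float) -> Optional[str]:
    """Hypothetical helper: return the largest model whose approximate VRAM
    need fits, or None if even "tiny" does not fit (e.g. fall back to CPU)."""
    best = None
    for name, need_gb in WHISPER_VRAM_GB:
        if need_gb <= free_vram_gb:
            best = name
    return best
```

For example, `pick_whisper_model(4.0)` gives `"small"` and `pick_whisper_model(6.0)` gives `"medium"`. The returned name could then be passed to `whisper.load_model(name)` in place of a hardcoded model type.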

I don't think it has an always-listen mode, and since I've always wanted a feature like that, I combined @mallorbc's https://github.com/mallorbc/whisper_mic with the API demo script @Kilvoctu made, so that it constantly listens and writes each generated image to output.png.
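A rough sketch of such an always-listen loop, assuming the webui is launched with `--api` on the default port 7860, where `POST /sdapi/v1/txt2img` returns base64-encoded PNGs in an `"images"` list. The `listen` callable is hypothetical: it stands in for whatever blocks until the next utterance is transcribed (for example a wrapper around whisper_mic, whose API is not shown here) and returns `None` to stop. This is not @Kilvoctu's actual demo script, and it assumes the images come back as plain base64 without a data-URL prefix.

```python
import base64
import json
import urllib.request
from typing import Callable, Optional

# Default local webui address; the API is only exposed with the --api flag.
TXT2IMG_URL = "http://127.0.0.1:7860/sdapi/v1/txt2img"


def build_payload(prompt: str, steps: int = 20) -> dict:
    """Minimal txt2img request body; other fields keep their server defaults."""
    return {"prompt": prompt, "steps": steps}


def txt2img(prompt: str, url: str = TXT2IMG_URL) -> dict:
    """POST the prompt as JSON (urllib uses POST when data is given)."""
    req = urllib.request.Request(
        url,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def save_first_image(response: dict, path: str = "output.png") -> None:
    """Decode the first base64 PNG in the response and write it to disk."""
    with open(path, "wb") as f:
        f.write(base64.b64decode(response["images"][0]))


def listen_and_generate(listen: Callable[[], Optional[str]],
                        generate: Callable[[str], dict] = txt2img,
                        path: str = "output.png") -> int:
    """Turn each transcribed utterance into a prompt; the newest image
    overwrites `path`. Returns how many images were generated once
    listen() returns None."""
    count = 0
    while True:
        text = listen()  # hypothetical: blocks until the next utterance
        if text is None:
            return count
        text = text.strip()
        if not text:
            continue  # skip silence / empty transcriptions
        save_first_image(generate(text), path)
        count += 1
```

Passing `generate` in as a parameter keeps the loop testable without a running webui; in real use the default `txt2img` talks to the local server.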

152334H · 2022-11-19T06:38:00Z

Thanks for sharing.

> wasn't able to switch model

The model type is hardcoded here:

https://github.com/152334H/sd-webui-whisper/blob/master/main.py#L18-L19

I could add a UI dropdown option for it.
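One way a dropdown could replace the hardcoded model type is to cache loaded models by name, so switching back and forth does not reload weights each time. A minimal sketch; the class and choice list are hypothetical, and the loader is injected (in practice it would be `whisper.load_model`), so this is not code from the extension.

```python
from typing import Any, Callable, Dict, List

# Hypothetical dropdown choices (openai/whisper model sizes).
MODEL_CHOICES: List[str] = ["tiny", "base", "small", "medium", "large"]


class WhisperModelCache:
    """Loads each model at most once and reuses it on later selections."""

    def __init__(self, loader: Callable[[str], Any]):
        self._loader = loader  # e.g. whisper.load_model
        self._models: Dict[str, Any] = {}

    def get(self, name: str) -> Any:
        if name not in MODEL_CHOICES:
            raise ValueError(f"unknown model {name!r}; expected one of {MODEL_CHOICES}")
        if name not in self._models:
            self._models[name] = self._loader(name)  # first use: load weights
        return self._models[name]
```

On the UI side, something like `gr.Dropdown(choices=MODEL_CHOICES, label="Whisper model")` could feed its value into `cache.get(...)` inside the transcription callback. Keeping every model ever selected resident costs VRAM; evicting the previous model on each switch would trade reload time for memory.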

> I don't think it has always-listen mode

It kind of does, but also not really. Ticking the checkbox makes the extension listen for a spoken trigger word ("computer") before it starts prompting. This is also the buggiest part of the repo :)
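To illustrate the trigger-word idea, here is a small hypothetical parser that checks whether a transcript starts with "computer" and returns the rest as the prompt. It is not the extension's actual matching logic; tolerating case differences and the punctuation speech-to-text tends to insert (e.g. "Computer, a castle at dusk.") is my assumption about what such a check would need.

```python
import re
from typing import Optional

WAKE_WORD = "computer"  # the trigger word mentioned in the thread


def extract_prompt(transcript: str, wake_word: str = WAKE_WORD) -> Optional[str]:
    """Return the text spoken after the wake word, or None if the utterance
    does not start with it (or nothing follows it)."""
    # Optional leading punctuation, the wake word as a whole word, then any
    # separators, then the prompt itself.
    pattern = rf"^\W*{re.escape(wake_word)}\b[\s,.:;!?-]*(.*)$"
    match = re.match(pattern, transcript.strip(), flags=re.IGNORECASE | re.DOTALL)
    if not match:
        return None
    prompt = match.group(1).strip().rstrip(".")
    return prompt or None
```

For example, `extract_prompt("Computer, a castle at dusk.")` returns `"a castle at dusk"`, while `"computers are neat"` returns `None` because the wake word must be a whole word.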

I encountered two main problems in attempting to implement always-listen:

- Difficulty of adding continuous audio streaming to Gradio, especially from within an extension. The hacky workaround is to record audio directly from Python (as I do in the code), but this does not work well for people who use the webui remotely, since Python then records the server's microphone rather than the user's.
- Difficulty of running voice recognition asynchronously in general. The asynchronous process needs to send text to the prompt box, which is hard to do without editing the main webui source (e.g. to enable queued tasks). My implementation runs synchronously so that it can execute JavaScript on the client to copy the inferred text into the prompt box.
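A common way to decouple recognition from the UI is a producer/consumer queue: a background thread blocks on the microphone and pushes transcripts into a thread-safe `queue.Queue`, and the UI drains it without blocking. A minimal sketch with a hypothetical `listen()` callable (returning `None` to stop). It does not solve the part described above: something in the webui still has to call `poll()` periodically and put the text into the prompt box, which is where changes to the main webui source would come in.

```python
import queue
import threading
from typing import Callable, List, Optional


class BackgroundTranscriber:
    """Runs a blocking listen() callable in a daemon thread and buffers
    its transcripts in a thread-safe queue."""

    def __init__(self, listen: Callable[[], Optional[str]]):
        self._listen = listen  # hypothetical: blocks until an utterance is transcribed
        self._queue: "queue.Queue[str]" = queue.Queue()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def stop(self, timeout: float = 1.0) -> None:
        # Takes effect after the current listen() call returns.
        self._stop.set()
        self._thread.join(timeout)

    def join(self, timeout: Optional[float] = None) -> None:
        self._thread.join(timeout)

    def _run(self) -> None:
        while not self._stop.is_set():
            text = self._listen()
            if text is None:  # convention: None means the source is exhausted
                break
            text = text.strip()
            if text:
                self._queue.put(text)

    def poll(self) -> List[str]:
        """Non-blocking: return every transcript produced since the last poll."""
        items: List[str] = []
        while True:
            try:
                items.append(self._queue.get_nowait())
            except queue.Empty:
                return items
```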

Feel free to send a PR (or just fork the extension) if you have better code.


ClashSAN Nov 19, 2022
Collaborator Author

It could be a lot of fun to use your extension with "always-listen" active and stream the outputs to a TV or monitor setup.

Should this be added to the extensions index, or are you not finished with it yet?

Kilvoctu · 2022-11-19T06:52:15Z

I appreciate the shoutout 😀. I always feel like I should improve that API documentation and demo script, but I'm not sure what to add.
Still, it makes me very happy that it's helped people.


ClashSAN Nov 19, 2022
Collaborator Author

Yeah, it broke, and a day later you updated the guide. Thanks!
