Omni-Lingual Quantizer? #42

Open
Many0therFunctions opened this issue Oct 18, 2023 · 2 comments

@Many0therFunctions

(Took me way too long to realize this, and it just goes to show that most of us are point-and-click types who don't really understand what we're using. Not really skiddies, because we can code, but... you get what I mean.)

So if the whole point of training a quantizer like this on Bark-generated audio,

instead of simply grabbing a massive, good audio dataset, having Whisper transcribe it, and then adding tags or correcting as needed (or, god forbid, manually finding voice clips with actually good audio that more or less matches what you hear),

is simply that you don't know exactly how they trained their mapping from HuBERT voice features to semantic tokens (the unknown here being the semantic tokens), and you want to make sure you at least start off from THEIR mapping and refine it,

... then couldn't a general-purpose HuBERT-to-semantic-tokens quantizer be made instead? You would just aggregate audio for all the supported languages, generating datasets where you don't have them already, and train a quantizer on ALL OF THAT. Since its aim is just the reverse lookup, "tell me the semantic tokens for this series of sounds", it should theoretically cover any "known" language

(minus the click languages of southern Africa, because there's no statistically significant presence of tongue-click languages on the internet, but knowing Bark and its random noises during generation, and assuming the HuBERT model has also learned those, it probably CAN map a tongue-click language too)
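
To make the idea concrete, here is a toy sketch of "pool every language, then train one quantizer". Everything in it is an assumption for illustration: the `DATASETS` values are made-up 2-D points standing in for HuBERT feature frames, the token ids are arbitrary, and a nearest-centroid lookup stands in for whatever model would actually be trained. It only shows the shape of the pipeline, not how this repo or Bark does it.

```python
from collections import defaultdict
from math import dist

# Hypothetical toy data: per-language lists of (feature_vector, semantic_token).
# In practice the features would come from a HuBERT model and the tokens from
# Bark's own semantic output for the same generated audio.
DATASETS = {
    "en": [((0.0, 0.1), 7), ((0.1, 0.0), 7), ((1.0, 1.1), 3)],
    "vi": [((1.1, 0.9), 3), ((2.0, 0.1), 5)],
    "de": [((2.1, 0.0), 5), ((0.05, 0.05), 7)],
}

def pool(datasets):
    """Merge every language's pairs into one training list."""
    return [pair for pairs in datasets.values() for pair in pairs]

def fit_centroids(pairs):
    """Average the feature vectors seen for each semantic token."""
    grouped = defaultdict(list)
    for features, token in pairs:
        grouped[token].append(features)
    return {
        token: tuple(sum(axis) / len(vectors) for axis in zip(*vectors))
        for token, vectors in grouped.items()
    }

def quantize(centroids, features):
    """Return the token whose centroid is closest to the feature vector."""
    return min(centroids, key=lambda token: dist(centroids[token], features))

centroids = fit_centroids(pool(DATASETS))
tokens = [quantize(centroids, frame) for frame in [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]]
print(tokens)
```

The point of the sketch is that the language label is dropped at the pooling step: the quantizer only ever sees (features, token) pairs, so nothing in it is tied to a single language as long as the training pairs cover the sounds you care about.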

I see you did it for English, but I'm wondering why everyone has stopped at a single-language quantizer when it could probably be made into an omnilingual one.

I ask in the name of languages like Klingon, Middle English, Old English, Vietnamese with a southern accent....

@gitmylo
Owner

gitmylo commented Oct 18, 2023

It's planned, but I'll need some datasets to combine first.

@KPY7030P

KPY7030P commented Jun 4, 2024

I fully support this approach! I don't know much about this technology, but I think it's the best universal method.

3 participants