
Does this work with unseen speech? #10

Open
sbkim052 opened this issue Jul 20, 2020 · 3 comments
@sbkim052

Thank you for sharing your repo.

My question is the same as the title: does this work with unseen speech?

@bshall
Owner

bshall commented Jul 20, 2020

Hi @sbkim052,

Yeah, it should work with unseen speech as the input. All the examples here are converted from unseen speech.

If you want to convert to an unseen speaker, you'd have to retrain the model. You could also look into conditioning on x-vectors or other speaker embeddings if you want to do zero-shot conversion.

@sbkim052
Author

sbkim052 commented Jul 21, 2020


Thank you for answering. :)

I have a follow-up question about your answer.
What do you mean by conditioning on x-vectors or other embeddings for zero-shot conversion?
Could you explain in more detail how to do zero-shot conversion?

@bshall
Owner

bshall commented Jul 21, 2020

@sbkim052, no problem.

The basic idea is to train a speaker verification/classification model to learn an embedding space for speaker identity. Then, instead of conditioning the decoder on a fixed speaker id (like I did in this repo), you condition on the learned embeddings. At test time you can get the embedding for a new unseen speaker and condition the decoder to generate speech in that voice. For more info, you can take a look at this paper. They use a text-to-speech model instead of an autoencoder but the general idea is the same.
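To make that concrete, here is a minimal PyTorch sketch of the two conditioning schemes (this is not code from this repo; the layer sizes, dimensions, and the `speaker_encoder` stand-in are all illustrative assumptions). The only thing that changes for zero-shot conversion is where the decoder's speaker vector comes from:

```python
import torch
import torch.nn as nn

class Decoder(nn.Module):
    """Toy decoder that generates acoustic features conditioned on a speaker vector."""

    def __init__(self, content_dim, speaker_dim, hidden_dim, out_dim):
        super().__init__()
        self.rnn = nn.GRU(content_dim + speaker_dim, hidden_dim, batch_first=True)
        self.proj = nn.Linear(hidden_dim, out_dim)

    def forward(self, content, speaker):
        # Broadcast the speaker vector across every time step of the content sequence.
        speaker = speaker.unsqueeze(1).expand(-1, content.size(1), -1)
        out, _ = self.rnn(torch.cat([content, speaker], dim=-1))
        return self.proj(out)

# Closed-set conditioning: a trainable lookup table over the training speakers.
# A speaker the table has no row for cannot be synthesized without retraining.
speaker_table = nn.Embedding(num_embeddings=100, embedding_dim=64)
decoder = Decoder(content_dim=256, speaker_dim=64, hidden_dim=512, out_dim=80)

content = torch.randn(1, 100, 256)   # e.g. content codes, shape (batch, time, dim)
seen = decoder(content, speaker_table(torch.tensor([7])))

# Zero-shot conditioning: replace the lookup with an embedding produced by a
# speaker verification network (e.g. an x-vector extractor). Any model mapping
# a reference waveform to a fixed-size vector works; at test time it can embed
# speakers never seen during training.
xvector = torch.randn(1, 64)         # stand-in for speaker_encoder(reference_wav)
unseen = decoder(content, xvector)
```

The appeal of the second scheme is that the speaker encoder generalizes to new voices, so the decoder never needs retraining for a new target speaker; in practice, quality depends on how well the embedding space covers speakers outside the training set.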
