My First Release of an Audiobook using Coqui-TTS #2635
-
You can eliminate the voice inconsistency by fixing the seed for the normalising flow modules, which sample from a Gaussian. Also, turning off SDP (the stochastic duration predictor) and using DDP helps somewhat to stabilise phoneme durations in the audio.
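A minimal sketch of the seeding idea, using Python's `random` module as a stand-in for PyTorch so it runs anywhere. With a real VITS model you would presumably call `torch.manual_seed(seed)` right before each sentence is synthesized; `fake_flow_noise` is a hypothetical placeholder for the Gaussian sample the flow draws, not a Coqui function.

```python
import random
from typing import Callable, List, TypeVar

T = TypeVar("T")


def with_fixed_seed(sample: Callable[[], T], seed: int) -> T:
    """Reseed the RNG immediately before drawing, so every call sees the same noise.

    Stand-in only: with PyTorch you would call torch.manual_seed(seed) here instead.
    """
    random.seed(seed)
    return sample()


def fake_flow_noise(n: int = 4) -> List[float]:
    """Hypothetical stand-in for the Gaussian noise a normalising flow samples."""
    return [random.gauss(0.0, 1.0) for _ in range(n)]


if __name__ == "__main__":
    first = with_fixed_seed(fake_flow_noise, seed=1234)
    second = with_fixed_seed(fake_flow_noise, seed=1234)
    print(first == second)  # True: same seed, same noise
```

Reseeding before every sentence (rather than once at startup) is what keeps the noise identical from call to call; seeding once only makes the whole run reproducible. On a GPU, some kernels may still be nondeterministic.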
-
The audiobook is great. TotK rocks!
-
With other TTS systems I've used to make similar audiobooks, I generate sentence-length WAVs and then, when merging them, put a silent 0.25-second WAV that I created in Audacity between each one. I know it can be done in ffmpeg, but I use sox to merge the files. The call is super simple.
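A minimal sketch of the same merge, assuming every sentence WAV shares one sample rate, channel count and sample width. `sox_concat_command` only builds an argument list that interleaves the silence file (sox concatenates multiple input files into the last-named output); `concat_with_silence` is a hypothetical stdlib-only alternative that writes the 0.25 s gap itself instead of reading it from a file made in Audacity.

```python
import tempfile
import wave
from pathlib import Path
from typing import List, Sequence


def sox_concat_command(parts: Sequence[str], silence: str, out: str) -> List[str]:
    """Build a sox argv that joins `parts` in order with `silence` between each pair.

    Given several input files and one output file, sox concatenates the inputs.
    """
    args = ["sox"]
    for i, part in enumerate(parts):
        if i:
            args.append(silence)
        args.append(part)
    args.append(out)
    return args


def concat_with_silence(parts: Sequence[str], out: str, gap_seconds: float = 0.25) -> None:
    """Stdlib alternative to sox: join PCM WAVs with `gap_seconds` of silence between them.

    All inputs must share channel count, sample width and sample rate.
    """
    with wave.open(parts[0], "rb") as first:
        nch, width, rate = first.getnchannels(), first.getsampwidth(), first.getframerate()
    # 8-bit WAV samples are unsigned (silence = 0x80); wider ones are signed (silence = 0).
    fill = b"\x80" if width == 1 else b"\x00"
    silence = fill * (int(round(gap_seconds * rate)) * nch * width)
    with wave.open(out, "wb") as dst:
        dst.setnchannels(nch)
        dst.setsampwidth(width)
        dst.setframerate(rate)
        for i, path in enumerate(parts):
            with wave.open(path, "rb") as src:
                if (src.getnchannels(), src.getsampwidth(), src.getframerate()) != (nch, width, rate):
                    raise ValueError(f"{path}: audio format differs from {parts[0]}")
                if i:
                    dst.writeframes(silence)
                dst.writeframes(src.readframes(src.getnframes()))


if __name__ == "__main__":
    # Demo: two 0.1 s mono 16-bit clips at 8 kHz, joined with a 0.25 s gap.
    tmp = Path(tempfile.mkdtemp())
    names = []
    for name in ("a.wav", "b.wav"):
        with wave.open(str(tmp / name), "wb") as w:
            w.setnchannels(1)
            w.setsampwidth(2)
            w.setframerate(8000)
            w.writeframes(b"\x01\x00" * 800)
        names.append(str(tmp / name))
    concat_with_silence(names, str(tmp / "book.wav"))
    with wave.open(str(tmp / "book.wav"), "rb") as w:
        print(w.getnframes())  # 800 + 2000 + 800 = 3600
    print(" ".join(sox_concat_command(names, "silence.wav", "book.wav")))
```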
-
This is great! I came to the discussion page hoping to ask whether anyone had been able to use TTS to create audiobooks. Specifically, I'm wondering whether it would be possible to convert an existing audiobook into one read by David Attenborough, just for fun, of course.
-
Amazing!! Could you explain your workflow for this book? I have a .txt file of a book that Balabolka converts into a 15-hour WAV. But I don't know how to use Coqui TTS with my .txt file; I simply can't paste the whole text in. Could you help me, please? I understand a little Python.
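One way to handle a long .txt file is not to paste it at all, but to split it into sentence-sized chunks and synthesize each chunk to its own WAV. The sketch below assumes a UTF-8 file and uses a simple punctuation-based sentence splitter; `split_sentences`, `chunk_text`, `synthesize_txt` and the 400-character limit are hypothetical choices, and the actual synthesis is passed in as a function so any engine can be plugged in.

```python
import re
from pathlib import Path
from typing import Callable, List

# Split after ., ! or ?, optionally followed by a closing double quote.
_SENTENCE_END = re.compile(r'(?<=[.!?])\s+|(?<=[.!?]["\u201d])\s+')


def split_sentences(text: str) -> List[str]:
    """Flatten line wraps, then split into sentences (a heuristic, not a full tokenizer)."""
    flat = " ".join(text.split())
    return [s for s in _SENTENCE_END.split(flat) if s]


def chunk_text(text: str, max_chars: int = 400) -> List[str]:
    """Greedily pack whole sentences into chunks of at most max_chars.

    A single sentence longer than max_chars is kept as its own chunk.
    """
    chunks: List[str] = []
    current = ""
    for sentence in split_sentences(text):
        candidate = f"{current} {sentence}" if current else sentence
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def synthesize_txt(txt_path: str, out_dir: str, synth: Callable[[str, str], None],
                   max_chars: int = 400) -> List[str]:
    """Read a UTF-8 .txt file and call synth(text, wav_path) once per chunk."""
    text = Path(txt_path).read_text(encoding="utf-8")
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths: List[str] = []
    for i, chunk in enumerate(chunk_text(text, max_chars)):
        wav_path = str(out / f"part_{i:05d}.wav")
        synth(chunk, wav_path)
        paths.append(wav_path)
    return paths


# Possible use with Coqui TTS (not run here; assumes `pip install TTS`):
#   from TTS.api import TTS
#   tts = TTS("tts_models/en/vctk/vits")
#   synthesize_txt("book.txt", "book_wavs",
#                  lambda text, path: tts.tts_to_file(text=text, speaker="p248", file_path=path))
```

The resulting numbered WAVs can then be merged in order, with short silences between them, as described earlier in the thread.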
-
Are there any repositories that I can clone and just use from the command line? Just point it at a plain-text file.
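Coqui TTS itself installs a `tts` command-line tool (with flags such as `--text`, `--model_name`, `--speaker_idx` and `--out_path`), but it takes text rather than a file. A hypothetical wrapper script like the one below could feed it a plain-text file one paragraph at a time; the default model and speaker mirror the VITS/p248 combination mentioned in this thread, and `paragraphs`, `build_tts_commands` and the option names are illustrative only.

```python
import argparse
import re
import subprocess
from pathlib import Path
from typing import List, Optional


def paragraphs(text: str) -> List[str]:
    """Split on blank lines and join wrapped lines inside each paragraph."""
    blocks = re.split(r"\n\s*\n", text.replace("\r\n", "\n"))
    return [" ".join(b.split()) for b in blocks if b.strip()]


def build_tts_commands(text: str, out_dir: str, model: str, speaker: str) -> List[List[str]]:
    """One Coqui `tts` CLI invocation per paragraph, each writing a numbered WAV."""
    return [
        ["tts", "--text", para, "--model_name", model,
         "--speaker_idx", speaker, "--out_path", str(Path(out_dir) / f"{i:05d}.wav")]
        for i, para in enumerate(paragraphs(text))
    ]


def main(argv: Optional[List[str]] = None) -> None:
    parser = argparse.ArgumentParser(description="Narrate a plain-text file with the Coqui `tts` CLI.")
    parser.add_argument("txt_file", nargs="?", help="UTF-8 plain-text file to read aloud")
    parser.add_argument("--out-dir", default="narration")
    parser.add_argument("--model", default="tts_models/en/vctk/vits")
    parser.add_argument("--speaker", default="p248")
    args = parser.parse_args(argv)
    if args.txt_file is None:
        parser.print_help()
        return
    text = Path(args.txt_file).read_text(encoding="utf-8")
    Path(args.out_dir).mkdir(parents=True, exist_ok=True)
    for cmd in build_tts_commands(text, args.out_dir, args.model, args.speaker):
        subprocess.run(cmd, check=True)  # each call reloads the model, so this is slow


if __name__ == "__main__":
    main()
```

Usage would be something like `python narrate.py book.txt --out-dir book_wavs` (the script name is arbitrary). Because each `tts` call loads the model from scratch, this is slow for a whole book; calling the Python API in a loop keeps the model in memory.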
-
I'm sorry to ask such a basic question, but how did you set this all up? I'd like to have narration for my own writing, too, and I'm lost trying to navigate the technical side of these AI TTS programs. Any help at all would be greatly appreciated. I have an NVIDIA RTX 3060 and 16 GB of RAM.
-
Watch/Listen here! https://www.youtube.com/playlist?list=PL-vT9uPaqMCksJSOGOocVokVk7G2s0upW
The Princess who Carries the Blood of the Goddess, by TheLoudGuy. Basically, a Breath of the Wild novel in which Link's and Zelda's fates are reversed.
I do this sort of thing as a hobby, and for years I used Microsoft Speech Platform's Microsoft Zira voice, along with Balabolka, for this purpose. I really appreciated the consistency I could achieve with that voice, but it was robotic enough that it put a lot of people off, too. As such, I was always on the lookout for other voices I could use. Nothing really seemed like much of an improvement until I experimented with Coqui-TTS enough to really hear its potential.
I've spent a few weeks tweaking my program to take an HTML document, turn it into a formatted text document, and then generate a full, chapter-separated series of MP3s for the whole thing. On top of that, it changes the pitch of dialogue so you know when someone is talking, and it emphasizes italic text with a slight decrease in speed.
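The program itself isn't shared, so this is only a sketch of how the dialogue and italics labelling could work, assuming chapters use `<i>`/`<em>` for italics and double quotes (straight or curly) for dialogue. `segment_html`, `Segment` and the two constants are hypothetical; the pitch and tempo values are placeholders that could, for example, be applied per segment with sox's `pitch` (in cents) and `tempo` effects.

```python
import re
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import List, Tuple

# Hypothetical settings; the values actually used are not stated.
DIALOGUE_PITCH_CENTS = 200  # raise dialogue by two semitones
ITALIC_TEMPO = 0.9          # read italics 10% slower


@dataclass
class Segment:
    text: str
    italic: bool
    dialogue: bool

    @property
    def pitch_cents(self) -> int:
        return DIALOGUE_PITCH_CENTS if self.dialogue else 0

    @property
    def tempo(self) -> float:
        return ITALIC_TEMPO if self.italic else 1.0


class _ItalicTracker(HTMLParser):
    """Collect (text, is_italic) runs, tracking nested <i>/<em> tags."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True, so entities arrive decoded
        self.depth = 0
        self.runs: List[Tuple[str, bool]] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("i", "em"):
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in ("i", "em") and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        self.runs.append((data, self.depth > 0))


def segment_html(html: str) -> List[Segment]:
    """Split chapter HTML into runs labelled as italic and/or dialogue."""
    parser = _ItalicTracker()
    parser.feed(html)
    parser.close()
    segments: List[Segment] = []
    in_quote = False  # straight quotes toggle, so they must be balanced
    for text, italic in parser.runs:
        for piece in re.split(r'(["\u201c\u201d])', text):
            if piece == "\u201c":
                in_quote = True
            elif piece == "\u201d":
                in_quote = False
            elif piece == '"':
                in_quote = not in_quote
            elif piece.strip():
                segments.append(Segment(" ".join(piece.split()), italic, in_quote))
    return segments


if __name__ == "__main__":
    html = '<p>She paused. "I <em>really</em> mean it," she said.</p>'
    for seg in segment_html(html):
        print(f"{seg.text!r:12} pitch={seg.pitch_cents:+d} cents tempo={seg.tempo}")
```

Splitting at every tag produces very short pieces, such as a single italic word; synthesizing those on their own would likely sound choppy, so a real pipeline would probably need to merge neighbouring segments or apply the effects only to the matching stretch of a longer clip.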
I have to say, I learned a lot doing this audiobook, mostly finding ways to work around the voice's little errors. The voice is awesome, though. The big trade-off is consistency: if you gave Zira a word, it would always pronounce it the same way. Here, it's almost always the same, but sometimes not.
I'll tweak it more in the future. I need to add more functionality around word replacements; I want to see just how accurate I can get it. I also want to find a better way to generate silences.
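A word-replacement pass can be as simple as a dictionary of respellings applied to the text before synthesis. This sketch assumes whole-word, case-insensitive matching; `build_replacer`, `fix_words` and the example entries (including the "High-rule" respelling) are hypothetical, not taken from the actual program.

```python
import re
from typing import Callable, Dict


def build_replacer(replacements: Dict[str, str]) -> Callable[[str], str]:
    """Return a function that swaps whole words for respellings before synthesis.

    Matching is case-insensitive; longer keys are tried first so that a
    multi-word entry wins over a single word it contains.
    """
    if not replacements:
        return lambda text: text
    lookup = {k.lower(): v for k, v in replacements.items()}
    keys = sorted(lookup, key=len, reverse=True)
    # \b anchors assume each key starts and ends with a word character.
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, keys)) + r")\b", re.IGNORECASE)
    return lambda text: pattern.sub(lambda m: lookup[m.group(0).lower()], text)


# Hypothetical entries; a real list would grow as mispronunciations turn up.
fix_words = build_replacer({
    "Hyrule": "High-rule",
    "TotK": "Tears of the Kingdom",
})

if __name__ == "__main__":
    print(fix_words("Welcome to hyrule. TotK rocks!"))
```

Keeping the respellings in a plain dictionary (or a small text file loaded into one) makes it easy to add a fix each time a word comes out wrong, which fits the consistency problem described above.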
Anyway, I wanted to show off a bit, so - enjoy, I hope!
EDIT: oh right, I used VITS with speaker p248.