The audiobooks it produces may put many hosts on Ximalaya out of work! It also supports multiple languages.
I haven't found the project code or a demo link yet, only the paper.
Abstract:
We introduce Seed-TTS, a family of large-scale autoregressive text-to-speech (TTS) models capable of generating speech that is almost indistinguishable from human speech.
As a foundation model for speech generation, Seed-TTS excels at speech in-context learning, and its speaker similarity and naturalness match real human speech in both objective and subjective evaluations.
With fine-tuning, we achieve even higher subjective scores on these metrics.
Seed-TTS offers strong controllability over various speech attributes, such as emotion, and can generate highly expressive and diverse speech for speakers in the wild.
In addition, we propose a self-distillation method for speech factorization, in which the model improves by learning from its own outputs, and a reinforcement learning method that enhances model robustness, speaker similarity, and controllability.
We also present a non-autoregressive (NAR) variant of the Seed-TTS model, called Seed-TTSDiT, which uses a fully diffusion-based architecture.
Unlike previous NAR-based TTS systems, Seed-TTSDiT does not rely on estimated phoneme durations, but instead generates speech through end-to-end processing.
We demonstrate that this variant achieves performance comparable to the language model-based variant in both objective and subjective evaluations, and we showcase its effectiveness in speech editing.
It also supports voice conversion across languages, which helps cross-language communication.
Seed-TTS performed well in multiple experiments, and the speech it generated was close to human speech in naturalness and speaker similarity.
If you want to learn more, you can click on the link below the video.
Thank you for watching this video. If you enjoyed it, please like and subscribe. Thanks!
Paper:
https://bytedancespeech.github.io/seedtts_tech_report/#applications-samples
YouTube: