Text-to-speech (TTS) model

It may offer the best Chinese support of any TTS model so far.
ChatTTS: a text-to-speech model designed specifically for conversational scenarios

The full model was trained on more than 100,000 hours of data, and the public version on HuggingFace is a pre-trained model trained on 40,000 hours.
Designed for dialogue tasks, it supports multiple speaker voices and mixed Chinese and English text.
The model can also predict and control fine-grained prosodic features such as laughter, pauses, and interjections, and it allows further adjustment of speed, tone, and emotion.

ChatTTS is a text-to-speech model designed specifically for conversational scenarios, such as LLM assistant dialogue. It supports both English and Chinese. The largest model is trained on more than 100,000 hours of Chinese and English data. The open-source version on HuggingFace is a 40,000-hour pre-trained model without SFT (supervised fine-tuning).

Highlights

Conversational TTS: ChatTTS is optimized for conversational tasks, produces natural and fluent speech, and supports multiple speakers.
Fine-grained control: The model can predict and control fine-grained prosodic features, including laughter, pauses, and interjections.
Better prosody: ChatTTS outperforms most open-source TTS models in prosody. Pre-trained models are also provided to support further research.
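
To make the fine-grained control point more concrete, here is a minimal sketch of how control markers might be written into the input text. It assumes the bracketed token names ([oral_N], [laugh_N], [break_N], [uv_break], [laugh]) and their ranges as I recall them from the project README; check the GitHub repository for the current set. The helper names refine_prompt and with_markers are hypothetical and not part of ChatTTS. The code only builds strings and does not load or call the model.

```python
from typing import List


def refine_prompt(oral: int = 2, laugh: int = 0, pause: int = 6) -> str:
    """Build a sentence-level control prompt such as '[oral_2][laugh_0][break_6]'.

    The ranges below are an assumption based on the project README.
    """
    if not 0 <= oral <= 9:
        raise ValueError("oral must be in 0..9")
    if not 0 <= laugh <= 2:
        raise ValueError("laugh must be in 0..2")
    if not 0 <= pause <= 7:
        raise ValueError("pause must be in 0..7")
    return f"[oral_{oral}][laugh_{laugh}][break_{pause}]"


def with_markers(parts: List[str], marker: str = "[uv_break]") -> str:
    """Join text fragments with a word-level marker (hypothetical helper)."""
    return f" {marker} ".join(p.strip() for p in parts)


if __name__ == "__main__":
    prompt = refine_prompt()
    text = with_markers(["What is", "your favorite English food?"]) + " [laugh]"
    print(prompt)  # sentence-level control string
    print(text)    # text with word-level markers inserted
```

The resulting strings would then be passed to the model's inference call as described in the repository; that step is left out here because the exact API may differ between versions.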

If you want to learn more, click the link below the video.
Thank you for watching. If you enjoyed this video, please like and subscribe. Thank you!

GitHub: https://github.com/2noise/ChatTTS

YouTube:
