Amphion is an open-source toolkit for audio, music, and speech generation.

Beyond text-to-speech, it can replace the singer's voice in a song with the voice of another singer. It also supports voice conversion, singing voice synthesis, text-to-audio, text-to-music, and more!

Very powerful!

Demo video: Taylor Swift singing a Chinese song 🎵

Amphion’s supported audio generation tasks span a wide range of domains, from speech to music, each with its own applications and technical requirements.

Key features:

1. Text-to-speech: Convert text into natural-sounding speech.
Applications: Used for voice assistants, automated voice response systems, reading text aloud for the visually impaired, etc.

2. Singing voice synthesis: Create a virtual singer’s voice by generating singing from text or melodies.
Applications: Used for music production, virtual idol creation, etc.

3. Voice conversion: Change one person’s voice so that it sounds like another person’s.
Applications: Used for entertainment, sound design, anonymous communication, etc.

4. Singing voice conversion: Convert the singer’s voice in a song into the voice of another singer.
Applications: Used for music production, personalized music experiences, and more.

5. Text-to-audio: Convert text not only into speech but also into other types of audio, such as sound effects or music clips.
Applications: Used to create sound effects, music clips, audio stories, etc.

6. Text-to-music: Generate music from text descriptions.
Applications: Used for automated music creation, creating music based on emotions or storylines, etc.

Model Support: The toolkit supports multiple models and architectures, such as FastSpeech2, VITS, VALL-E, NaturalSpeech2, and more, for different audio generation tasks.

Vocoder Support: Amphion supports a wide range of neural vocoders, including GAN-based vocoders (e.g., MelGAN, HiFi-GAN), flow-based vocoders (e.g., WaveGlow), diffusion-based vocoders (e.g., DiffWave), and more.
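
To make the vocoder grouping easier to scan, here is a minimal Python sketch that stores the families named above as a lookup table. It is only an illustration: the names `VOCODERS_BY_FAMILY` and `vocoder_family` are hypothetical and are not part of Amphion's API, and the table covers only the vocoders mentioned in this post (WaveGlow is a flow-based model).

```python
from typing import Optional

# Hypothetical lookup table built only from the vocoders named in this post.
# This is NOT Amphion's API; it just organizes the list as data.
VOCODERS_BY_FAMILY = {
    "GAN-based": ["MelGAN", "HiFi-GAN"],
    "flow-based": ["WaveGlow"],
    "diffusion-based": ["DiffWave"],
}


def vocoder_family(name: str) -> Optional[str]:
    """Return the family of a vocoder listed above, or None if it is not listed."""
    wanted = name.lower()
    for family, members in VOCODERS_BY_FAMILY.items():
        # Compare case-insensitively so "diffwave" matches "DiffWave".
        if any(member.lower() == wanted for member in members):
            return family
    return None


if __name__ == "__main__":
    for candidate in ("HiFi-GAN", "WaveGlow", "diffwave"):
        print(candidate, "->", vocoder_family(candidate))
```

A name that is not in the table returns `None`, which simply means this post does not mention it; it does not mean the toolkit lacks support for it.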

Dataset Support: Amphion unifies data preprocessing for open-source datasets and supports multiple datasets such as AudioCaps, LibriTTS, LJSpeech, and more.

GitHub: https://github.com/open-mmlab/Amphion
Paper: https://arxiv.org/abs/2312.09911
HuggingFace Demo: https://huggingface.co/amphion