OpenAI unveils its speech generation model: Voice Engine

Given text input and a single 15-second audio sample, Voice Engine generates natural-sounding speech that closely resembles the original speaker’s voice.
Voice Engine was first developed in late 2022 and has been made available to a few companies, including HeyGen, for beta use.

Main features

1. Natural-sounding speech generation: From a single 15-second audio sample, Voice Engine can create an emotive, realistic voice, making synthesized speech sound significantly more natural.
2. Broad range of uses: Voice Engine's applications span multiple industries, including educational assistance, content translation, improving service quality in remote areas, supporting people who are non-verbal, and helping patients recover their voices.
3. Accent retention across languages: When translating content, Voice Engine preserves the original speaker's native accent, so the translated speech is fluent while keeping the characteristics of the original voice.
4. Multilingual support: Voice Engine can generate speech in multiple languages, which is particularly important for companies and content creators who need to localize content for different language markets.

Details: https://openai.com/blog/navigating-the-challenges-and-opportunities-of-synthetic-voices
