Microsoft has launched a tool that turns text alone into a realistic talking video of a person, in effect a deepfake creator

The tool is called text-to-speech avatar, a feature of Azure AI Speech.

It lets users generate realistic talking videos of real people from text input. You upload a photo of the person you want to imitate and write a script.

It supports multiple languages and real-time Q&A interactions.

Key features:

1. Realistic avatar video generation: Users upload photos and scripts, and the tool creates realistic avatar videos of the person speaking.
2. Multilingual support and real-time chatbot: Avatars support multiple languages and can respond to unscripted questions in real time using AI models (such as GPT).
3. Personalized voice function: From a one-minute voice sample, the tool can quickly replicate a user’s voice for customized voice assistants and content dubbing.
4. Legal and ethical safeguards: Using pre-recorded voices is prohibited. Speakers must give explicit consent, and access is limited to specific use cases approved through registration.
5. Watermark technology: A watermark is automatically added to personal voice output to help identify AI-synthesized speech, and users must agree to the use of Microsoft’s watermark detection service.
6. Efficient video content creation: Simplifies the traditional video production process and is suitable for creating training videos, product introductions, and more.
7. Enhanced Digital Interaction Experience: It can be used to build conversational agents, virtual assistants, and chatbots, providing natural and interactive conversations.
8. Content generation workflow: Includes text analysis, TTS audio synthesis, and TTS avatar video synthesis to generate lip animations synchronized with voice.
9. Pre-built and custom avatar options: Ready-to-use pre-built avatars and customizable avatars are both available. Custom avatars are trained on user-uploaded video recordings.
10. UI tools and API access: Both are provided through Azure AI Speech Studio.
11. Wide range of application scenarios: Suitable for creating a variety of engaging videos and interactive applications, improving the efficiency of communication and information delivery.
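
The workflow in item 8 starts from the script text, which the TTS stage consumes. As a minimal sketch, assuming the script is submitted as SSML (the W3C speech markup that Azure AI Speech accepts for synthesis), the helper below wraps a plain-text script in a `<speak>`/`<voice>` document. The function name `build_ssml` is hypothetical, and the voice names are only examples of Azure neural voices, not something the announcement prescribes.

```python
from xml.sax.saxutils import escape, quoteattr

SSML_NS = "http://www.w3.org/2001/10/synthesis"  # W3C SSML namespace


def build_ssml(script: str, voice: str = "en-US-JennyNeural", lang: str = "en-US") -> str:
    """Wrap a plain-text script in a minimal SSML document (hypothetical helper)."""
    # escape() protects characters such as & and < inside the script text;
    # quoteattr() returns the attribute value already wrapped in quotes.
    return (
        f'<speak version="1.0" xmlns="{SSML_NS}" xml:lang={quoteattr(lang)}>'
        f"<voice name={quoteattr(voice)}>{escape(script)}</voice>"
        "</speak>"
    )


if __name__ == "__main__":
    # Same helper, two languages: only the voice and xml:lang change.
    print(build_ssml("Welcome to our product tour. Questions & answers follow."))
    print(build_ssml("欢迎观看产品介绍。", voice="zh-CN-XiaoxiaoNeural", lang="zh-CN"))
```

Keeping the script in SSML rather than raw text means the same document can carry a different voice or language per request, which is one plausible way to cover the multilingual case in item 2.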

Details and API application: https://techcommunity.microsoft.com/t5/ai-azure-ai-services-blog/azure-ai-speech-announces-public-preview-of-text-to-speech/ba-p/3981448
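
For the API route, a batch avatar job is described by a JSON request body. The sketch below only assembles such a body and does not send it anywhere. The field names (`synthesisConfig`, `talkingAvatarCharacter`, `talkingAvatarStyle`, and so on) and the example values `lisa` and `graceful-sitting` are assumptions modeled on Microsoft's public-preview batch synthesis samples. They may differ in the current API version, so check them against the official reference before use. `build_avatar_job` is a hypothetical helper name.

```python
import json


def build_avatar_job(script: str,
                     voice: str = "en-US-JennyNeural",
                     character: str = "lisa",
                     style: str = "graceful-sitting",
                     customized: bool = False) -> dict:
    """Assemble a request body for a batch avatar synthesis job.

    Every field name here is an assumption based on preview-era samples.
    """
    if not script.strip():
        raise ValueError("script must not be empty")
    return {
        "displayName": "avatar demo",
        "textType": "PlainText",                 # assumed: plain text rather than SSML
        "synthesisConfig": {"voice": voice},     # assumed: TTS voice for the audio track
        "inputs": [{"text": script}],
        "properties": {
            "customized": customized,            # assumed: False selects a pre-built avatar
            "talkingAvatarCharacter": character,
            "talkingAvatarStyle": style,
            "videoFormat": "mp4",
            "videoCodec": "h264",
        },
    }


if __name__ == "__main__":
    job = build_avatar_job("Hello, and welcome to this training video.")
    print(json.dumps(job, indent=2))
```

Separating payload construction from the HTTP call keeps the request easy to inspect and to adjust once the field names have been verified against the documentation linked above.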

Code resources for video demonstrations:

GitHub: https://github.com/Azure/gen-cv/tree/main/avatar/video