Generating singing and talking videos from a single image and an audio input

It can also control the character's expressions and postures.
Unlike EMO, this project is open source.
Given a voice input, it generates animations of the character with synchronized lip movements, expression changes and posture changes.
It improves the alignment accuracy between the speech and the generated animation, so that the animation's lips, expressions and gestures better match the speech.

Provides precise control over the character's expressions, posture and lip movements.
Supports adaptive control over multiple expressions and postures, increasing the diversity and realism of the animations.

Driven by speech audio input, the field of portrait image animation has made significant progress in generating realistic and dynamic portraits. This study examines the complexities of synchronizing facial movements and creating visually appealing, temporally consistent animations within the framework of diffusion-based methods. Moving away from the traditional paradigm that relies on parametric models for intermediate facial representations, our approach adopts an end-to-end diffusion paradigm and introduces a hierarchical audio-driven visual synthesis module that improves the alignment accuracy between the audio input and the visual output, covering lip, expression and pose motion. Our proposed network architecture integrates a diffusion-based generative model, a UNet-based denoiser, temporal alignment techniques and a reference network. The proposed hierarchical audio-driven visual synthesis provides adaptive control over the diversity of expressions and poses, enabling more effective personalization for different identities. A comprehensive evaluation combining qualitative and quantitative analyses shows clear improvements in image and video quality, lip synchronization accuracy and motion diversity.
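The abstract describes the hierarchical audio-driven visual synthesis only at a high level. As a rough illustration, the following minimal sketch assumes that visual tokens cross-attend to audio tokens, and that the attended features are then gated by region masks (lip, expression, pose) and combined with adaptive per-region weights before being added back to the visual features. The function names, the mask layout and the weight values are hypothetical, and the audio and visual tokens are assumed to be already projected to the same dimension. This is not code from the Hallo repository, which operates on diffusion UNet feature maps rather than plain Python lists.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each visual query attends over audio tokens."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        # Weighted sum of the value vectors, one output channel at a time.
        out.append([sum(wj * v[c] for wj, v in zip(w, values)) for c in range(len(values[0]))])
    return out


def hierarchical_fusion(visual, audio, masks, weights):
    """Hypothetical fusion: mask the audio-attended features per region, sum with weights.

    visual:  list of N visual tokens (lists of floats), used as queries
    audio:   list of M audio tokens, used as both keys and values
    masks:   dict region -> list of N 0/1 values, e.g. 'lip', 'expression', 'pose'
    weights: dict region -> float, the adaptive weight assumed for that region
    """
    attended = cross_attention(visual, audio, audio)
    fused = [list(v) for v in visual]  # residual connection; inputs are not mutated
    for region, mask in masks.items():
        w = weights[region]
        for i, m in enumerate(mask):
            for c in range(len(fused[i])):
                fused[i][c] += w * m * attended[i][c]
    return fused
```

For example, with a single audio token every visual token attends fully to it, so a token covered by all three masks receives the lip, expression and pose weights added together times that audio token, while a token covered only by the pose mask receives just the pose weight. Keeping a separate weight per region is what lets this kind of design trade lip accuracy against expression and pose diversity.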

The project was developed by Fudan University, Baidu, ETH Zurich, and Nanjing University

For more details, read the original paper and project pages at the links below.

arXiv: https://arxiv.org/abs/2406.08801
Hugging Face: https://huggingface.co/fudan-generative-ai/hallo
Project page: https://fudan-generative-vision.github.io/hallo/#/

YouTube: