AniPortrait generates high-quality portrait animation videos from audio plus a reference portrait image
Generation runs in two stages:
1) Extract 3D intermediate representations from the audio and project them onto a sequence of 2D facial landmarks
2) Use a diffusion model coupled with a motion module to convert the landmark sequence into an animation with high visual quality
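
A rough sketch of the projection step in stage 1, to make the "3D to 2D landmarks" idea concrete. This is not taken from the AniPortrait code: the pinhole camera model, the `focal` and `image_size` parameters, and the function names `project_landmarks` / `project_sequence` are all assumptions made for illustration, and the repository's actual projection may differ.

```python
from typing import List, Tuple

Point3D = Tuple[float, float, float]
Point2D = Tuple[float, float]


def project_landmarks(
    points: List[Point3D],
    focal: float = 1.0,                        # hypothetical normalized focal length
    image_size: Tuple[int, int] = (512, 512),  # hypothetical output size (width, height)
) -> List[Point2D]:
    """Project camera-space 3D points to pixel coordinates (simple pinhole model)."""
    width, height = image_size
    projected = []
    for x, y, z in points:
        if z <= 0:
            raise ValueError("point must lie in front of the camera (z > 0)")
        # Perspective divide, then map normalized [-1, 1] coordinates to pixels.
        u = (focal * x / z + 1.0) * 0.5 * width
        # Flip y because image rows grow downward.
        v = (1.0 - focal * y / z) * 0.5 * height
        projected.append((u, v))
    return projected


def project_sequence(frames: List[List[Point3D]], **kwargs) -> List[List[Point2D]]:
    """Apply the projection to every frame, giving one 2D landmark set per frame."""
    return [project_landmarks(frame, **kwargs) for frame in frames]
```

In this sketch, each audio-driven frame would contribute one list of 3D points, and the resulting per-frame 2D landmark lists are what stage 2 would take as its conditioning input.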
It can also be used for facial motion editing and face reenactment
https://github.com/Zejun-Yang/AniPortrait