

VideoReTalking
Overview
VideoReTalking is a system for editing real-world talking-head videos to produce high-quality, lip-synced output driven by input audio, even when the input carries varying emotions. It decomposes this goal into three sequential tasks: (1) generating a face video with a canonical expression using an expression editing network; (2) audio-driven lip synchronization; and (3) face enhancement to improve photorealism. Given a talking-head video, the expression editing network first modifies the expression in each frame according to a standard expression template, producing a video with a canonical expression. This video and the given audio are then fed into a lip-sync network to generate a lip-synced video. Finally, an identity-aware face enhancement network and post-processing improve the photorealism of the synthesized faces. All three steps use learning-based methods, and the modules run sequentially in a pipeline without any user intervention.
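The pipeline can be pictured as three stages applied in order to the video frames. The sketch below illustrates only that control flow; the function names and their identity placeholder bodies are hypothetical stand-ins for the learned networks described above, not the project's actual API.

```python
# Minimal, hypothetical sketch of the three-stage pipeline described above.
# normalize_expressions, sync_lips_to_audio, and enhance_faces are illustrative
# placeholders for the expression editing, lip-sync, and enhancement networks.

from typing import List

import numpy as np

Frame = np.ndarray  # one H x W x 3 video frame


def normalize_expressions(frames: List[Frame]) -> List[Frame]:
    """Stage 1 placeholder: would re-render each frame with a canonical expression."""
    return frames


def sync_lips_to_audio(frames: List[Frame], audio: np.ndarray) -> List[Frame]:
    """Stage 2 placeholder: would regenerate the mouth region to match the audio."""
    return frames


def enhance_faces(frames: List[Frame]) -> List[Frame]:
    """Stage 3 placeholder: would apply identity-aware enhancement for photorealism."""
    return frames


def retalk(frames: List[Frame], audio: np.ndarray) -> List[Frame]:
    """Runs the three stages sequentially, with no user intervention in between."""
    normalized = normalize_expressions(frames)          # expression normalization
    lip_synced = sync_lips_to_audio(normalized, audio)  # audio-driven lip-sync
    return enhance_faces(lip_synced)                    # photorealism enhancement
```

Because each stage consumes and produces plain frame sequences, the stages can be swapped or retrained independently while the overall pipeline stays fully automatic.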
Target Users
Suited to video editing scenarios that require audio-driven lip-sync, including film, TV, advertising, and more.
Use Cases
A film producer uses VideoReTalking to edit the dialogue of characters in a film, achieving high-quality lip-sync.
An advertising company uses VideoReTalking to create advertisements, ensuring the actors' mouth movements perfectly match the audio.
A TV show producer uses VideoReTalking to edit the dialogue of characters in a TV show, achieving high-quality lip-sync.
Features
Audio-driven lip-sync
Facial enhancement
Expression editing
High-quality lip-sync video generation
No user intervention required