DiffPortrait3D
Overview :
DiffPortrait3D is a conditional diffusion model that synthesizes photorealistic, 3D-consistent novel views from as little as a single in-the-wild portrait photo. Specifically, given a single RGB input image, our goal is to synthesize plausible facial details rendered from a novel camera view while preserving identity and facial expression. Our zero-shot approach generalizes well to a wide range of face portraits, including unposed camera views, extreme facial expressions, and varied artistic styles. At its core, we use the generative prior of a 2D diffusion model pre-trained on large-scale image datasets as our rendering backbone, and guide denoising through disentangled attention control over appearance and camera pose. To this end, we first inject appearance context from the reference image into the self-attention layers of the frozen UNet. We then control the rendered view through a novel conditional control module that interprets the camera pose by observing a condition image taken from the same view. Finally, we insert a trainable cross-view attention module to improve view consistency, which is further strengthened by a novel 3D-aware noise generation process at inference time. We demonstrate state-of-the-art qualitative and quantitative results on challenging in-the-wild and multi-view benchmarks.
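The appearance-injection step can be illustrated with a minimal sketch. It assumes a simplified form of reference attention in which the target view's tokens attend over their own tokens plus the reference image's tokens. The function names are hypothetical, the learned query/key/value projections are replaced by identity maps, and nothing here is taken from the DiffPortrait3D code.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, values))
                    for i in range(len(values[0]))])
    return out


def self_attention_with_reference(target_tokens, reference_tokens):
    """Assumed simplification: queries come from the target view only,
    while keys and values are the target tokens extended with the
    reference tokens, so appearance features can flow into the target."""
    context = target_tokens + reference_tokens
    return attend(target_tokens, context, context)


target = [[1.0, 0.0], [0.0, 1.0]]   # toy tokens of the view being denoised
reference = [[5.0, 5.0]]            # toy token from the reference portrait
mixed = self_attention_with_reference(target, reference)
plain = attend(target, target, target)
```

In this toy setup, `mixed` is pulled toward the reference token while `plain` is not, which is the intuition behind injecting appearance through self-attention rather than through a separate conditioning input.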
Target Users:
Portrait Restoration and Editing
Portrait Perspective Synthesis
Portrait Animation Creation
Total Visits: 474.6M
Top Region: US(19.34%)
Website Views : 61.0K
Use Cases
Synthesizing a side view from a frontal portrait
Synthesizing side views of a frontal portrait across expressions, from smiling to laughing
Synthesizing realistic, 3D-consistent novel views from a sketch image
Features
Synthesizes novel views from a single image
Preserves identity and facial expression
Works on single portrait photos taken in the wild
Supports extreme expressions and multiple artistic styles
Uses a pre-trained 2D diffusion model as its backbone
Injects appearance context from the reference image to guide denoising
Uses a conditional control module to manipulate the rendered view
Adds a trainable cross-view attention module
Uses 3D-aware noise generation at inference to improve view consistency
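To give some intuition for why the starting noise affects view consistency, here is a minimal sketch. It is not the 3D-aware noise procedure used by DiffPortrait3D, whose details are not given here. It substitutes a much simpler technique: blending one shared Gaussian sample with per-view Gaussian samples so that each view's noise stays unit-variance but is correlated with the others. The function name and the `shared_weight` parameter are hypothetical.

```python
import math
import random


def correlated_view_noise(num_views, size, shared_weight, seed=0):
    """Return one noise vector per view.

    Each vector is sqrt(w) * shared + sqrt(1 - w) * own, where both
    components are standard normal, so every entry keeps unit variance
    while views become more similar as w (shared_weight) approaches 1.
    """
    if not 0.0 <= shared_weight <= 1.0:
        raise ValueError("shared_weight must be in [0, 1]")
    rng = random.Random(seed)
    shared = [rng.gauss(0.0, 1.0) for _ in range(size)]
    a = math.sqrt(shared_weight)
    b = math.sqrt(1.0 - shared_weight)
    views = []
    for _ in range(num_views):
        own = [rng.gauss(0.0, 1.0) for _ in range(size)]
        views.append([a * s + b * o for s, o in zip(shared, own)])
    return views


identical = correlated_view_noise(num_views=3, size=8, shared_weight=1.0)
independent = correlated_view_noise(num_views=3, size=8, shared_weight=0.0)
```

With `shared_weight=1.0` every view starts from the same noise, and with `0.0` the views are fully independent. A 3D-aware scheme would go further by relating the noise across views through geometry rather than through a single global blend.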