

UNIMO-G
Overview:
UNIMO-G is a simple multimodal conditional diffusion framework for processing interleaved textual and visual inputs. It comprises two core components: a multimodal large language model (MLLM) for encoding multimodal prompts, and a conditional denoising diffusion network for generating images from the encoded multimodal inputs. The framework is trained with a two-stage strategy: first, pre-training on a large-scale text-image pair dataset to develop conditional image-generation capabilities; then, guided fine-tuning with multimodal prompts to achieve unified image generation. A carefully designed data processing pipeline, including language grounding and image segmentation, is used to construct the multimodal prompts. UNIMO-G demonstrates outstanding performance in text-to-image generation and zero-shot subject-driven synthesis, and is particularly effective at generating high-fidelity images from complex multimodal prompts involving multiple image entities.
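The two-component flow described above can be sketched in a few lines of plain Python. This is a toy illustration only: UNIMO-G has no public API, so every name here (`encode_multimodal_prompt`, `denoise`, `ImageEntity`) is hypothetical, and the "encoder" and "denoiser" are stand-in arithmetic, not real networks.

```python
from dataclasses import dataclass
from typing import List, Union

# Hypothetical sketch of UNIMO-G's two components; all names are
# illustrative, not an actual released API.

@dataclass
class ImageEntity:
    """A visual input referenced inside a multimodal prompt."""
    name: str
    features: List[float]  # stand-in for encoded image features

def encode_multimodal_prompt(prompt: List[Union[str, ImageEntity]]) -> List[float]:
    """Stand-in for the MLLM encoder: maps an interleaved sequence of
    text and image entities to one conditioning vector (toy average)."""
    values: List[float] = []
    for part in prompt:
        if isinstance(part, str):
            values.extend(float(len(tok)) for tok in part.split())
        else:
            values.extend(part.features)
    return [sum(values) / len(values)] if values else [0.0]

def denoise(condition: List[float], steps: int = 4) -> List[float]:
    """Stand-in for the conditional denoising network: iteratively
    refines a latent toward the conditioning signal."""
    latent = [0.0 for _ in condition]
    for _ in range(steps):
        latent = [l + 0.5 * (c - l) for l, c in zip(latent, condition)]
    return latent

# An interleaved prompt mixing text spans with an image entity.
prompt = ["a photo of", ImageEntity("my_dog", [2.0, 4.0]), "on the beach"]
cond = encode_multimodal_prompt(prompt)
image_latent = denoise(cond)
```

The point of the sketch is the data flow, not the math: the encoder consumes one interleaved sequence (so text and referenced images share a single representation), and the diffusion stage only ever sees the resulting conditioning vector.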
Target Users:
UNIMO-G can be used in scenarios such as text-to-image generation and zero-shot theme-driven synthesis.
Use Cases
Generate high-fidelity images from complex multimodal prompts containing multiple image entities.
Perform standard text-to-image generation.
Perform zero-shot subject-driven synthesis, where UNIMO-G demonstrates excellent performance.
Features
Processes interleaved text and visual inputs
Generates images
Two-stage training strategy (pre-training and guided fine-tuning)
Data processing pipeline including language grounding and image segmentation
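The last feature, constructing multimodal prompts via language grounding and image segmentation, can be illustrated with a small sketch: grounded caption phrases are paired with placeholders for their segmented image regions to form an interleaved prompt. The function name, the `<img:...>` placeholder format, and the crop filenames are all assumptions for illustration, not UNIMO-G's actual pipeline code.

```python
from typing import Dict, List

def build_multimodal_prompt(caption: str,
                            grounded: Dict[str, str]) -> List[str]:
    """Hypothetical prompt construction: follow each grounded phrase in
    the caption with an <img:...> placeholder for its segmented region,
    yielding an interleaved text/image sequence."""
    prompt: List[str] = []
    rest = caption
    # Process grounded phrases in order of appearance in the caption.
    # (rest.find is bound to the original caption string here.)
    for phrase in sorted(grounded, key=rest.find):
        before, _, rest = rest.partition(phrase)
        if before.strip():
            prompt.append(before.strip())
        prompt.append(phrase)
        prompt.append(f"<img:{grounded[phrase]}>")
    if rest.strip():
        prompt.append(rest.strip())
    return prompt

prompt = build_multimodal_prompt(
    "a dog playing with a ball in the park",
    {"a dog": "dog_crop.png", "a ball": "ball_crop.png"},
)
```

In this sketch the caption "a dog playing with a ball in the park" becomes an interleaved sequence in which each grounded phrase is immediately followed by its image-region placeholder, which is the shape of input the MLLM encoder consumes.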