

Minigemini
Overview :
Mini-Gemini is a multimodal visual language model supporting a series of dense and MoE large language models ranging from 2B to 34B. It possesses capabilities for image understanding, reasoning, and generation. Based on LLaVA, it utilizes dual vision encoders to provide low-resolution visual embeddings and high-resolution candidate regions. It employs patch-level information mining to perform patch-level mining between high-resolution regions and low-resolution visual queries, fusing text and images for understanding and generation tasks. It supports multiple visual understanding benchmark tests, including COCO, GQA, OCR-VQA, and VisualGenome.
Target Users :
Mini-Gemini can be applied to a variety of scenarios that require simultaneous processing of text and images, such as visual question answering, image description generation, and image editing.
Use Cases
Answer relevant questions based on the content of a given image
Generate textual descriptions of images
Edit images and generate new images according to instructions
Features
Low-resolution/High-resolution Dual Vision Encoders
Patch-level Information Mining
Large Language Model-based Text-Image Fusion
Support Vision Understanding and Generation Tasks
Featured AI Tools
Chinese Picks

Capcut Dreamina
CapCut Dreamina is an AIGC tool under Douyin. Users can generate creative images based on text content, supporting image resizing, aspect ratio adjustment, and template type selection. It will be used for content creation in Douyin's text or short videos in the future to enrich Douyin's AI creation content library.
AI image generation
9.0M

Outfit Anyone
Outfit Anyone is an ultra-high quality virtual try-on product that allows users to try different fashion styles without physically trying on clothes. Using a two-stream conditional diffusion model, Outfit Anyone can flexibly handle clothing deformation, generating more realistic results. It boasts extensibility, allowing adjustments for poses and body shapes, making it suitable for images ranging from anime characters to real people. Outfit Anyone's performance across various scenarios highlights its practicality and readiness for real-world applications.
AI image generation
5.3M