SPRIGHT: A solution for improving spatial consistency in text-to-image models

AI image generation, AI image detection and recognition, #Text-to-image, #Spatial consistency, #Visual language model, #Visual language dataset, Standard Picks, Open Source

Overview:

SPRIGHT is a large-scale vision-language dataset and model focused on spatial relationships. The SPRIGHT dataset is built by re-captioning 6 million images so that the descriptions contain significantly more spatial phrases. The model is then fine-tuned on 444 images that each contain many objects, which improves its ability to generate images that respect the spatial relationships described in the prompt. SPRIGHT achieves state-of-the-art spatial consistency on multiple benchmarks while also improving image quality scores.
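To make the idea of "more spatial phrases in the descriptions" concrete, here is a minimal sketch that counts spatial phrases in captions and reports the share of captions containing at least one. The phrase list and the function names `count_spatial_phrases` and `spatial_caption_ratio` are assumptions made for illustration; they are not the vocabulary or tooling used to build SPRIGHT.

```python
import re

# Illustrative vocabulary only (an assumption); the phrase list actually
# used for SPRIGHT is not given in this description.
SPATIAL_PHRASES = (
    "left", "right", "above", "below", "in front of", "behind",
    "next to", "on top of", "under", "between", "in the middle",
)

# One case-insensitive pattern that matches any phrase on word boundaries.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(p) for p in SPATIAL_PHRASES) + r")\b",
    re.IGNORECASE,
)


def count_spatial_phrases(caption: str) -> int:
    """Return how many spatial phrases occur in a single caption."""
    return len(_PATTERN.findall(caption))


def spatial_caption_ratio(captions: list[str]) -> float:
    """Return the fraction of captions that contain at least one spatial phrase."""
    if not captions:
        return 0.0
    hits = sum(1 for caption in captions if count_spatial_phrases(caption) > 0)
    return hits / len(captions)
```

Comparing `spatial_caption_ratio` on the original captions and on the re-written captions for the same images would give a rough measure of how much spatial language the re-captioning step adds.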

Target Users:

SPRIGHT suits any scenario that requires generated images with plausible spatial layouts, such as interior design, floor plan creation, and robot environment simulation.

Total Visits: 535

Top Region: US (71.88%)

Website Views: 70.9K

Use Cases

A living room with a fireplace, sofa on the right side of the fireplace, coffee table in front of the sofa.

A basket full of fruit, apples on the left, bananas on the right, and oranges in the middle.

A cityscape with skyscrapers on both sides of the road, a fountain in the middle.

Features

SPRIGHT, a large-scale dataset focused on spatial relationships

Fine-tuned on images with numerous objects to optimize spatial consistency

Achieves state-of-the-art spatial consistency in multiple benchmark tests

Improves image quality scores (FID and CMMD)
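The benchmarks behind the spatial consistency claim are not named here. One common way to score a generated image is to run an object detector and compare the centroids of the detected bounding boxes against the relation stated in the prompt. The sketch below shows that comparison under simplifying assumptions: boxes are given as `(x_min, y_min, x_max, y_max)` in pixel coordinates with y increasing downward, and the helper names `centroid` and `relation_holds` are hypothetical, not taken from any benchmark's code.

```python
from typing import Tuple

# (x_min, y_min, x_max, y_max) in pixel coordinates, y increasing downward.
Box = Tuple[float, float, float, float]


def centroid(box: Box) -> Tuple[float, float]:
    """Return the (x, y) center of a bounding box."""
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2, (y_min + y_max) / 2)


def relation_holds(box_a: Box, box_b: Box, relation: str) -> bool:
    """Check whether object A stands in the given relation to object B.

    Only the centroids are compared, which is a simplification: overlap,
    depth, and object size are ignored.
    """
    ax, ay = centroid(box_a)
    bx, by = centroid(box_b)
    if relation == "left of":
        return ax < bx
    if relation == "right of":
        return ax > bx
    if relation == "above":
        return ay < by  # smaller y is higher in the image
    if relation == "below":
        return ay > by
    raise ValueError(f"unsupported relation: {relation!r}")


# Hypothetical detections for "sofa on the right side of the fireplace".
sofa = (300.0, 200.0, 500.0, 400.0)
fireplace = (50.0, 150.0, 250.0, 400.0)
sofa_is_right = relation_holds(sofa, fireplace, "right of")
```

Averaging such checks over many prompts and generated images gives a rough spatial consistency score; a real benchmark would also need to confirm that both objects were generated at all.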