PixelLLM
Overview:
PixelLLM is a vision-language model for image localization tasks. Given an input location, it generates descriptive text; given input text, it generates pixel coordinates for dense localization. Pre-training on the Localized Narratives dataset teaches the model the alignment between words and image pixels. PixelLLM can be applied to a range of image localization tasks, including instruction-following localization, location-conditioned description, and dense object description, and has achieved state-of-the-art performance on datasets such as RefCOCO and Visual Genome.
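To make the idea of word-to-pixel alignment concrete, here is a minimal Python sketch of what a caption with one pixel location per word could look like, and how a bounding box might be derived from those locations. The `LocalizedWord` type, both helper functions, and the coordinate values are hypothetical illustrations. They are not PixelLLM's actual API or output format.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class LocalizedWord:
    """One caption word paired with a pixel location (hypothetical type)."""
    word: str
    x: int  # column, in pixels
    y: int  # row, in pixels


def caption_text(words: List[LocalizedWord]) -> str:
    """Join the words back into a plain caption string."""
    return " ".join(w.word for w in words)


def bounding_box(words: List[LocalizedWord]) -> Tuple[int, int, int, int]:
    """Smallest axis-aligned box (x_min, y_min, x_max, y_max) covering all word locations."""
    if not words:
        raise ValueError("need at least one localized word")
    xs = [w.x for w in words]
    ys = [w.y for w in words]
    return min(xs), min(ys), max(xs), max(ys)


# Made-up example values for illustration only, not model output.
example = [
    LocalizedWord("a", 120, 80),
    LocalizedWord("brown", 132, 95),
    LocalizedWord("dog", 150, 110),
]
```

Calling `caption_text(example)` recovers the plain caption, while `bounding_box(example)` collapses the per-word points into a single region, which is one simple way a dense per-word output could be turned into a box for a localization benchmark.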
Target Users:
Suitable for users working on image localization tasks such as location-conditioned description, instruction-following localization, and dense object description.
Total Visits: 963
Top Region: US (100.00%)
Website Views: 73.7K
Features
Location-Conditioned Descriptions
Instruction-Following Localization
Dense Object Descriptions
© 2025 AIbase