YOLO-World: Real-time open vocabulary object detection

Overview:

YOLO-World is a real-time open vocabulary object detector built on the You Only Look Once (YOLO) series of detectors. It strengthens open vocabulary detection through visual-language modeling and pre-training on large-scale datasets. Its architecture introduces a Re-parameterizable Vision-Language Path Aggregation Network (RepVL-PAN) and a region-text contrastive loss, which together promote interaction between visual and linguistic information. As a result, YOLO-World detects a wide range of objects in a zero-shot manner without sacrificing speed. On the challenging LVIS dataset, it achieves 35.4 AP at 52.0 FPS on an NVIDIA V100 GPU, outperforming many state-of-the-art methods in both accuracy and speed. Fine-tuned YOLO-World also performs strongly on several downstream tasks, including object detection and open vocabulary instance segmentation.
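To make the region-text idea in the overview more concrete, here is a minimal sketch in plain Python. It assumes that each candidate region and each vocabulary entry has already been encoded as an embedding vector, scores regions against text by cosine similarity, and applies a cross-entropy loss over those scores. The function names, the toy embeddings, and the `temperature` parameter are illustrative assumptions and not part of YOLO-World's code; the loss used in the actual model may differ in its details.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def similarity_matrix(region_embs, text_embs):
    """Cosine similarity between every region embedding and every text embedding."""
    regions = [l2_normalize(r) for r in region_embs]
    texts = [l2_normalize(t) for t in text_embs]
    return [[sum(a * b for a, b in zip(r, t)) for t in texts] for r in regions]

def region_text_contrastive_loss(region_embs, text_embs, targets, temperature=0.1):
    """Mean cross-entropy over region-to-text similarities.

    targets[i] is the index of the text that region i should match.
    temperature is a hypothetical scaling factor chosen for this sketch.
    """
    sims = similarity_matrix(region_embs, text_embs)
    total = 0.0
    for row, target in zip(sims, targets):
        logits = [s / temperature for s in row]
        peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = peak + math.log(sum(math.exp(v - peak) for v in logits))
        total += log_sum_exp - logits[target]
    return total / len(targets)

def zero_shot_labels(region_embs, text_embs, vocabulary):
    """Assign each region the vocabulary entry with the highest similarity."""
    sims = similarity_matrix(region_embs, text_embs)
    return [vocabulary[max(range(len(row)), key=row.__getitem__)] for row in sims]

# Toy data: three vocabulary entries and two candidate regions (made-up embeddings).
vocabulary = ["dog", "bicycle", "traffic light"]
text_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
region_embs = [[0.9, 0.1, 0.0], [0.1, 0.2, 0.95]]

labels = zero_shot_labels(region_embs, text_embs, vocabulary)
loss_correct = region_text_contrastive_loss(region_embs, text_embs, [0, 2])
loss_wrong = region_text_contrastive_loss(region_embs, text_embs, [1, 1])
print(labels, loss_correct < loss_wrong)
```

Because the vocabulary enters only through `text_embs`, changing the list of detectable categories in this sketch means encoding new text rather than retraining a fixed classification head, which is the intuition behind open vocabulary detection.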

Target Users:

Developers and researchers working on object detection and open vocabulary instance segmentation

Use Cases

1. Implement real-time open vocabulary object detection using YOLO-World.

2. Perform zero-shot inference with YOLO-World on the LVIS dataset.

3. Fine-tune YOLO-World for downstream object detection and open vocabulary instance segmentation.

Features

Real-time open vocabulary object detection

Efficiently detect various objects in a zero-shot manner