VideoPrism
Overview:
VideoPrism is a general-purpose video encoder that achieves leading performance across a range of video understanding tasks, including classification, localization, retrieval, captioning, and question answering. Its main innovation is a very large and diverse pre-training corpus: 36 million high-quality video-text pairs plus 582 million video clips with noisy text. Pre-training proceeds in two stages: first, contrastive learning aligns videos with their text; second, the model learns to predict masked video patches, so that both kinds of supervisory signal are used. A single frozen VideoPrism model can be adapted directly to downstream tasks, and it set new state-of-the-art results on 30 video understanding benchmarks.
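To make the first pre-training stage more concrete, here is a minimal sketch of a symmetric contrastive (InfoNCE-style) loss that pulls matching video and text embeddings together. It is an illustration under stated assumptions, not VideoPrism's actual implementation: the function names, the toy 2-dimensional embeddings, and the temperature value of 0.07 are all hypothetical choices made for this example.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def log_softmax(row):
    """Numerically stable log-softmax over a list of logits."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]


def contrastive_loss(video_embs, text_embs, temperature=0.07):
    """Symmetric contrastive loss; video_embs[i] is paired with text_embs[i].

    The temperature is an assumed value for illustration only.
    """
    v = [l2_normalize(e) for e in video_embs]
    t = [l2_normalize(e) for e in text_embs]
    n = len(v)
    # Cosine similarity between every video and every text, scaled by temperature.
    logits = [
        [sum(a * b for a, b in zip(v[i], t[j])) / temperature for j in range(n)]
        for i in range(n)
    ]
    # Video-to-text: each row should assign its probability mass to the diagonal.
    v2t = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Text-to-video: the same objective applied to the columns.
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    t2v = -sum(log_softmax(columns[j])[j] for j in range(n)) / n
    return 0.5 * (v2t + t2v)


# Toy embeddings: correctly paired inputs should score a lower loss than mismatched ones.
aligned = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < shuffled)
```

In a real system the embeddings would come from learned video and text encoders, and the loss would be minimized over large batches; the toy vectors here only show that aligned pairs are rewarded.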
Target Users:
- Video Classification and Localization
- Video Retrieval
- Video Captioning
- Video Question Answering
- Scientific Video Analysis
Total Visits: 1.0M
Top Region: US(34.33%)
Website Views: 86.7K