

Understanding Video Transformers
Overview:
This paper investigates the problem of concept-based interpretability for video Transformer representations. Specifically, we aim to explain the decision-making process of video Transformers in terms of high-level spatio-temporal concepts that are discovered automatically. Previous research on concept-based interpretability has focused primarily on image-level tasks. Video models must also handle the time dimension, which increases complexity and makes it harder to identify dynamic concepts that evolve over time. In this work, we systematically address these challenges by introducing the first Video Transformer Concept Discovery (VTCD) algorithm. To this end, we propose an effective unsupervised method for identifying units of video Transformer representations (concepts) and ranking their importance to the model's output. The resulting concepts are highly interpretable, revealing the spatio-temporal reasoning mechanisms and object-centric representations inside black-box video models. Through a joint analysis of diverse supervised and self-supervised representations, we find that some of these mechanisms are shared across video Transformers. Finally, we demonstrate that VTCD can be used to improve model performance on fine-grained tasks.
Target Users:
Researchers and practitioners who want to explain the decision-making process of video Transformers and improve model performance
Use Cases
Explain the decision-making process of video Transformers
Improve the performance of video models
Discover universal mechanisms within video Transformers
Features
Unsupervised video Transformer Concept Discovery
Ranking the importance of video Transformer concepts
Revealing the spatio-temporal reasoning mechanisms and object representations in video Transformers
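The two features above (unsupervised concept discovery, then importance ranking) can be illustrated with a toy sketch. This is not the VTCD implementation: it assumes concepts can be approximated by clustering token features with plain k-means, and that importance can be approximated by how much a score drops when a concept's tokens are zeroed out. The names `discover_concepts`, `rank_concepts`, `readout`, and `score_fn` are hypothetical, and the "video tokens" are synthetic.

```python
import numpy as np

def discover_concepts(features, k, iters=20):
    """Group token features into k clusters ("concepts") with plain k-means.

    Assumption: k-means is a stand-in for the paper's discovery step.
    Centers are initialised by farthest-point selection so the result is
    deterministic for a given input.
    """
    centers = [features[0]]
    for _ in range(k - 1):
        # Squared distance from every token to its nearest chosen center.
        d = np.min([((features - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(features[d.argmax()])
    centers = np.stack(centers).astype(float)

    for _ in range(iters):
        dists = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = features[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return labels

def rank_concepts(features, labels, score_fn, k):
    """Rank concepts by the score drop caused by zeroing out their tokens."""
    base = score_fn(features)
    drops = []
    for j in range(k):
        masked = features.copy()
        masked[labels == j] = 0.0
        drops.append(base - score_fn(masked))
    order = np.argsort(drops)[::-1]  # most important concept first
    return order, np.array(drops)

# Synthetic "video tokens": three well-separated groups of 40 tokens each.
rng = np.random.default_rng(0)
prototypes = np.array([[5, 0, 0, 0], [0, 5, 0, 0], [0, 0, 5, 0]], dtype=float)
tokens = np.concatenate(
    [p + 0.1 * rng.standard_normal((40, 4)) for p in prototypes]
)

# Hypothetical model head: a linear readout over mean-pooled tokens that
# depends mostly on feature dimension 0.
readout = np.array([1.0, 0.2, 0.0, 0.0])
score_fn = lambda f: float(f.mean(axis=0) @ readout)

labels = discover_concepts(tokens, k=3)
order, drops = rank_concepts(tokens, labels, score_fn, k=3)
print("concepts ranked by importance:", order)
print("score drop per concept:", drops.round(3))
```

In this toy setup the readout relies mainly on the first feature dimension, so the concept containing the first group of tokens ranks highest. With a real video Transformer, the features would come from an intermediate layer and the score from the model's own prediction.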
Featured AI Tools

FunClip
FunClip is a fully open-source, locally deployed automated video editing tool. It uses the open-source FunASR Paraformer series of models from Alibaba's Tongyi Lab to run speech recognition on videos. Users can then freely select text segments or speakers from the recognition results and click the crop button to retrieve the corresponding video clips. FunClip integrates Alibaba's open-source, industrial-grade Paraformer-Large model, one of the best-performing open-source Chinese ASR models currently available, which also predicts accurate timestamps within the same model.
AI Video Editing
235.4K
Chinese Picks

KuaiYing
Developed by Kuaishou, KuaiYing is a video editing application with a comprehensive set of editing features, including cutting, audio, subtitles, and special effects. It aims to help users easily create engaging, professional video content. Its AI-powered anime video function can transform videos into anime styles, with options such as anime style, Chinese traditional style, and Japanese anime style. KuaiYing also includes AI creation tools such as AI drawing, AI text-to-image, and an AI copywriting library to assist users in their creative work. In addition, it provides a creative center where users can view data and find inspiration, along with a large resource library of stickers and trending content.
AI Video Editing
217.8K