Understanding Video Transformers: Concept discovery for explaining the decision-making process of video Transformers

Overview:

This paper investigates concept-based interpretability for video Transformer representations. Specifically, we aim to explain the decision-making process of video Transformers in terms of high-level spatio-temporal concepts that are discovered automatically. Previous research on concept-based interpretability has focused primarily on image-level tasks. Video models, in contrast, must handle an additional temporal dimension, which increases complexity and makes it harder to identify dynamic concepts that evolve over time. In this work, we systematically address these challenges by introducing the first Video Transformer Concept Discovery (VTCD) algorithm. To this end, we propose an effective unsupervised method for identifying units of video Transformer representations (concepts) and ranking their importance to the model's output. The resulting concepts are highly interpretable, revealing the spatio-temporal reasoning mechanisms and object-centric representations inside black-box video models. Through a joint analysis of diverse supervised and self-supervised representations, we find that some of these mechanisms are common across video Transformers. Finally, we demonstrate that VTCD can be used to improve model performance on fine-grained tasks.
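To make the idea of unsupervised concept discovery more concrete, here is a minimal sketch that groups the spatio-temporal token features of one layer into clusters and treats each cluster as a candidate concept. It is an illustration under stated assumptions, not the VTCD algorithm itself: the feature layout `(T, H, W, D)`, the plain k-means clustering with farthest-point initialization, and all function names (`farthest_point_init`, `kmeans`, `discover_concepts`) are hypothetical choices made for this example.

```python
import numpy as np

def farthest_point_init(x, k):
    """Deterministic init: start at x[0], then repeatedly add the farthest point."""
    centroids = [x[0]]
    for _ in range(k - 1):
        # squared distance from each token to its nearest chosen centroid
        d2 = np.min([((x - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(x[d2.argmax()])
    return np.stack(centroids)

def kmeans(x, k, n_iter=20):
    """Plain Lloyd's k-means on rows of x; returns (labels, centroids)."""
    centroids = farthest_point_init(x, k)
    for _ in range(n_iter):
        # squared distance from every token to every centroid, shape (N, k)
        d2 = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = x[labels == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    return labels, centroids

def discover_concepts(features, k):
    """features: (T, H, W, D) token features (assumed layout) from one layer.

    Returns a (T, H, W) map assigning each token to a candidate concept,
    plus the (k, D) cluster centroids.
    """
    t, h, w, d = features.shape
    labels, centroids = kmeans(features.reshape(-1, d), k)
    return labels.reshape(t, h, w), centroids

# Toy "video": 2 frames of 4x4 tokens with 3-dim features (synthetic data).
# The right half of every frame carries a different feature pattern.
rng = np.random.default_rng(0)
features = rng.normal(0.0, 0.1, size=(2, 4, 4, 3))
features[:, :, 2:, :] += 5.0

concept_map, centroids = discover_concepts(features, k=2)
print(concept_map[0])  # per-token concept index for the first frame
```

On this toy input the left and right halves of each frame fall into separate clusters, and because clustering runs over all frames at once, each cluster is a spatio-temporal region rather than a per-frame segment.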

Target Users:

Researchers and practitioners who want to explain the decision-making process of video Transformers and improve model performance

Use Cases

Explain the decision-making process of video Transformers

Improve the performance of video models

Discover universal mechanisms within video Transformers

Features

Unsupervised video Transformer Concept Discovery

Ranking the importance of video Transformer concepts
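One simple way to picture importance ranking is an occlusion-style ablation: remove the tokens belonging to one concept, re-evaluate the model, and measure how far the output drops. The sketch below does this on synthetic data. It is only an illustration of the general idea; the ranking procedure used in the paper may differ, and `rank_concepts`, `toy_score`, and the linear readout are hypothetical stand-ins for a real video model.

```python
import numpy as np

def rank_concepts(tokens, labels, score_fn):
    """Rank concepts by the score drop caused by zeroing their tokens.

    tokens: (N, D) token features; labels: (N,) concept index per token;
    score_fn: maps an (N, D) array to a scalar model output.
    Returns a list of (concept, drop) pairs, largest drop first.
    """
    base = score_fn(tokens)
    drops = {}
    for c in np.unique(labels):
        ablated = tokens.copy()
        ablated[labels == c] = 0.0  # occlude every token of this concept
        drops[int(c)] = float(base - score_fn(ablated))
    return sorted(drops.items(), key=lambda kv: kv[1], reverse=True)

# Synthetic example: four tokens, two concepts.
tokens = np.array([[1.0, 0.0],
                   [1.0, 0.0],
                   [0.0, 0.2],
                   [0.0, 0.2]])
labels = np.array([0, 0, 1, 1])

# Hypothetical stand-in for a model head: mean-pool tokens, then linear readout.
readout = np.array([1.0, 1.0])

def toy_score(x):
    return float(x.mean(axis=0) @ readout)

ranking = rank_concepts(tokens, labels, toy_score)
print(ranking)
```

Here concept 0 ranks first, since occluding it lowers the toy score by about 0.5, versus about 0.1 for concept 1. Zeroing tokens is a crude form of removal; replacing them with a mean or with random values is a common alternative.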