Denoising Vision Transformers
Overview:
Denoising Vision Transformers (DVT) is a two-stage method for removing noise artifacts from the features of Vision Transformers (ViTs). It rests on a novel noise model that dissects the ViT output into a clean semantic term and position-dependent artifact terms, which yields noise-free features for offline applications. A learnable denoiser then predicts clean features directly from raw ViT outputs, which extends the method to online applications. DVT does not require retraining existing pre-trained ViTs and can be applied immediately to any Transformer-based architecture. In extensive evaluations on multiple datasets, the authors report that DVT consistently and significantly improves existing state-of-the-art general-purpose models on both semantic and geometric tasks (e.g., +3.84 mIoU). They hope the work encourages a re-evaluation of ViT design, especially the naive use of positional embeddings.
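The idea of dissecting a ViT output into clean content plus a noise term can be illustrated with a toy example. The sketch below is not DVT's actual procedure: it assumes a single scalar feature per patch position, a purely additive artifact shared by all images, and zero-mean semantic content, and it estimates the artifact with a simple per-position average. All names (`make_raw_features`, `estimate_positional_artifact`, `denoise`) are hypothetical.

```python
import random

def make_raw_features(num_images, artifact, seed=0):
    """Simulate raw features: zero-mean 'semantic' content plus a shared positional artifact."""
    rng = random.Random(seed)
    clean = [[rng.gauss(0.0, 1.0) for _ in artifact] for _ in range(num_images)]
    raw = [[c + a for c, a in zip(row, artifact)] for row in clean]
    return clean, raw

def estimate_positional_artifact(raw):
    """Average each position over all images; the zero-mean content cancels out (toy assumption)."""
    n = len(raw)
    return [sum(row[p] for row in raw) / n for p in range(len(raw[0]))]

def denoise(raw, artifact_estimate):
    """Subtract the estimated artifact from every image's features."""
    return [[r - a for r, a in zip(row, artifact_estimate)] for row in raw]

def mean_abs_error(a, b):
    """Mean absolute difference between two equally shaped nested lists."""
    total = sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return total / (len(a) * len(a[0]))

artifact = [0.0, 2.0, -1.5, 3.0]          # hypothetical artifact, one value per patch position
clean, raw = make_raw_features(2000, artifact)
estimate = estimate_positional_artifact(raw)
denoised = denoise(raw, estimate)

print("error before:", mean_abs_error(raw, clean))
print("error after: ", mean_abs_error(denoised, clean))
```

In this toy setting the error against the clean features drops sharply once the estimated artifact is subtracted. Real ViT features are high-dimensional and the artifact is not known to be this simple, so the example only conveys the shape of the decomposition.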
Target Users:
DVT is suited to scenarios such as denoising ViT features, extracting image features, and improving the performance of visual tasks.
Use Cases
Feature Denoising: Use DVT to remove noise artifacts from the features a ViT extracts from an image.
Image Feature Extraction: Utilize DVT to extract clean visual features for image recognition tasks.
Improve Visual Task Performance: Apply DVT to enhance the performance of Transformer-based visual models in semantic and geometric tasks.
Features
Dissect ViT output
Introduce a learnable denoiser
Extract noise-free features
Improve the performance of Transformer-based models
Work without retraining existing pre-trained ViTs
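The combination of a frozen pre-trained model and a separately trained denoiser can be sketched as follows. This is a minimal illustration under strong assumptions, not the DVT implementation: the "backbone" is a stand-in function that adds a fixed artifact, the denoiser is one trainable offset per patch position rather than a neural network, and the clean training targets are known exactly by construction. The names `frozen_backbone`, `OffsetDenoiser`, and `ARTIFACT` are hypothetical.

```python
import random

ARTIFACT = [0.0, 2.0, -1.5, 3.0]  # hypothetical fixed artifact baked into the backbone

def frozen_backbone(semantic_values):
    """Stand-in for a pre-trained ViT; nothing in it is ever updated."""
    return [s + a for s, a in zip(semantic_values, ARTIFACT)]

class OffsetDenoiser:
    """Toy learnable denoiser: one trainable offset per patch position."""

    def __init__(self, num_positions):
        self.offsets = [0.0] * num_positions

    def __call__(self, raw):
        return [r - o for r, o in zip(raw, self.offsets)]

    def train_step(self, raw, clean_target, lr=0.1):
        """One gradient-descent step on the squared error, updating only the offsets."""
        pred = self(raw)
        for p in range(len(self.offsets)):
            # loss = (raw - offset - target)^2, so d(loss)/d(offset) = -2 * (pred - target)
            grad = -2.0 * (pred[p] - clean_target[p])
            self.offsets[p] -= lr * grad

rng = random.Random(0)
denoiser = OffsetDenoiser(len(ARTIFACT))
for _ in range(200):
    semantic = [rng.gauss(0.0, 1.0) for _ in ARTIFACT]
    denoiser.train_step(frozen_backbone(semantic), semantic)

# Apply the trained denoiser to an input it has not seen, without touching the backbone.
new_semantic = [0.5, -0.25, 1.0, 0.0]
denoised = denoiser(frozen_backbone(new_semantic))
print(denoised)
```

Only the denoiser's parameters change during training, which mirrors the claim that the pre-trained model itself needs no retraining. Once trained, the denoiser runs in a single pass on new inputs.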