Denoising Vision Transformers: Provides clean visual features


Tags: AI image enhancement, AI model, Image Processing, Deep Learning, Model Optimization, Open Source

Overview:

Denoising Vision Transformers (DVT) addresses the grid-like noise artifacts that appear in the feature maps of Vision Transformers (ViTs), which the authors trace back to positional embeddings. DVT works in two stages. First, a noise model dissects each ViT output into a noise-free semantic term and artifact-related terms tied to pixel positions, fitted per image by enforcing feature consistency across multiple views; this yields clean features for offline applications. Second, a learnable denoiser is trained to predict artifact-free features directly from raw ViT outputs, enabling online use without per-image optimization. DVT does not require retraining existing pre-trained ViTs and can be applied directly to any Transformer-based architecture. In evaluations on multiple datasets, the authors report that DVT consistently and significantly improves existing state-of-the-art general-purpose models on both semantic and geometric tasks (e.g., +3.84 mIoU). They hope the work encourages a re-evaluation of ViT design, especially the naive use of positional embeddings.
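To make the first-stage idea concrete, here is a toy, linearized sketch (an illustration under stated assumptions, not the authors' implementation). Each view's feature map is modeled as a semantic term that moves with the image plus an artifact term fixed to the patch position, and both are recovered jointly by least squares across several shifted crops. DVT itself fits a neural field per image and its noise model includes a second artifact-related term; this sketch replaces the neural field with a plain lookup grid, drops the second term, and uses one scalar feature per patch. The function `decompose_views`, the grid sizes, and the crop offsets are all hypothetical.

```python
import numpy as np

def decompose_views(views, offsets, grid, image_shape):
    """Toy split of per-view feature maps into a semantic map aligned to the
    image and an artifact map aligned to patch positions (joint least squares).

    views: list of (H, W) arrays, one scalar feature per patch (assumption)
    offsets: list of (dy, dx) crop offsets of each view inside the image grid
    grid: (H, W) patch grid of one view
    image_shape: (IH, IW) patch grid of the full image
    """
    H, W = grid
    IH, IW = image_shape
    n_sem, n_art = IH * IW, H * W
    rows, rhs = [], []
    for view, (dy, dx) in zip(views, offsets):
        for y in range(H):
            for x in range(W):
                row = np.zeros(n_sem + n_art)
                row[(y + dy) * IW + (x + dx)] = 1.0  # semantic term: moves with the crop
                row[n_sem + y * W + x] = 1.0         # artifact term: fixed to patch slot
                rows.append(row)
                rhs.append(view[y, x])
    # Minimum-norm least-squares solution. A constant can shift between the two
    # terms without changing the fit, so only mean-removed maps are meaningful.
    sol, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    return sol[:n_sem].reshape(IH, IW), sol[n_sem:].reshape(H, W)

# Synthetic demo: a hypothetical 8x8 patch "image" observed through six 6x6 crops.
rng = np.random.default_rng(0)
semantic = rng.normal(size=(8, 8))   # content, tied to image coordinates
artifact = rng.normal(size=(6, 6))   # grid-like noise, tied to patch positions
offsets = [(0, 0), (1, 0), (0, 1), (0, 2), (2, 0), (2, 2)]
views = [semantic[dy:dy + 6, dx:dx + 6] + artifact for dy, dx in offsets]

sem_est, art_est = decompose_views(views, offsets, (6, 6), (8, 8))
err = np.abs((art_est - art_est.mean()) - (artifact - artifact.mean())).max()
print(f"max artifact error after removing the mean: {err:.2e}")
```

The separation works only because the crops are shifted relative to each other: content moves with the image while the artifact stays at the same patch slot, so cross-view consistency is what disentangles the two. With a single view, the split would be arbitrary.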

Target Users:

DVT is suitable for removing position-dependent noise artifacts from ViT feature maps, extracting clean image features, and improving the performance of Transformer-based models on downstream visual tasks.

Use Cases

Feature Denoising: Use DVT to remove grid-like noise artifacts from the feature maps of pre-trained ViTs.

Image Feature Extraction: Utilize DVT to extract clean visual features for image recognition tasks.

Improve Visual Task Performance: Apply DVT to enhance the performance of Transformer-based visual models in semantic and geometric tasks.

Features

Dissect ViT outputs into a noise-free semantic term and position-dependent artifact terms

Introduce a learnable denoiser that predicts artifact-free features directly from raw ViT outputs
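The learnable denoiser lets new images be cleaned in a single forward pass, without per-image optimization. The sketch below is a deliberately simplified, hypothetical stand-in: it fits a linear map plus a per-patch bias by least squares on pairs of (raw, clean) features, whereas DVT trains a neural network on the clean features produced by its first stage. The data are synthetic, with a purely position-dependent artifact, and the names `fit_linear_denoiser` and `apply_denoiser` are illustrative only.

```python
import numpy as np

def fit_linear_denoiser(raw, clean):
    """Fit clean ~= raw @ W + bias[patch] by least squares.

    raw, clean: (num_images, num_patches, dim) arrays. In DVT the targets would
    come from the per-image first stage; here they are synthetic (assumption).
    """
    n, p, d = raw.shape
    onehot = np.tile(np.eye(p), (n, 1))              # (n*p, p) patch-position indicators
    X = np.hstack([raw.reshape(n * p, d), onehot])   # (n*p, d+p) design matrix
    Y = clean.reshape(n * p, d)
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)     # (d+p, d)
    return coef[:d], coef[d:]                        # W: (d, d), bias: (p, d)

def apply_denoiser(raw, W, bias):
    """Feed-forward denoising of unseen feature maps (no per-image fitting)."""
    return raw @ W + bias                            # bias broadcasts over images

rng = np.random.default_rng(0)
num_patches, dim = 16, 4
artifact = rng.normal(size=(num_patches, dim))       # shared positional artifact
train_clean = rng.normal(size=(50, num_patches, dim))
W, bias = fit_linear_denoiser(train_clean + artifact, train_clean)

test_clean = rng.normal(size=(3, num_patches, dim))  # images never seen in training
denoised = apply_denoiser(test_clean + artifact, W, bias)
print(np.abs(denoised - test_clean).max())
```

Because the toy artifact depends only on patch position, the fitted model simply learns to subtract it. Real ViT artifacts also vary with the input, which is why a learned nonlinear denoiser is needed in practice.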