PixelPlayer
Overview:
PixelPlayer is a system that learns, by watching a large number of unlabeled videos, to locate the image regions that produce sound and to separate the input audio into a set of components representing the sound from each pixel. The method exploits the natural synchronization between the visual and auditory modalities to learn a joint model for parsing sounds and images without any additional human labeling. It is trained on a large number of videos of solo and duet performances covering different combinations of instruments. For each video, there is no supervision on which instruments appear, where they are, or what sounds they produce.
At test time, the system takes as input a video of performances on different instruments together with its monaural (single-channel) audio. It performs audio-visual source separation and localization, splitting the input audio signal into N sound channels, each corresponding to a different instrument category. It can also localize sounds and assign a different audio waveform to each pixel of the input video.
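The overview does not say how supervision is obtained from unlabeled videos. The original PixelPlayer paper ("The Sound of Pixels") uses a "mix-and-separate" scheme: audio from separate videos is mixed, and the network learns to recover each video's audio from the mixture, so the ground truth comes for free. The sketch below illustrates only the target side of that idea with toy numbers. It is a minimal sketch under stated assumptions, not the authors' implementation: it assumes magnitude spectrograms stored as nested lists indexed [freq][time], approximates the mixture by summing them, and uses ratio masks as one possible choice of target; `mix`, `ratio_masks`, and `apply_mask` are hypothetical names.

```python
# Minimal sketch of "mix-and-separate" training targets (hypothetical helper
# names; not PixelPlayer's actual code). Assumption: magnitude spectrograms are
# nested lists indexed [freq][time], and the magnitude of a mixture is
# approximated by the element-wise sum of its sources.
from typing import List

Spectrogram = List[List[float]]


def mix(sources: List[Spectrogram]) -> Spectrogram:
    """Sum N source spectrograms bin by bin to form a synthetic mixture."""
    n_freq, n_time = len(sources[0]), len(sources[0][0])
    return [
        [sum(src[f][t] for src in sources) for t in range(n_time)]
        for f in range(n_freq)
    ]


def ratio_masks(sources: List[Spectrogram]) -> List[Spectrogram]:
    """Ideal ratio mask per source: its share of the mixture in each bin.

    These masks are free training targets: they come from knowing which
    video each source was taken from, not from human labels.
    """
    mixture = mix(sources)
    return [
        [
            [src[f][t] / mixture[f][t] if mixture[f][t] > 0 else 0.0
             for t in range(len(mixture[0]))]
            for f in range(len(mixture))
        ]
        for src in sources
    ]


def apply_mask(mixture: Spectrogram, mask: Spectrogram) -> Spectrogram:
    """Recover one component by scaling each mixture bin by its mask value."""
    return [
        [m * w for m, w in zip(mix_row, mask_row)]
        for mix_row, mask_row in zip(mixture, mask)
    ]


if __name__ == "__main__":
    # Toy spectrograms (2 frequency bins x 3 time frames) from two solo videos.
    violin = [[1.0, 0.0, 2.0], [0.5, 0.5, 0.0]]
    guitar = [[1.0, 3.0, 0.0], [0.0, 1.5, 1.0]]
    mixture = mix([violin, guitar])        # what the network would be given
    masks = ratio_masks([violin, guitar])  # what it would be trained to predict
    print(apply_mask(mixture, masks[0]))   # approximately the violin spectrogram
```

With these toy sources the masks sum to 1 in every nonzero bin, and applying the first mask to the mixture returns the violin spectrogram. A trained network would instead have to predict such masks from the mixture and the video frames alone.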
Target Users:
Perform unsupervised audio-visual separation
Analyze audio-visual relationships
Use Cases
PixelPlayer can be used to separate different instrument sounds in mixed audio.
PixelPlayer can be used to study the relationship between visual and auditory perception.
PixelPlayer can be used to explore the contribution of different pixel regions to the overall auditory experience.
Features
Audio-visual source separation and localization
Separate audio signals into components representing the sound of each pixel
Assign different audio waveforms to each pixel in the input video
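How the per-pixel waveforms listed above are computed is not specified here. The following minimal sketch assumes a structure like the one described in the PixelPlayer paper: an audio network yields K component spectrograms, each pixel carries a K-dimensional visual feature, and the pixel's mask is a sigmoid of the feature-weighted sum of the components. `pixel_mask`, `pixel_sound`, `loudness_map`, the bias term, and all numbers are hypothetical. Turning a masked spectrogram back into a waveform (for example with an inverse STFT using the mixture phase) is omitted.

```python
# Minimal sketch of per-pixel separation and localization (hypothetical helper
# names; not PixelPlayer's actual code). Assumptions: an audio network has
# produced K component spectrograms, a video network has produced a K-dim
# feature for every pixel, and a pixel's spectrogram mask is the sigmoid of the
# feature-weighted sum of the components plus a bias.
import math
from typing import List

Spectrogram = List[List[float]]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def pixel_mask(feature: List[float], components: List[Spectrogram],
               bias: float = 0.0) -> Spectrogram:
    """Mask in [0, 1] for one pixel, computed bin by bin."""
    n_freq, n_time = len(components[0]), len(components[0][0])
    return [
        [sigmoid(sum(w * c[f][t] for w, c in zip(feature, components)) + bias)
         for t in range(n_time)]
        for f in range(n_freq)
    ]


def pixel_sound(mixture: Spectrogram, feature: List[float],
                components: List[Spectrogram], bias: float = 0.0) -> Spectrogram:
    """Magnitude spectrogram of the sound attributed to one pixel."""
    mask = pixel_mask(feature, components, bias)
    return [[m * w for m, w in zip(mr, wr)] for mr, wr in zip(mixture, mask)]


def loudness_map(mixture: Spectrogram, features: List[List[List[float]]],
                 components: List[Spectrogram],
                 bias: float = 0.0) -> List[List[float]]:
    """Summed magnitude of each pixel's sound: a rough localization heatmap."""
    return [
        [sum(map(sum, pixel_sound(mixture, feat, components, bias)))
         for feat in row]
        for row in features
    ]


if __name__ == "__main__":
    mixture = [[2.0, 2.0], [2.0, 2.0]]             # 2 freq bins x 2 frames
    components = [[[1.0, 1.0], [0.0, 0.0]],        # component 0: low band
                  [[0.0, 0.0], [1.0, 1.0]]]        # component 1: high band
    # A 1x3 "image": two player pixels and one background pixel (K = 2).
    image = [[[5.0, -5.0], [-5.0, 5.0], [-5.0, -5.0]]]
    print(loudness_map(mixture, image, components))  # background is quietest
```

Summing each pixel's separated magnitude gives a crude localization map: in the toy image the two player pixels receive about the same loudness, while the background pixel stays near zero because its mask is small in every bin.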
© 2025 AIbase