VSP-LLM
Overview:
VSP-LLM is a framework that combines Visual Speech Processing (VSP) with Large Language Models (LLMs), designed to maximize context-modeling capability by leveraging the strong language abilities of LLMs. It is built for multitasking, performing both visual speech recognition and visual speech translation. Input video is mapped into the LLM's input latent space by a self-supervised visual speech model, and training is made efficient through a novel deduplication method combined with Low-Rank Adaptation (LoRA).
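As a rough illustration of this mapping step, the PyTorch sketch below projects deduplicated visual features into the LLM's embedding space and prepends an instruction prompt. The class name, dimensions, and tensor shapes are hypothetical, not taken from the VSP-LLM repository:

```python
import torch
import torch.nn as nn

# Hypothetical dimensions: FEAT_DIM for the visual encoder output,
# LLM_DIM for the LLM's token-embedding size.
FEAT_DIM, LLM_DIM = 1024, 4096

class VisualToLLMBridge(nn.Module):
    """Projects self-supervised visual speech features into the LLM input space."""

    def __init__(self) -> None:
        super().__init__()
        self.proj = nn.Linear(FEAT_DIM, LLM_DIM)

    def forward(self, visual_feats: torch.Tensor, prompt_embeds: torch.Tensor) -> torch.Tensor:
        # visual_feats:  (B, T, FEAT_DIM) deduplicated features from the visual encoder
        # prompt_embeds: (B, P, LLM_DIM) embedded task instruction (recognition vs. translation)
        vis_embeds = self.proj(visual_feats)                   # (B, T, LLM_DIM)
        return torch.cat([prompt_embeds, vis_embeds], dim=1)   # consumed by the LLM as input embeddings
```

In a setup like this, the concatenated sequence would be fed to a frozen LLM via its input-embeddings interface, so only the small projection layer (plus LoRA adapters) receives gradient updates.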
Target Users:
Multi-language Speech Recognition
Cross-language Video Content Understanding
Real-time Speech Translation
Total Visits: 474.6M
Top Region: US (19.34%)
Website Views: 163.1K
Use Cases
Conducting real-time speech translation in multilingual environments with VSP-LLM
Analyzing video content and extracting key information with VSP-LLM to generate summaries
Using VSP-LLM as a language-learning assistant in educational applications to improve speech recognition accuracy
Features
Visual Speech Recognition
Visual Speech Translation
Self-Supervised Learning
Deduplication and LoRA Training (see the sketch after this list)
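The deduplication idea can be illustrated with a short sketch: consecutive frames assigned the same discrete speech unit (e.g., by the self-supervised model's clustering) are averaged into a single vector, shortening the sequence the LLM must process. The function below is a hypothetical illustration under that assumption, not the repository's implementation:

```python
import torch

def deduplicate(feats: torch.Tensor, units: torch.Tensor) -> torch.Tensor:
    """Average runs of consecutive frames that share the same discrete speech unit.

    feats: (T, D) frame-level visual features
    units: (T,)  discrete unit ID per frame (e.g., k-means cluster indices)
    Returns a (T', D) tensor with one averaged vector per run, T' <= T.
    """
    # Mark frames where the unit ID changes; the first frame always starts a run.
    change = torch.ones_like(units, dtype=torch.bool)
    change[1:] = units[1:] != units[:-1]
    group_ids = torch.cumsum(change.long(), dim=0) - 1  # 0-based run index per frame
    num_groups = int(group_ids[-1]) + 1

    # Sum features within each run, then divide by the run length.
    summed = torch.zeros(num_groups, feats.size(1), dtype=feats.dtype)
    summed.index_add_(0, group_ids, feats)
    counts = torch.zeros(num_groups, dtype=feats.dtype)
    counts.index_add_(0, group_ids, torch.ones(feats.size(0), dtype=feats.dtype))
    return summed / counts.unsqueeze(1)

# Example: units [3, 3, 5, 5, 5, 2] collapse six frames into three vectors.
feats = torch.randn(6, 4)
units = torch.tensor([3, 3, 5, 5, 5, 2])
print(deduplicate(feats, units).shape)  # torch.Size([3, 4])
```

Pairing this with LoRA means the LLM's weights stay frozen while only small low-rank adapter matrices are trained, which is what keeps the overall training cost low.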