Video Language Planning
Overview
Video Language Planning (VLP) is an algorithm that combines vision-language models with text-to-video models to perform complex, long-horizon visual planning. VLP takes a long-horizon task instruction and the current image observation as input and outputs a detailed multimodal (video and language) plan describing how to complete the task. VLP can generate long-horizon video plans across a range of robotics domains, from multi-object rearrangement to multi-camera bimanual dexterous manipulation. The generated video plans can then be converted into real robot actions through a goal-conditioned policy. Experiments demonstrate that VLP substantially improves the success rate of long-horizon tasks compared to prior methods.
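The pipeline above can be read as a search loop: a vision-language model proposes candidate subgoals, a text-to-video model predicts the outcome of each, and the most promising branch is kept and extended. The sketch below illustrates that loop only in shape; the function names, the greedy single-branch selection, and the toy string-based "states" and scores are illustrative stand-ins, not the actual models or search procedure.

```python
# Illustrative sketch of a VLP-style planning loop. All model calls below are
# hypothetical stubs standing in for real VLM / text-to-video models.

def vlm_propose_subgoals(instruction, state, k=3):
    """Stand-in for a vision-language model proposing k candidate subgoals."""
    return [f"{instruction} / subgoal {i}" for i in range(k)]

def video_model_rollout(state, subgoal):
    """Stand-in for a text-to-video model predicting the resulting state."""
    return f"{state} -> [{subgoal}]"

def vlm_score(instruction, state):
    """Stand-in heuristic: a VLM would rate task progress from the rollout."""
    return state.count("->")

def vlp_plan(instruction, initial_state, horizon=3, branch=3):
    """Greedy tree search: expand each step with `branch` candidate video
    rollouts, keep the highest-scoring one, and repeat for `horizon` steps."""
    state, plan = initial_state, []
    for _ in range(horizon):
        candidates = []
        for subgoal in vlm_propose_subgoals(instruction, state, branch):
            next_state = video_model_rollout(state, subgoal)
            candidates.append((vlm_score(instruction, next_state), subgoal, next_state))
        _, subgoal, state = max(candidates)  # keep the best-scored branch
        plan.append(subgoal)
    return plan, state

plan, final_state = vlp_plan("stack blocks in the center", "current_image")
print(plan)  # one chosen subgoal per planning step
```

The returned `plan` is the language half of the multimodal plan; in the real system each step also carries the generated video segment, which a goal-conditioned policy turns into robot actions.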
Target Users
Teams working on complex, long-horizon visual planning tasks in robotics.
Use Cases
Stack objects in the center of a table
Put fruit into the top shelf drawer
Group building blocks by color
Features
Train visual language models and text-to-video models
Generate detailed multi-modal plans
Generate long-term video plans
Convert video plans into real robot actions via a goal-conditioned policy