LEO
Overview:
LEO is a multimodal, multi-task generalist agent built on a large language model. It can perceive, ground, reason, plan, and act in the 3D world. LEO is trained in two stages: (i) 3D vision-language alignment and (ii) 3D vision-language-action instruction tuning. We curated and generated a large-scale dataset of object-level and scene-level multimodal tasks that require in-depth understanding of, and interaction with, the 3D world. Extensive experiments demonstrate LEO's strong performance across a wide range of tasks, including 3D captioning, question answering, reasoning, navigation, and robot manipulation.
Target Users:
LEO can be used to complete a variety of tasks in the 3D world, including 3D captioning, question answering, reasoning, navigation, and robot manipulation.
Features
3D Vision-Language Alignment
3D Vision-Language-Action Instruction Tuning
3D Captioning
Question Answering
Reasoning
Navigation
Robot Manipulation