LEO : All-in-one Agent in the 3D World

Tags : AI Agents, AI Models, Artificial Intelligence, 3D World, Multimodal, Multi-task, Visual Language Alignment, Visual Language Action Instruction Tuning, Open Source

Overview :

LEO is a multimodal, multi-task, all-in-one agent built on a large language model. It can perceive, localize, reason, plan, and execute tasks in the 3D world. LEO is trained in two stages: (i) 3D visual-language alignment and (ii) 3D visual-language action instruction tuning. The authors curated and generated a large-scale dataset of object-level and scene-level multimodal tasks that require deep understanding of, and interaction with, the 3D world. Rigorous experiments demonstrate LEO's strong performance across a wide range of tasks, including 3D captioning, question answering, reasoning, navigation, and robot manipulation.
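
The two-stage recipe described above can be pictured as an ordered schedule in which the second stage starts from the output of the first. The Python sketch below is illustrative only: `Stage`, `run_schedule`, the two placeholder training functions, and the `state` dictionary are hypothetical names invented for this example, not LEO's actual training code or API. Only the two stage names come from the overview.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, List[str]]  # hypothetical stand-in for model weights/state


@dataclass
class Stage:
    name: str
    train: Callable[[State], State]  # takes the current state, returns a new one


def make_placeholder_trainer(stage_name: str) -> Callable[[State], State]:
    """Return a dummy training step that only records that the stage ran."""
    def train(state: State) -> State:
        # A real stage would update model parameters here (assumption).
        return {**state, "completed": state["completed"] + [stage_name]}
    return train


def run_schedule(stages: List[Stage], state: State) -> State:
    """Run the stages in order; each stage starts from the previous result."""
    for stage in stages:
        state = stage.train(state)
    return state


# Stage names taken from the overview; everything else is illustrative.
SCHEDULE = [
    Stage("3D visual-language alignment",
          make_placeholder_trainer("3D visual-language alignment")),
    Stage("3D visual-language action instruction tuning",
          make_placeholder_trainer("3D visual-language action instruction tuning")),
]

final_state = run_schedule(SCHEDULE, {"completed": []})
print(final_state["completed"])
```

The point of the sketch is the ordering: instruction tuning begins from the aligned model rather than from scratch.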

Target Users :

LEO is aimed at users who need to complete tasks in the 3D world, such as 3D captioning, question answering, reasoning, navigation, and robot manipulation.

Total Visits : 19

Top Region : US (100.00%)

Website Views : 43.3K

Features

3D Visual-Language Alignment

3D Visual-Language Action Instruction Tuning

3D Captioning

Question Answering

Reasoning

Navigation

Robot Manipulation