# Robotics
Fresh Picks

Genie Studio
Genie Studio is a one-stop development platform built by Zhiyuan Robotics for embodied AI scenarios, with full-chain product capabilities covering data acquisition, model training, simulation evaluation, and model inference. It gives developers a standardized pipeline from acquisition through training and testing to inference, greatly lowering the barrier to entry and improving development efficiency. The platform accelerates the development and application of embodied AI through efficient data acquisition, flexible model training, precise simulation evaluation, and seamless model inference. Genie Studio not only provides powerful tools but also supports the large-scale deployment of embodied AI, accelerating the industry's move toward standardization, platformization, and mass production.
Development Platform
40.8K
English Picks

Gemini Robotics
Gemini Robotics is an advanced artificial intelligence model from Google DeepMind, designed for robotic applications. Based on the Gemini 2.0 architecture, it fuses vision, language, and action (VLA), enabling robots to perform complex real-world tasks. The importance of this technology lies in its advancement of robots from the laboratory to everyday life and industrial applications, laying the foundation for the future development of intelligent robots. Key advantages of Gemini Robotics include strong generalization capabilities, interactivity, and dexterity, allowing it to adapt to different tasks and environments. Currently, the technology is in the research and development phase, and specific pricing and market positioning have not yet been defined.
AI Model
69.3K
Chinese Picks

GO-1
AgiBot's general-purpose embodied base large model, GO-1, is a revolutionary AI model. Based on the innovative Vision-Language-Latent-Action (ViLLA) architecture, this model uses a multi-modal large model (VLM) and a Mixture-of-Experts (MoE) system to achieve efficient conversion from visual and language input to robot action execution. GO-1 can learn from human videos and real robot data, possesses strong generalization capabilities, and can quickly adapt to new tasks and environments with minimal or even zero data. Its main advantages include efficient learning ability, strong generalization performance, and adaptability to various robot bodies. The launch of this model marks a significant step towards the generalization, openness, and intelligence of embodied intelligence, and is expected to play an important role in commercial, industrial, and household applications.
AI Model
65.4K

Clone
Clone is a humanoid robot developed by Clone Robotics, representing the forefront of robotics technology. It employs revolutionary Myofiber artificial muscle technology, capable of simulating the movement of natural animal skeletons. Myofiber technology achieves unprecedented levels in weight, power density, speed, strength-to-weight ratio, and energy efficiency, enabling the robot to exhibit natural walking ability, considerable strength, and flexibility. Clone is not only technologically significant but also offers new possibilities for future robot applications in home, industrial, and service sectors. It is positioned as a high-end technology product targeting individuals, research institutions, and businesses interested in cutting-edge technology.
AI Agent
53.3K
English Picks

Aria Gen 2
Aria Gen 2 is Meta's second-generation research-grade smart glasses, designed specifically for machine perception, contextual AI, and robotics research. It integrates advanced sensors and low-power machine perception technology, capable of real-time processing of SLAM, eye tracking, and gesture recognition. This product aims to advance artificial intelligence and machine perception technologies, providing researchers with powerful tools to explore how AI can better understand the world from a human perspective. Aria Gen 2 not only achieves technological breakthroughs but also promotes open research and public understanding of these crucial technologies through collaborations with academia and commercial research labs.
Research Equipment
50.2K

Figure AI Helix
Helix is an innovative vision-language-action model designed for general-purpose control of humanoid robots. It addresses several long-standing challenges in robotic manipulation in complex environments by combining visual perception, language understanding, and action control. Key advantages of Helix include strong generalization capabilities, efficient data utilization, and a single neural network architecture that eliminates the need for task-specific fine-tuning. The model aims to provide robots in home environments with on-the-fly behavior generation capabilities, enabling them to handle unseen objects. The emergence of Helix marks a significant step forward in robotics' ability to adapt to everyday life scenarios.
AI Agent
55.8K

Magma 8B
Magma-8B is a foundational multi-modal AI model developed by Microsoft, specifically designed for researching multi-modal AI agents. It integrates text and image inputs to generate text outputs and possesses visual planning and agent capabilities. The model utilizes Meta LLaMA-3 as its language model backbone and incorporates a CLIP-ConvNeXt-XXLarge vision encoder. It can learn spatiotemporal relationships from unlabeled video data, exhibiting strong generalization capabilities and multi-task adaptability. Magma-8B excels in multi-modal tasks, particularly in spatial understanding and reasoning. It provides a powerful tool for multi-modal AI research, advancing the study of complex interactions in virtual and real-world environments.
AI Model
58.0K

Magma
Magma, developed by Microsoft Research, is a multimodal foundational model designed to enable complex task planning and execution through the combination of vision, language, and action. Pre-trained on large-scale visual-language data, it possesses capabilities in language understanding, spatial intelligence, and action planning, allowing it to excel in tasks such as UI navigation and robot operation. This model provides a powerful foundation framework for multimodal AI agent tasks, with broad application prospects.
AI Agent
56.6K

ASAP
ASAP (Aligning Simulation and Real-World Physics for Learning Agile Humanoid Whole-Body Skills) is an innovative two-stage framework that addresses the dynamics mismatch between simulation and the real world, enabling agile whole-body skills on humanoid robots. It first pre-trains motion-tracking policies in simulation and then trains a residual action model on real-world data to compensate for the mismatch, significantly improving a robot's adaptability and coordination in complex dynamic environments. Key advantages of ASAP include data efficiency, substantial performance gains, and precise control of complex movements, pointing to new directions for future humanoid robots, especially in scenarios that demand high flexibility and adaptability.
Model Training and Deployment
50.0K
Fresh Picks

NVIDIA Cosmos
NVIDIA Cosmos is an advanced foundational model platform aimed at accelerating the development of physical AI systems, such as autonomous vehicles and robotics. It provides a range of pre-trained generative models, advanced tokenizers, and accelerated data processing pipelines to make it easier for developers to build and optimize physical AI applications. With its open model licensing, Cosmos reduces development costs and enhances efficiency, making it suitable for businesses and research institutions of various sizes.
Model Training and Deployment
56.6K

Video Prediction Policy
Video Prediction Policy (VPP) is a robot policy built on Video Diffusion Models (VDMs), which accurately predict future image sequences and thus exhibit a solid understanding of physical dynamics. VPP leverages the visual representations inside VDMs, which reflect the evolution of the physical world, as predictive visual representations. By combining diverse datasets of human and robot manipulation under a unified video-generation training objective, VPP outperforms existing methods in two simulated environments and two real-world benchmarks. In particular, on the CALVIN ABC-D benchmark, VPP achieves a 28.1% relative improvement over the prior state of the art, and it raises the success rate on complex real-world manipulation tasks by 28.8%.
Video Production
48.3K

Genesis AI
Genesis is a comprehensive physics simulation platform designed for robotics, embodied AI, and physical AI applications. It is a general-purpose physics engine built from the ground up to simulate a wide range of materials and physical phenomena. As a lightweight, ultra-fast, Pythonic, and user-friendly simulation platform, it also features a powerful realistic rendering system and a data generation engine that converts natural language descriptions into various data modalities. By integrating its core physics engine, Genesis enhances the upper-level generative agent framework, aimed at achieving fully automated data generation for robotics and beyond.
Development & Tools
68.7K

NVIDIA Jetson Orin Nano Super Developer Kit
The NVIDIA Jetson Orin Nano Super Developer Kit is a compact generative AI supercomputer that offers superior performance at a lower price. It caters to a wide user base ranging from commercial AI developers to hobbyists and students, delivering a 1.7x increase in generative AI inference performance, a boost to 67 INT8 TOPS, and an upgrade in memory bandwidth to 102 GB/s. This product is ideal for developing retrieval-augmented generation (RAG) LLM chatbots, building visual AI agents, or deploying AI-based robots.
Development & Tools
50.5K

Unitree RL GYM
Unitree RL GYM is a reinforcement learning platform based on Unitree robots, supporting models such as Unitree Go2, H1, H1_2, and G1. This platform provides an integrated environment for researchers and developers to train and test reinforcement learning algorithms on real or simulated robots. Its significance lies in promoting the advancement of robot autonomy and intelligent technology, particularly in applications requiring complex decision-making and motion control. Unitree RL GYM is open-source and available for free, mainly targeting researchers and robotics enthusiasts.
Model Training and Deployment
68.4K
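Reinforcement-learning platforms of this kind typically expose a Gym-style reset/step loop for training and evaluation. The sketch below illustrates that pattern only; the environment, observation, action, and reward here are invented stand-ins, not the actual Unitree RL GYM API.

```python
import random

class ToyLeggedEnv:
    """Stand-in environment: reward for keeping a 1-D 'torso height' near 1.0."""

    def reset(self):
        self.height = 1.0
        self.steps = 0
        return [self.height]

    def step(self, action):
        # The action nudges the height; simple damping stands in for a
        # real physics simulator.
        self.height += 0.1 * action - 0.05 * (self.height - 1.0)
        self.steps += 1
        reward = -abs(self.height - 1.0)          # penalty for deviating
        done = self.steps >= 100 or abs(self.height - 1.0) > 0.5
        return [self.height], reward, done

def run_episode(env, policy):
    """Roll out one episode and return the total reward."""
    obs, total, done = env.reset(), 0.0, False
    while not done:
        obs, reward, done = env.step(policy(obs))
        total += reward
    return total

random.seed(0)
ret = run_episode(ToyLeggedEnv(), lambda obs: random.uniform(-1, 1))
```

A real setup would swap the toy environment for one of the platform's simulated or physical robots and the random policy for a learned one.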

Physical Intelligence
Physical Intelligence (π) is a team of engineers, scientists, roboticists, and company builders dedicated to developing foundational models and learning algorithms to drive today's robotics and future physically driven devices. The team aims to apply general AI technologies to the physical world, fostering the development and innovation of robotics.
Model Training and Deployment
52.2K

Digit Plexus
Digit Plexus is a robotic hardware platform designed to provide standardized hardware-software solutions for integrating tactile sensors into various robotic hands. The platform can integrate vision-based and skin-based tactile sensors, such as Digit, Digit 360, and ReSkin, onto a control board and transmit all data, encoded, over a single cable to a host computer, allowing seamless data collection, control, and analysis. Meta has partnered with Wonik Robotics to develop the next-generation Allegro Hand on this platform, and interested researchers can express interest in early access.
Development & Tools
46.9K

Digit 360
Digit 360 is a haptic sensor shaped like a human finger, released by Meta FAIR, capable of digitizing touch with human-level accuracy. The sensor features over 18 unique sensing characteristics, allowing researchers to either combine various sensing technologies or isolate signals for detailed analysis. The Digit 360 achieves spatial detail detection at 7 microns, force detection at 1 milliNewton, and has a response speed 30 times that of a human, setting a new standard in haptic sensing technology.
Research Equipment
52.7K

π0
π0 is a general-purpose robotic foundation model designed to enable AI systems to gain physical intelligence through embodied training, allowing them to perform various tasks similar to large language models and chatbot assistants. π0 acquires physical intelligence through hands-on experience on robots, capable of directly outputting low-level motor commands to control multiple types of robots, and can be fine-tuned for specific application scenarios. The development of π0 represents a significant advancement in the application of artificial intelligence in the physical world, offering the most capable and dexterous general-purpose robotic policies to date by integrating large-scale multitasking and multi-robot data collection with novel network architectures.
AI Agent
51.3K

Agibot X1 Train
The Agibot X1, developed by Agibot, is a modular humanoid robot with high degrees of freedom, built on Agibot's open-source framework AimRT as middleware and using reinforcement learning for motion control. This project contains the reinforcement-learning training code used for the Agibot X1; it can be combined with the Agibot X1 inference software for real-robot and simulated walking debugging, or used to import other robot models for training.
Development & Tools
59.3K

Agibot X1 Infer
Agibot X1 is a modular humanoid robot developed by Agibot, featuring high degrees of freedom, based on the Agibot open-source framework AimRT as middleware, and utilizing reinforcement learning for motion control. The project includes various functional modules such as model inference, platform driving, and software simulation. The AimRT framework is an open-source framework for robotics application development, providing a complete set of tools and libraries to support robot perception, decision-making, and action. The significance of the Agibot X1 project lies in its provision of a highly customizable and extensible platform for robotics research and education.
Model Training and Deployment
48.6K

Zhiyuan Lingxi X1 Development Guide
The Zhiyuan Lingxi X1 is an open-source humanoid robot with 29 joints and 2 grippers, plus support for an optional 3-degree-of-freedom head. It ships with detailed development guides and open-source code, enabling developers to quickly assemble the robot and build on it. The product represents advanced technology in intelligent robotics, with the flexibility and scalability to suit education, research, and commercial development.
Development & Tools
71.2K

RoboticsDiffusionTransformer
RDT-1B is a state-of-the-art imitation-learning diffusion transformer with 1 billion parameters (currently the largest of its kind), pre-trained on over 1 million multi-robot episodes. Given a language instruction and RGB images from up to three views, RDT predicts the next 64 robot actions. RDT is compatible with nearly all modern mobile manipulators: single-arm to dual-arm systems, joint-space to end-effector control, position to velocity control, and even wheeled locomotion. The model has been fine-tuned on over 6,000 self-collected bimanual episodes and deployed on the ALOHA dual-arm robot, achieving leading performance in dexterity, zero-shot generalization, and few-shot learning.
AI Model
50.8K
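The input/output contract described above (a language instruction plus up to three RGB views in, a chunk of 64 actions out) can be sketched as follows; the function, the 7-dimensional action, and the dummy image are hypothetical placeholders, not the RDT-1B API.

```python
ACTION_CHUNK = 64   # RDT predicts the next 64 actions per call
ACTION_DIM = 7      # assumed per-action dimension for a single arm

def predict_actions(instruction, views):
    """Toy stand-in for the model: checks the contract, returns a zero chunk.

    A real model would encode the instruction and images; this only
    demonstrates the expected shapes.
    """
    if not (1 <= len(views) <= 3):
        raise ValueError("RDT conditions on one to three camera views")
    return [[0.0] * ACTION_DIM for _ in range(ACTION_CHUNK)]

# One dummy 224x224 single-channel "image" as a nested list.
views = [[[0] * 224 for _ in range(224)]]
chunk = predict_actions("fold the towel", views)
```

Action chunking (emitting a block of future actions per inference call) is what lets a large diffusion model control a robot despite slow per-call latency.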

We, Robot
We, Robot is a page by Tesla showcasing its vision in the fields of autonomous driving technology and robotics. It emphasizes Tesla's commitment to creating a sustainable future, improving transportation efficiency, affordability, and safety. The page features Tesla's full self-driving technology (supervised) and the potential applications of future autonomous vehicles and robots, such as Robotaxi, Robovan, and Tesla Bot. These technologies aim to enhance everyday convenience through automation while reducing traffic accidents and lowering transportation costs.
AI Autonomous Driving
47.7K

GR-2
GR-2 is an advanced general-purpose robotic agent designed for diverse and generalizable robot manipulation. It is first pre-trained on a large dataset of internet videos (38 million video clips and over 50 billion tokens) to capture the dynamics of the world, enabling GR-2 to generalize across a wide range of robot tasks and environments in subsequent policy learning. GR-2 is then fine-tuned on robot trajectories for video generation and action prediction. It demonstrates impressive multi-task learning, achieving an average success rate of 97.7% across more than 100 tasks, and it also excels in previously unseen scenarios, including new backgrounds, environments, objects, and tasks. Notably, GR-2 scales effectively with increasing model size, highlighting its potential for continued growth and application.
AI Model
50.5K
Fresh Picks

LuckyRobots
LuckyRobots is a simulation platform dedicated to making robotics accessible to ordinary software engineers. It allows users to control robots using natural language commands without relying on ROS or physical hardware. The platform offers a virtual environment, physical simulations, and multi-camera inputs, enabling users to deploy and test end-to-end AI models.
Development and Tools
52.4K

Clone Incorporated
Clone Incorporated is a robotics company committed to developing innovative robotic solutions that improve production efficiency and quality of life. Founded by Dhanush Radhakrishnan and Łukasz Koźlik, the company combines a strong technical background with a professional team, and its products show a high level of technological advancement and innovation, catering to diverse industry and individual needs.
Machine Learning
81.4K

OpenVLA
OpenVLA is a 7-billion-parameter open-source vision-language-action (VLA) model pre-trained on 970k robot episodes from the Open X-Embodiment dataset. The model sets a new state of the art for generalist robot manipulation policies, supporting out-of-the-box control of multiple robots and rapid adaptation to new robot setups through parameter-efficient fine-tuning. OpenVLA's checkpoints and PyTorch training code are fully open source, and the model can be downloaded and fine-tuned from Hugging Face.
AI Model
66.8K
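As a rough sketch of pulling the checkpoint from Hugging Face: the loading pattern below follows the `transformers` Auto-class convention used by the OpenVLA release, and the prompt template is my reading of the public model card; both should be verified against the official README before use.

```python
def build_prompt(instruction: str) -> str:
    # Prompt template assumed from the OpenVLA model card; verify before use.
    return f"In: What action should the robot take to {instruction.lower()}?\nOut:"

def load_openvla(model_id: str = "openvla/openvla-7b"):
    """Load processor and model from the Hugging Face Hub.

    Requires `pip install transformers timm` and a GPU; deliberately not
    called at import time, since it downloads a 7B-parameter checkpoint.
    """
    from transformers import AutoModelForVision2Seq, AutoProcessor

    processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForVision2Seq.from_pretrained(model_id, trust_remote_code=True)
    return processor, model

prompt = build_prompt("Pick up the red block")
```

Fine-tuning then follows the usual pattern of further training these weights on your own robot episodes, for which the repository provides the training scripts.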

Stardust AI S1
The S1 robot developed by StarDust AI (Astribot) is a new generation AI robot capable of mimicking and learning, executing various complex tasks useful to humans. The design philosophy of the S1 robot is to make AI assistants available to billions of people, helping them complete monotonous, difficult, or dangerous tasks. The product has passed large-scale model testing and is expected to complete commercialization within 2024.
Personal Care
102.4K

NVIDIA Project GR00T
NVIDIA Project GR00T is a general-purpose foundational model that can revolutionize the way humanoid robots learn in both simulated and real-world environments. Trained in NVIDIA GPU-accelerated simulations, GR00T enables humanoid robots to learn from limited human demonstrations through imitation learning and reinforcement learning in NVIDIA Isaac Lab. It can also generate robot actions from video data. The GR00T model accepts multimodal instructions and past interaction history as input and outputs the actions the robot needs to execute.
AI Model
56.3K

DexCap
DexCap is a portable hand motion capture system that combines SLAM-based visual tracking with electromagnetic field sensing to provide accurate, occlusion-resistant tracking of wrist and finger motions, while capturing 3D observations of the environment. The accompanying DexIL algorithm uses inverse kinematics and point-cloud-based imitation learning to train dexterous robot hand skills directly from human hand motion data. The system also supports an optional human-in-the-loop correction mechanism, letting the robot hand replicate human motions and further improve its performance from human demonstrations.
AI 3D tools
50.0K