

AgentCPM-GUI
Overview
AgentCPM-GUI is an open-source large language model (LLM) agent for mobile devices. It operates Chinese and English applications, taking screenshots of the user's screen as input and carrying out tasks automatically. Its main strengths are efficient GUI element understanding, enhanced reasoning, and precise support for Chinese applications. It was developed to improve the experience of using intelligent agents on mobile devices, especially for complex tasks, and is positioned as a productivity tool for a broad range of users.
Target Users
This product is suited to developers, product managers, and anyone who needs to operate mobile applications efficiently, especially Chinese applications. Its GUI understanding and task execution capabilities can improve work efficiency, particularly for tasks in complex scenarios.
Use Cases
When using the Dianping app, users can quickly obtain restaurant information through screenshots and instructions.
On Bilibili, users can have AgentCPM-GUI browse video content automatically by giving it instructions.
When using Amap, users can directly instruct the model to perform navigation and route planning.
Features
High-quality GUI element understanding: Pre-trained on a large-scale bilingual Android dataset, which improves its understanding of common GUI components.
Chinese application support: Fine-tuned specifically for Chinese applications, reportedly a first for this kind of agent, covering more than 30 popular apps.
Enhanced planning and reasoning capabilities: Reinforcement fine-tuning (RFT) lets the model deliberate before generating an output, which improves its success rate on complex tasks.
Compact action space design: Optimized action space and concise JSON format reduce average action length to 9.7 tokens, enhancing inference efficiency on devices.
Simple installation and usage: Dependencies are easy to install, so users can get started quickly.
Example cases: Provides multiple application examples that help users understand the features and use cases.
Support for image input: Can accept screenshots as input for image analysis and operation execution.
Adaptability to various Android applications: Designed with the usage scenarios of different Android applications in mind, so it adapts well across apps.
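To make the compact action space concrete, the following Python sketch parses a short JSON action and maps it to a device-level command. The field names (POINT, TYPE, PRESS, thought) and the 0-1000 normalized coordinate range are assumptions made for illustration; this page does not specify the schema, so check the project's documentation for the actual format.

```python
import json

# Assumed (hypothetical) convention: coordinates are normalized to 0-1000
# so the same action can be applied at any screen resolution.
COORD_SCALE = 1000


def to_pixels(point, width, height):
    """Convert a normalized [x, y] point to pixel coordinates."""
    x, y = point
    return round(x / COORD_SCALE * width), round(y / COORD_SCALE * height)


def parse_action(raw, width, height):
    """Turn one compact JSON action string into a (kind, payload) tuple."""
    action = json.loads(raw)
    action.pop("thought", None)  # optional reasoning text is not executed
    if "POINT" in action:
        return "tap", to_pixels(action["POINT"], width, height)
    if "TYPE" in action:
        return "type", action["TYPE"]
    if "PRESS" in action:
        return "press", action["PRESS"]
    raise ValueError(f"unrecognized action: {raw!r}")


if __name__ == "__main__":
    print(parse_action('{"POINT":[500,250]}', 1080, 2400))
    print(parse_action('{"thought":"go back","PRESS":"BACK"}', 1080, 2400))
```

A schema with only a handful of short keys keeps each decoded action to a few tokens, which is the kind of design the 9.7-token average above points to.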
How to Use
1. Clone the AgentCPM-GUI code repository to your local machine.
2. Install required dependencies such as Python and related libraries.
3. Download the models and place them in the designated directory.
4. Load the model and tokenizer via code and prepare input data.
5. Provide screenshots and relevant instructions for model inference.
6. Execute corresponding operations based on the model output.
7. Adjust the inputs as needed and repeat the process to improve results.
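Steps 5 to 7 amount to a capture, predict, execute loop. The Python sketch below shows that loop with the model and the device hidden behind plain callables, so it runs without any model weights. The names run_task, capture, predict, and execute, and the STATUS key used to signal completion, are hypothetical placeholders and not the project's API.

```python
import json
from typing import Callable


def run_task(
    instruction: str,
    capture: Callable[[], bytes],
    predict: Callable[[str, bytes], str],
    execute: Callable[[dict], None],
    max_steps: int = 10,
) -> list:
    """Repeat screenshot -> inference -> action until done or out of steps."""
    history = []
    for _ in range(max_steps):
        screenshot = capture()                  # step 5: grab the current screen
        raw = predict(instruction, screenshot)  # step 5: run model inference
        action = json.loads(raw)
        history.append(action)
        if action.get("STATUS") == "finish":    # hypothetical stop signal
            break
        execute(action)                         # step 6: act on the device
    return history


if __name__ == "__main__":
    # Stub model for demonstration: tap once, then report completion.
    replies = iter(['{"POINT":[500,250]}', '{"STATUS":"finish"}'])
    executed = []
    log = run_task(
        "Open the search page",
        capture=lambda: b"",                    # placeholder screenshot bytes
        predict=lambda text, image: next(replies),
        execute=executed.append,
    )
    print(log, executed)
```

In a real setup, capture might wrap a screenshot command such as adb exec-out screencap -p, predict would call the loaded model and tokenizer, and execute would translate each action into device input. For step 7, rerun run_task with an adjusted instruction when the result is not what was intended.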