

Aria-UI
Overview:
Aria-UI is a large multimodal model built for visually locating the GUI elements that commands refer to (GUI grounding). It takes a purely visual approach and needs no auxiliary inputs such as an accessibility tree (AXTree). It handles a variety of planning commands and relies on diverse, high-quality generated command samples to adapt to different tasks. Aria-UI set new records on both offline and online agent benchmarks, outperforming vision-only baselines as well as baselines that rely on AXTree.
Target Users:
Aria-UI is aimed at researchers and builders of digital agents who need to automate GUI tasks. Its robust visual localization improves the efficiency and accuracy of task automation, especially with complex GUIs and varied commands.
Use Cases
- Automate stopping a service by interpreting the command and locating the stop-service button.
- Verify a color palette by visually locating the palette area within the GUI.
- Enable iCloud Photos by locating and interacting with the iCloud settings in the GUI.
Features
- Multi-format command understanding: Aria-UI processes localization commands in a variety of formats, so it stays robust in dynamic environments and with different planning agents.
- Context-aware localization: Aria-UI uses the interaction history, whether pure text or mixed text and images, to improve localization accuracy.
- Lightweight and fast: As a mixture-of-experts model with 3.9 billion parameters activated per token, Aria-UI efficiently encodes GUI inputs of varying sizes and aspect ratios, including ultra-high resolutions.
- Outstanding performance: Aria-UI ranked first on the AndroidWorld benchmark and third on the OSWorld benchmark.
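To illustrate the context-aware feature, here is a minimal sketch of how a text-only action history might be combined with the current command before it is sent to a grounding model. The function name `build_grounding_prompt` and the prompt layout ("Previous actions:", "Instruction:") are hypothetical and chosen for illustration only; the actual input format Aria-UI expects should be taken from its documentation.

```python
def build_grounding_prompt(instruction, history=None):
    """Compose a plain-text prompt from a command and optional prior actions.

    The layout used here is an assumption for illustration, not Aria-UI's
    documented prompt format.
    """
    lines = []
    if history:
        lines.append("Previous actions:")
        # Number the earlier steps so their order is explicit.
        for i, step in enumerate(history, start=1):
            lines.append(f"{i}. {step}")
    lines.append(f"Instruction: {instruction.strip()}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_grounding_prompt(
        "Tap the Stop button for the running service",
        history=["Open Settings", "Tap Developer options"],
    ))
```

Leaving `history` empty yields a prompt with only the instruction line, which corresponds to the context-free case.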
How to Use
1. Visit the Aria-UI Hugging Face Space demo page to try the model online.
2. Download the Aria-UI datasets and model checkpoints needed for local use.
3. Read the Aria-UI paper and documentation to understand the model's functionality and usage.
4. Write or adjust localization commands for the specific GUI task so that they meet Aria-UI's input requirements.
5. Use the Aria-UI model to visually locate the target in the GUI and carry out the automation task.
6. Adjust and tune model parameters as needed to improve task accuracy and efficiency.
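For step 5, an automation script typically has to turn the model's textual answer into a pixel position it can click. The sketch below assumes the model returns a point in relative coordinates on a 0 to 1000 scale, written as text such as `[500, 250]`; both the scale and the output format are assumptions to verify against the Aria-UI documentation, and the helper names `parse_point` and `to_pixels` are hypothetical.

```python
import re


def parse_point(text):
    """Extract the first two numbers from the model's text output as (x, y)."""
    nums = re.findall(r"-?\d+(?:\.\d+)?", text)
    if len(nums) < 2:
        raise ValueError(f"no coordinate pair found in: {text!r}")
    return float(nums[0]), float(nums[1])


def to_pixels(point, width, height, scale=1000):
    """Map a relative point (0..scale, assumed) to pixel coordinates.

    The result is clamped so it always lies inside the screenshot.
    """
    x, y = point
    px = round(x / scale * width)
    py = round(y / scale * height)
    return min(max(px, 0), width - 1), min(max(py, 0), height - 1)


if __name__ == "__main__":
    # Hypothetical model answer for a 1920x1080 screenshot.
    print(to_pixels(parse_point("[500, 250]"), 1920, 1080))
```

The resulting pixel pair can then be passed to whatever click or tap primitive the automation framework provides.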