# Model

ModAstera

ModAstera offers an all-in-one medical AI development platform that accelerates R&D processes by utilizing AI-assisted data labeling and medical AI engineering agents. It reduces development costs and helps bring products to market faster than competitors. The product meets the digital transformation needs of the healthcare industry.

Development Platform

SWE-1

SWE-1 is Windsurf's first model family, designed to optimize the entire software engineering process with the stated aim of accelerating software development by 99%. Unlike models that only write code, SWE-1 also handles terminal operations, accesses other knowledge sources and the internet, tests products, and understands user feedback. The family includes three models, SWE-1, SWE-1-lite, and SWE-1-mini, for different user needs.

DeepSeek-Prover-V2-671B

DeepSeek-Prover-V2-671B is an advanced artificial intelligence model designed to provide strong reasoning capabilities. It is based on the latest technology and applicable to various scenarios. The model is open source, aiming to promote the democratization and popularization of AI technology, reduce technical barriers, and enable more developers and researchers to use AI technology for innovation. By using this model, users can enhance their work efficiency and advance the progress of various projects.

Kimi-Audio

Kimi-Audio is an advanced open-source audio foundation model designed to handle a variety of audio processing tasks, such as speech recognition and audio dialogue. The model has been extensively pre-trained on over 13 million hours of diverse audio and text data, giving it strong audio reasoning and language understanding capabilities. Its key advantages include excellent performance and flexibility, making it suitable for researchers and developers to conduct audio-related research and development.

Speech Recognition

Wan2.1-FLF2V-14B

Wan2.1-FLF2V-14B is an open-source, large-scale video generation model designed to advance the field of video generation. This model excels in multiple benchmark tests, supports consumer-grade GPUs, and efficiently generates 480P and 720P videos. It performs exceptionally well in various tasks, including text-to-video and image-to-video, possessing strong visual-text generation capabilities suitable for diverse real-world applications.

Video Production

Quasar Alpha

OpenRouter is a multi-model chat interface that lets users interact with different language models in the browser. Its simple interface makes chatting intuitive and suits a range of uses, including role-playing and programming assistance. Data is stored locally to protect user privacy and security. Because it is a web application, there is nothing to install, and it can be accessed anytime, anywhere.

EasyControl Ghibli

EasyControl Ghibli is a newly released model, based on the Hugging Face platform, designed to simplify the control and management of various artificial intelligence tasks. The model combines advanced technology with a user-friendly interface, allowing users to interact with AI in a more intuitive way. Its main advantages are ease of use and powerful functionality, making it suitable for users of different backgrounds, from beginners to professionals.

Development and Tools

Selene API

Selene API is an advanced AI evaluation model launched by Atla AI. Using world-leading LLM-as-a-Judge technology, it provides precise AI application evaluations. Key advantages include high accuracy and reliability, surpassing leading models across various evaluation benchmarks. It offers accurate scoring and actionable feedback to help developers optimize their AI applications. Developed by Atla AI, a company committed to building a safe AI future, Selene API currently offers a free trial and uses a usage-based pricing model.

R1-Omni

R1-Omni is an innovative multimodal emotion recognition model that enhances model reasoning and generalization capabilities through reinforcement learning. Developed based on HumanOmni-0.5B, it focuses on emotion recognition tasks and can perform emotion analysis using visual and audio modal information. Its main advantages include strong reasoning capabilities, significantly improved emotion recognition performance, and excellent performance on out-of-distribution data. This model is suitable for scenarios requiring multimodal understanding, such as sentiment analysis and intelligent customer service, and has significant research and application value.

Emotional companionship

AI Co-scientist

AI Co-scientist is a multi-agent AI system developed by Google Research to assist scientific research. Built on Gemini 2.0, the system mirrors the reasoning process of the scientific method, generating new research hypotheses and experimental plans. Its agents collaborate through generation, reflection, ranking, and evolution steps to refine the output iteratively. Its main advantages are the efficient generation of novel scientific hypotheses, strong integration of interdisciplinary knowledge, and the ability to collaborate with scientists. The system is currently in the research phase, and its potential in fields such as biomedicine is being validated through partnerships with leading global research institutions.

Research equipment

OmniParser V2

OmniParser V2 is an AI model developed by the Microsoft Research team. It aims to turn large language models (LLMs) into agents capable of understanding and operating graphical user interfaces (GUIs). By converting interface screenshots from pixel space into interpretable structured elements, OmniParser V2 lets LLMs identify interactive icons more accurately and execute the intended actions on the screen. It shows significant improvements in detecting small icons and in inference speed. Combined with GPT-4o, it achieved an average accuracy of 39.6% on the ScreenSpot Pro benchmark, far exceeding GPT-4o's original score of 0.8%. OmniParser V2 also ships with OmniTool, which supports integration with various LLMs and further promotes the development of GUI automation.

Automated Workflow

Goku

Goku is an AI model dedicated to video generation, capable of creating high-quality video content from text prompts. It uses flow-based generation to produce smooth, engaging videos for scenarios such as advertising, entertainment, and creative content production. Goku's primary advantages are efficient generation and strong performance in complex scenes, which reduce video production costs while making content more appealing. The model was jointly developed by research teams from the University of Hong Kong and ByteDance to advance video generation technology.

Video Production

Qwen2.5-Max

Qwen2.5-Max is a large-scale Mixture-of-Experts (MoE) model pre-trained on over 20 trillion tokens and then post-trained with supervised fine-tuning and reinforcement learning from human feedback. It performs well across multiple benchmarks, demonstrating robust knowledge and coding capabilities. The model is accessible through an API provided by Alibaba Cloud, supporting developers across various application scenarios. Its key advantages include strong performance, flexible deployment options, and efficient training techniques.

PengChengStarling

PengChengStarling is an open-source toolkit focused on multilingual automatic speech recognition (ASR), developed based on the icefall project. It supports the entire ASR process, including data processing, model training, inference, fine-tuning, and deployment. By optimizing parameter configurations and integrating language identifiers into the RNN-Transducer architecture, it significantly enhances the performance of multilingual ASR systems. Its main advantages include efficient multilingual support, a flexible configuration design, and robust inference performance. The models in PengChengStarling perform exceptionally well across various languages, require relatively small model sizes, and offer extremely fast inference speeds, making it suitable for scenarios that demand efficient speech recognition.

Speech Recognition

QVQ-72B-Preview

QVQ-72B-Preview is an experimental research model developed by the Qwen team, focusing on enhancing visual reasoning capabilities. The model demonstrates strong abilities in multidisciplinary understanding and reasoning, achieving significant advances especially in mathematical reasoning tasks. Although advancements have been made in visual reasoning, it does not completely replace the capabilities of Qwen2-VL-72B, and may gradually lose focus on image content in multi-step visual reasoning, leading to hallucinations. Furthermore, QVQ does not show significantly better performance in basic recognition tasks compared to Qwen2-VL-72B.

Skywork-o1-Open-PRM-Qwen-2.5-1.5B

Skywork-o1-Open-PRM-Qwen-2.5-1.5B is part of a series developed by the Skywork team, which combines the slow thinking and reasoning capabilities characteristic of the o1 style. This model is specifically designed to enhance reasoning skills through incremental process rewards, making it suitable for solving small-scale complex problems. Unlike simple reproductions of the OpenAI o1 model, the Skywork o1 Open series not only demonstrates inherent thinking, planning, and reflection abilities in its outputs but also shows significant improvements in reasoning skills on standard benchmarking tests. This series represents a strategic advancement in AI capabilities, pushing inherently weaker foundational models towards state-of-the-art (SOTA) performance in reasoning tasks.

Skywork-o1-Open-Llama-3.1-8B

Skywork-o1-Open-Llama-3.1-8B is a series of models developed by the Kunlun Technology Skywork team, integrating the slow thinking and reasoning capabilities characteristic of o1 style. This series showcases inherent thinking, planning, and reflective abilities in its outputs, alongside a significant enhancement in reasoning skills as evidenced by standard benchmark tests. This series represents a strategic advancement in AI capabilities, elevating a traditionally weaker foundational model to state-of-the-art performance in reasoning tasks.

CriticGPT

CriticGPT is a model based on GPT-4, designed to help humans review ChatGPT's code output. By identifying errors and writing critiques, it improves the accuracy and efficiency of human trainers' reviews. It catches potential issues effectively and supports the improvement of AI models.

AI code assistant

AIModels.fyi

AIModels.fyi is a platform dedicated to the AI field, offering daily summaries of AI papers, models, and tools. It uses algorithms to filter out the most significant developments in AI, and transforms complex models and papers into concise and clear guides, helping users quickly absorb and apply the information. Subscribers also gain access to personalized AI content, including top model, paper, and tool guides that are easy to understand even without a PhD, as well as exclusive access to a dedicated Discord community for interacting with AI experts and builders.

AI Information Platform

BasicPrompt

BasicPrompt is a tool that helps you build, deploy, and test general-purpose prompts. It provides an editor where you can write general prompts using U blocks. BasicPrompt automatically optimizes your prompts to adapt to different language models. You can evaluate the performance of prompts on different models using the built-in testing tools. BasicPrompt also supports one-click deployment of prompts to applications without coding. With BasicPrompt, you can quickly build, deploy, and share prompts, allowing team members to easily contribute.

Development and Tools

Model Muse AI

Model Muse is a platform that provides virtual fashion models for e-commerce fashion brands. Utilizing the latest artificial intelligence image generation technology, it creates unique model personas for brands, replacing the traditional high-cost photoshoot. The platform allows for easy customization of model traits to align perfectly with a brand's unique voice.

AI design tools

Line2Depth SD 1.5

Line2Depth SD 1.5 is a LoRA model that works with ControlNet models such as Canny, line art, and SoftEdge to create depth-style images from line drawings alone. Add 'depth, 3d' to your prompt. The number after the LoRA filename indicates how many LoRAs were merged; each variant produces different results, so pick the one that gives the effect you want.

AI image generation

Mistral-22B-v0.2

Mistral-22B-v0.2 is a model with strong mathematical and programming abilities. Compared to v0.1, it has significantly better coherence and multi-turn dialogue capabilities. The model has been fine-tuned to be uncensored and will answer any question. The training data consists mainly of multi-turn dialogues, with a particular emphasis on programming content. The model also has agent capabilities and can execute real-world tasks. It was trained with a 32k context length. When using the model, follow the Guanaco prompt format.

Gemini 1.5 Pro

Gemini 1.5 Pro is a next-generation AI model that Google has made available to developers. It adds features such as audio understanding, system instructions, and JSON output, and arrives alongside Gecko, a new-generation text embedding model with significantly improved performance. Developers can obtain an API key in Google AI Studio to start using it.

Fireworks AI

Fireworks collaborates with world-leading generative AI researchers to deliver the best models at the fastest pace. It offers models curated and optimized by Fireworks, along with enterprise-grade throughput and expert technical support. Fireworks positions itself as the fastest and most reliable AI platform.

Model Training and Deployment

GenAD

GenAD is the first large-scale video generation model for autonomous driving, jointly launched by Shanghai Artificial Intelligence Laboratory, Hong Kong University of Science and Technology, the University of Tübingen in Germany, and the University of Hong Kong. It predicts and simulates real-world scenarios to support research on and applications of autonomous driving technology. GenAD shows strong capabilities in understanding complex dynamic environments, adapting to open-world scenarios, and making precise predictions. It can be controlled through language and driving trajectories, which shows its potential for autonomous driving planning tasks and for improving driving safety and efficiency.

Video Production

NVIDIA Project GR00T

NVIDIA Project GR00T is a general-purpose foundational model that can revolutionize the way humanoid robots learn in both simulated and real-world environments. Trained in NVIDIA GPU-accelerated simulations, GR00T enables humanoid robots to learn from limited human demonstrations through imitation learning and reinforcement learning in NVIDIA Isaac Lab. It can also generate robot actions from video data. The GR00T model accepts multimodal instructions and past interaction history as input and outputs the actions the robot needs to execute.

Gitee AI

Gitee AI gathers the latest and hottest AI models, providing a one-stop service for model experience, inference, training, deployment, and application. It offers abundant computing power and is positioned as the best AI community in China.

ModelAgents - AI Fashion Models Generator

ModelAgents is an AI fashion model generator that gives e-commerce retailers an efficient, cost-effective alternative to photography with real models. With its AI-powered design assistant and extensive library of controllable AI model styles, you can create AI-generated images, graphics, videos, and animations. ModelAgents is aimed at hobbyists, architects, interior designers, product designers, and game animators. Its AI supermodel generation technology can produce realistic models and backgrounds for your fashion brand across sizes, races, ages, and genders, including plus-size models, Victoria's Secret models, male models, lingerie models, Black models, and more. It serves both as a design assistant for bringing a creative vision to life and as a tool for exploring new creative possibilities.

AI image generation

VideoPrism

VideoPrism is a general-purpose video encoder that achieves leading performance across a range of video understanding tasks, including classification, localization, retrieval, captioning, and question answering. Its innovation lies in a very large and diverse pre-training dataset containing 36 million high-quality video-text pairs and 582 million video clips with noisy text. Pre-training uses a two-stage strategy: the model first uses contrastive learning to match videos with text, then learns to predict masked video patches, so that both kinds of supervisory signal are used. A single frozen VideoPrism model can be adapted directly to downstream tasks and has set new state-of-the-art results on 30 video understanding benchmarks.
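To make the first pre-training stage more concrete, here is a minimal sketch of a symmetric contrastive loss that pulls matching video and text embeddings together. It is an illustration only: the function names, the temperature value, and the toy two-dimensional embeddings are assumptions, and the exact loss VideoPrism uses is not stated here.

```python
import math

def cosine(u, v):
    # Cosine similarity between two non-zero vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def cross_entropy(logits, target):
    # Negative log-softmax of the target entry, computed stably.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]

def contrastive_loss(video_emb, text_emb, temperature=0.07):
    # Hypothetical helper: pair i of video_emb and text_emb is the positive,
    # every other pairing in the batch is treated as a negative.
    n = len(video_emb)
    sim = [[cosine(v, t) / temperature for t in text_emb] for v in video_emb]
    video_to_text = sum(cross_entropy(sim[i], i) for i in range(n)) / n
    text_to_video = sum(
        cross_entropy([sim[j][i] for j in range(n)], i) for i in range(n)
    ) / n
    return 0.5 * (video_to_text + text_to_video)

# Toy embeddings (assumed values, not real model outputs).
videos = [[1.0, 0.0], [0.0, 1.0]]
matched_text = [[0.9, 0.1], [0.1, 0.9]]
swapped_text = [matched_text[1], matched_text[0]]

loss_matched = contrastive_loss(videos, matched_text)
loss_swapped = contrastive_loss(videos, swapped_text)
```

With these toy values the loss is small when each caption sits next to its own video and much larger when the captions are swapped, which is the signal a contrastive stage trains on.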

AI video generation

Featured AI Tools

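Tencent Hunyuan Image 2.0

Tencent Hunyuan Image 2.0 is Tencent's latest AI image generation model, with markedly faster generation and higher image quality. An ultra-high-compression codec and a new diffusion architecture bring generation time down to the millisecond range, removing the wait typical of earlier image generators. The model also combines reinforcement learning with human aesthetic knowledge to improve realism and detail, making it suitable for designers, creators, and other professional users.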

Lovart

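Lovart is a revolutionary AI design agent that turns creative prompts into artwork, supporting design needs from storyboards to brand visuals. Its significance lies in breaking away from the traditional design workflow, saving time and boosting creative inspiration. Lovart is currently in beta, and users can join the waitlist to try it.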

FastVLM

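FastVLM is an efficient vision encoding model designed for vision language models. Its FastViTHD hybrid vision encoder reduces both the encoding time for high-resolution images and the number of output tokens, giving the model strong speed and accuracy. FastVLM aims to give developers powerful vision-language processing for a wide range of applications, and it performs especially well on mobile devices that need fast responses.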

KeySync

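KeySync is a leakage-free lip synchronization framework for high-resolution video. It addresses the temporal consistency problems of traditional lip-sync techniques and uses a carefully designed masking strategy to handle expression leakage and facial occlusion. KeySync achieves state-of-the-art results in lip reconstruction and cross-synchronization, and it suits practical applications such as automated dubbing.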

Manus

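Manus, developed by Monica.im, is the world's first truly autonomous AI agent product, able to deliver complete task results rather than only suggestions or answers. It uses a multi-agent architecture and runs in an isolated virtual machine, completing tasks directly by writing and executing code, browsing the web, and operating applications. Manus achieved SOTA performance on the GAIA benchmark, demonstrating strong task execution capabilities. Its goal is to act as the user's "agent" in the digital world, helping users complete complex tasks efficiently.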

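Trae (China Edition)

Trae is an AI-native IDE designed for Chinese-language development scenarios, with AI deeply integrated into the development environment. Features such as intelligent code completion and context understanding significantly improve development efficiency and code quality. Trae fills a gap in China's AI-integrated development tools and meets Chinese developers' need for efficient tooling. It is positioned as a high-end development tool that gives professional developers strong technical support. Pricing has not been publicly announced, but a paid model is expected, in line with its high-end positioning.

Development and Tools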

Pika

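Pika is a video creation platform where users upload their creative ideas and Pika automatically generates matching videos. Its main features include support for turning many kinds of creative ideas into video, professional-looking results, and simple, easy-to-use operation. The platform offers a free trial and is aimed at creators and video enthusiasts.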

LiblibAI

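LiblibAI is a leading AI creation platform in China that provides powerful AI capabilities to help creators realize their ideas. The platform offers a large collection of free AI creation models, which users can search and use to create images, text, audio, and more. It also lets users train their own AI models. LiblibAI targets the broad community of creators and is committed to making creative tools widely accessible, serving the creative industry so that everyone can enjoy creating.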

AIbase

Empowering the Future, Your AI Solution Knowledge Base

© 2025 AIbase