# Model

ModAstera

ModAstera offers an all-in-one medical AI development platform that accelerates R&D processes by utilizing AI-assisted data labeling and medical AI engineering agents. It reduces development costs and helps bring products to market faster than competitors. The product meets the digital transformation needs of the healthcare industry.

Development Platform

SWE-1

SWE-1 is our first model family designed to optimize the entire software engineering process, with the stated aim of accelerating software development by 99%. Beyond writing code, SWE-1 handles terminal operations, accesses other knowledge sources and the internet, tests products, and understands user feedback. The SWE-1 series includes three models (SWE-1, SWE-1-lite, and SWE-1-mini) that cater to different user needs.

DeepSeek-Prover-V2-671B

DeepSeek-Prover-V2-671B is an open-source model with strong reasoning capabilities, built for formal theorem proving in Lean 4. By releasing it openly, its developers aim to lower technical barriers and let more developers and researchers build on it in their own work.

Kimi-Audio

Kimi-Audio is an advanced open-source audio foundation model designed to handle a variety of audio processing tasks, such as speech recognition and audio dialogue. The model has been extensively pre-trained on over 13 million hours of diverse audio and text data, giving it strong audio reasoning and language understanding capabilities. Its key advantages include excellent performance and flexibility, making it suitable for researchers and developers to conduct audio-related research and development.

Speech Recognition

Wan2.1-FLF2V-14B

Wan2.1-FLF2V-14B is an open-source, large-scale video generation model designed to advance the field of video generation. This model excels in multiple benchmark tests, supports consumer-grade GPUs, and efficiently generates 480P and 720P videos. It performs exceptionally well in various tasks, including text-to-video and image-to-video, possessing strong visual-text generation capabilities suitable for diverse real-world applications.

Video Production

Quasar Alpha

OpenRouter is a multi-model chat interface that lets users interact with different language models in the browser. Its simple interface makes chatting intuitive and suits a range of uses, including role-playing and programming assistance. The product stores data locally, which helps protect user privacy. Because it is a web application, it requires no installation and can be accessed from anywhere.

EasyControl Ghibli

EasyControl Ghibli is a newly released model, based on the Hugging Face platform, designed to simplify the control and management of various artificial intelligence tasks. The model combines advanced technology with a user-friendly interface, allowing users to interact with AI in a more intuitive way. Its main advantages are ease of use and powerful functionality, making it suitable for users of different backgrounds, from beginners to professionals.

Development and Tools

Selene API

Selene API is an AI evaluation model from Atla AI, a company committed to building a safe AI future. It uses LLM-as-a-Judge technology to evaluate AI applications, and is reported to surpass leading models across various evaluation benchmarks. It returns scores together with actionable feedback to help developers optimize their AI applications. Selene API currently offers a free trial and uses usage-based pricing.

R1-Omni

R1-Omni is an innovative multimodal emotion recognition model that enhances model reasoning and generalization capabilities through reinforcement learning. Developed based on HumanOmni-0.5B, it focuses on emotion recognition tasks and can perform emotion analysis using visual and audio modal information. Its main advantages include strong reasoning capabilities, significantly improved emotion recognition performance, and excellent performance on out-of-distribution data. This model is suitable for scenarios requiring multimodal understanding, such as sentiment analysis and intelligent customer service, and has significant research and application value.

Emotional companionship

AI Co-scientist

AI Co-scientist is a multi-agent AI system developed by Google Research to assist scientific research through artificial intelligence techniques. Built on Gemini 2.0, the system simulates the reasoning process of scientific methods, generating new research hypotheses and experimental plans. Through multi-agent collaboration, utilizing mechanisms such as generation, reflection, ranking, and evolution, it continuously optimizes its output. The main advantages of AI Co-scientist include the efficient generation of novel scientific hypotheses, strong interdisciplinary knowledge integration capabilities, and collaborative capabilities with scientists. Currently in the research phase, the system is being validated for its application potential in fields like biomedicine through partnerships with leading global research institutions.

Research equipment

OmniParser V2

OmniParser V2 is an artificial intelligence model developed by the Microsoft Research team. It aims to turn large language models (LLMs) into agents capable of understanding and operating graphical user interfaces (GUIs). By converting interface screenshots from pixel space into interpretable structured elements, OmniParser V2 enables LLMs to identify interactive icons more accurately and execute the intended actions on screen. Compared with its predecessor, OmniParser V2 is significantly better at detecting small icons and faster at inference. Combined with GPT-4o, it achieved an average accuracy of 39.6% on the ScreenSpot Pro benchmark, far exceeding the 0.8% scored by GPT-4o alone. In addition, OmniParser V2 comes with OmniTool, which supports integration with various LLMs and further promotes the development of GUI automation.

Automated Workflow

Goku

Goku is an AI model dedicated to video generation, capable of creating high-quality video content from text prompts. The model uses flow-based generation technology to produce smooth, engaging videos suitable for advertising, entertainment, and creative content production. Goku's primary advantages are efficient generation and strong performance in complex scenes, which reduce video production costs while enhancing content appeal. The model was jointly developed by research teams from the University of Hong Kong and ByteDance with the aim of advancing video generation technology.

Video Production

Qwen2.5-Max

Qwen2.5-Max is a large-scale Mixture-of-Experts (MoE) model that has been pre-trained on over 20 trillion tokens and then post-trained with supervised fine-tuning and reinforcement learning from human feedback. It performs strongly on multiple benchmarks, demonstrating robust knowledge and coding capabilities. The model is accessible through an API provided by Alibaba Cloud, supporting developers across various application scenarios. Its key advantages include strong performance, flexible deployment options, and efficient training techniques.

PengChengStarling

PengChengStarling is an open-source toolkit for multilingual automatic speech recognition (ASR), built on the icefall project. It supports the entire ASR pipeline, including data processing, model training, inference, fine-tuning, and deployment. By optimizing parameter configurations and integrating language identifiers into the RNN-Transducer architecture, it significantly improves the performance of multilingual ASR systems. Its main advantages include efficient multilingual support, a flexible configuration design, and robust inference performance. The models in PengChengStarling perform well across various languages while remaining relatively small and offering very fast inference, making the toolkit suitable for scenarios that demand efficient speech recognition.

Speech Recognition

QVQ-72B-Preview

QVQ-72B-Preview is an experimental research model developed by the Qwen team, focusing on enhancing visual reasoning capabilities. The model demonstrates strong abilities in multidisciplinary understanding and reasoning, achieving significant advances especially in mathematical reasoning tasks. Although advancements have been made in visual reasoning, it does not completely replace the capabilities of Qwen2-VL-72B, and may gradually lose focus on image content in multi-step visual reasoning, leading to hallucinations. Furthermore, QVQ does not show significantly better performance in basic recognition tasks compared to Qwen2-VL-72B.

Skywork-o1-Open-PRM-Qwen-2.5-1.5B

Skywork-o1-Open-PRM-Qwen-2.5-1.5B is part of a series developed by the Skywork team, which combines the slow thinking and reasoning capabilities characteristic of the o1 style. This model is specifically designed to enhance reasoning skills through incremental process rewards, making it suitable for solving small-scale complex problems. Unlike simple reproductions of the OpenAI o1 model, the Skywork o1 Open series not only demonstrates inherent thinking, planning, and reflection abilities in its outputs but also shows significant improvements in reasoning skills on standard benchmarking tests. This series represents a strategic advancement in AI capabilities, pushing inherently weaker foundational models towards state-of-the-art (SOTA) performance in reasoning tasks.

Skywork-o1-Open-Llama-3.1-8B

Skywork-o1-Open-Llama-3.1-8B is a series of models developed by the Kunlun Technology Skywork team, integrating the slow thinking and reasoning capabilities characteristic of o1 style. This series showcases inherent thinking, planning, and reflective abilities in its outputs, alongside a significant enhancement in reasoning skills as evidenced by standard benchmark tests. This series represents a strategic advancement in AI capabilities, elevating a traditionally weaker foundational model to state-of-the-art performance in reasoning tasks.

CriticGPT

CriticGPT is a tool developed based on the GPT-4 model, designed to assist humans in reviewing ChatGPT's code output. By identifying errors and providing comments, it enhances the accuracy and efficiency of the trainer's review. This tool effectively captures potential issues, providing strong support for AI model improvement.

AI code assistant

AIModels.fyi

AIModels.fyi is a platform dedicated to the AI field, offering daily summaries of AI papers, models, and tools. It uses algorithms to filter out the most significant developments in AI, and transforms complex models and papers into concise and clear guides, helping users quickly absorb and apply the information. Subscribers also gain access to personalized AI content, including top model, paper, and tool guides that are easy to understand even without a PhD, as well as exclusive access to a dedicated Discord community for interacting with AI experts and builders.

AI Information Platform

BasicPrompt

BasicPrompt is a tool that helps you build, deploy, and test general-purpose prompts. It provides an editor where you can write general prompts using U blocks. BasicPrompt automatically optimizes your prompts to adapt to different language models. You can evaluate the performance of prompts on different models using the built-in testing tools. BasicPrompt also supports one-click deployment of prompts to applications without coding. With BasicPrompt, you can quickly build, deploy, and share prompts, allowing team members to easily contribute.

Development & Tools

Model Muse AI

Model Muse is a platform that provides virtual fashion models for e-commerce fashion brands. Utilizing the latest artificial intelligence image generation technology, it creates unique model personas for brands, replacing the traditional high-cost photoshoot. The platform allows for easy customization of model traits to align perfectly with a brand's unique voice.

AI design tools

Line2Depth SD 1.5

Line2Depth SD 1.5 is a model that works with ControlNet inputs such as Canny, line art, and SoftEdge to create depth-based images solely from lines. Add 'depth, 3d' to your prompt. The number after the LoRA filename indicates how many LoRAs were merged; each variant produces different results, so select the one that yields the effect you want.

AI image generation

Mistral-22B-v0.2

Mistral-22B-v0.2 is a model with strong mathematical and programming abilities. Compared to v0.1, v0.2 has significantly improved coherence and multi-turn dialogue capabilities. The model has been realigned to be uncensored and will attempt to answer any question. The training data consists primarily of multi-turn dialogues, with a particular emphasis on programming content. Additionally, the model has agent capabilities and can execute real-world tasks. Training used a 32k context length. When using the model, follow the Guanaco prompt format.
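
For readers unfamiliar with the Guanaco prompt format mentioned above, the following is a minimal sketch of how a multi-turn conversation might be flattened into a Guanaco-style prompt. It assumes the commonly used `### Human:` / `### Assistant:` turn labels; the helper name `build_guanaco_prompt` is hypothetical, and the exact template this model expects should be confirmed against its model card.

```python
def build_guanaco_prompt(turns):
    """Flatten (role, text) turns into a Guanaco-style prompt string.

    Assumption: the common "### Human:" / "### Assistant:" labels are used.
    The model card may specify a slightly different template.
    """
    labels = {"human": "### Human", "assistant": "### Assistant"}
    lines = [f"{labels[role]}: {text}" for role, text in turns]
    # End with an open assistant label so the model continues from there.
    lines.append("### Assistant:")
    return "\n".join(lines)


if __name__ == "__main__":
    conversation = [
        ("human", "Write a function that reverses a string."),
        ("assistant", "def reverse(s): return s[::-1]"),
        ("human", "Now add a docstring."),
    ]
    print(build_guanaco_prompt(conversation))
```

The resulting string is what would be passed to the tokenizer as the raw prompt; ending on the open assistant label is what cues the model to generate the next reply.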

Gemini 1.5 Pro

Gemini 1.5 Pro is a next-generation AI model from Google, made available to developers through its developer platform. It supports new features such as audio understanding, system instructions, and JSON output, and arrives alongside Gecko, a new-generation text embedding model with a significant performance boost. Developers can obtain an API key in Google AI Studio to start using it.

Fireworks AI

Fireworks collaborates with world-leading generative AI researchers to deliver the best models at the fastest pace. It offers carefully curated and optimized models alongside enterprise-grade throughput and expert technical support, and positions itself as the fastest and most reliable AI platform.

Model Training and Deployment

GenAD

GenAD is the first large-scale video generation model for autonomous driving, jointly launched by Shanghai Artificial Intelligence Laboratory, Hong Kong University of Science and Technology, the University of Tübingen in Germany, and the University of Hong Kong. It predicts and simulates real-world scenarios to support research on and application of autonomous driving technology. GenAD shows strong capabilities in understanding complex dynamic environments, adapting to open-world scenarios, and making precise predictions. It can be controlled through language and driving trajectories, which shows its potential for autonomous driving planning tasks and for improving driving safety and efficiency.

Video Production

NVIDIA Project GR00T

NVIDIA Project GR00T is a general-purpose foundational model that can revolutionize the way humanoid robots learn in both simulated and real-world environments. Trained in NVIDIA GPU-accelerated simulations, GR00T enables humanoid robots to learn from limited human demonstrations through imitation learning and reinforcement learning in NVIDIA Isaac Lab. It can also generate robot actions from video data. The GR00T model accepts multimodal instructions and past interaction history as input and outputs the actions the robot needs to execute.

Gitee AI

Gitee AI gathers the latest and hottest AI models, providing a one-stop service for model experience, inference, training, deployment, and application. It offers abundant computing power and is positioned as the best AI community in China.

ModelAgents - AI Fashion Models Generator

ModelAgents is an AI fashion model generator that gives e-commerce retailers an efficient, cost-effective alternative to photographing real models. With its AI-powered design assistant and extensive library of controllable AI model styles, you can create AI-generated images, graphics, videos, and animations. ModelAgents is pitched at hobbyists, architects, interior designers, product designers, and game animators. Its AI supermodel generation technology can generate realistic models and backgrounds for your fashion brand, regardless of size, race, age, or gender, including plus-size models, Victoria's Secret models, male models, lingerie models, Black models, and more. Whether you want an AI-powered design assistant to help bring a creative vision to life or a tool for exploring new creative possibilities, ModelAgents is positioned for both.

AI image generation

VideoPrism

VideoPrism is a general-purpose video encoder that achieves leading performance across various video understanding tasks, including classification, localization, retrieval, captioning, and question answering. Its innovation lies in a very large and diverse pre-training dataset, which contains 36 million high-quality video-text pairs and 582 million video clips with noisy text. Pre-training uses a two-stage strategy: it first applies contrastive learning to align videos with text, then predicts masked video patches, so that both kinds of supervisory signal are used. A single frozen VideoPrism model can be adapted directly to downstream tasks and has set new state-of-the-art scores on 30 video understanding benchmarks.

AI video generation

Featured AI Tools

Flow AI

Flow is an AI-driven movie-making tool designed for creators, utilizing Google DeepMind's advanced models to allow users to easily create excellent movie clips, scenes, and stories. The tool provides a seamless creative experience, supporting user-defined assets or generating content within Flow. In terms of pricing, the Google AI Pro and Google AI Ultra plans offer different functionalities suitable for various user needs.

Video Production

NoCode

NoCode is a platform that requires no programming experience, allowing users to quickly generate applications by describing their ideas in natural language, aiming to lower development barriers so more people can realize their ideas. The platform provides real-time previews and one-click deployment features, making it very suitable for non-technical users to turn their ideas into reality.

Development Platform

ListenHub

ListenHub is a lightweight AI podcast generation tool that supports both Chinese and English. Based on cutting-edge AI technology, it can quickly generate podcast content of interest to users. Its main advantages include natural dialogue and ultra-realistic voice effects, allowing users to enjoy high-quality auditory experiences anytime and anywhere. ListenHub not only improves the speed of content generation but also offers compatibility with mobile devices, making it convenient for users to use in different settings. The product is positioned as an efficient information acquisition tool, suitable for the needs of a wide range of listeners.

MiniMax Agent

MiniMax Agent is an intelligent AI companion built on the latest multimodal technology. Its MCP-based multi-agent collaboration lets teams of AI agents solve complex problems efficiently. It provides features such as instant answers, visual analysis, and voice interaction, and claims to increase productivity tenfold.

Multimodal technology

Tencent Hunyuan Image 2.0

Tencent Hunyuan Image 2.0 is Tencent's latest AI image generation model, with significantly improved generation speed and image quality. Using a codec with a very high compression ratio and a new diffusion architecture, it can generate images in milliseconds, avoiding the wait typical of traditional generation. The model also improves image realism and detail by combining reinforcement learning algorithms with human aesthetic knowledge, making it suitable for professional users such as designers and creators.

Image Generation

OpenMemory MCP

OpenMemory is an open-source personal memory layer that provides private, portable memory management for large language models (LLMs). It ensures users have full control over their data, maintaining its security when building AI applications. This project supports Docker, Python, and Node.js, making it suitable for developers seeking personalized AI experiences. OpenMemory is particularly suited for users who wish to use AI without revealing personal information.

FastVLM

FastVLM is an efficient visual encoding model designed specifically for vision-language models. It uses the FastViTHD hybrid visual encoder to reduce both the time required to encode high-resolution images and the number of output tokens, resulting in strong speed and accuracy. FastVLM is positioned to give developers powerful vision-language processing capabilities across various scenarios, and performs particularly well on mobile devices that require rapid response.

Image Processing

LiblibAI

LiblibAI is a leading Chinese AI creative platform offering powerful AI creative tools to help creators bring their imagination to life. The platform provides a vast library of free AI creative models, allowing users to search and utilize these models for image, text, and audio creations. Users can also train their own AI models on the platform. Focused on the diverse needs of creators, LiblibAI is committed to creating inclusive conditions and serving the creative industry, ensuring that everyone can enjoy the joy of creation.

AIbase

Empowering the Future, Your AI Solution Knowledge Base

© 2025 AIbase