# Open Source

OmniAvatar

OmniAvatar is an audio-driven video generation model that produces high-quality animations of virtual characters. It combines audio and visual content to generate body animation efficiently, so it can be applied to a variety of scenarios. The model uses deep learning to achieve high-fidelity animation, supports multiple input formats, and targets the film, gaming, and social media sectors. It is open source, which encourages technology sharing and adoption.

Video Animation

OmniGen2

OmniGen2 is an efficient multimodal generation model that combines a vision-language model with a diffusion model, supporting visual understanding, image generation, and image editing. Because it is open source, it gives researchers and developers a strong foundation for exploring personalized and controllable AI generation.

Image Generation

PandaWiki

PandaWiki is an open-source, AI-powered knowledge base system built on large language models, designed to help users quickly build intelligent product and technical documentation. Its main advantage is AI-assisted authoring, question answering, and search, which improve both document management and the reader experience. It suits teams and enterprises that want to use AI to work more efficiently.

Chatterbox AI

Chatterbox is Resemble AI's first open-source, production-grade text-to-speech (TTS) model, with strong performance and stability; Resemble AI reports that it compares favorably with leading closed-source systems. A distinctive feature is emotion exaggeration control, which makes it well suited to video games, AI agents, and other scenarios. Chatterbox is cost-competitive and supports ultra-low latency, making it suitable for production use.

SurfSense

SurfSense is an open-source AI research assistant that integrates external sources such as search engines, Slack, and Notion to help users research and manage information efficiently. It supports uploading and searching files in multiple formats, offers natural-language interaction, and can generate content quickly. SurfSense aims to improve research efficiency and suits users with demanding knowledge-management needs.

Information Management

DeerFlow

DeerFlow is a deep research framework that combines language models with specialized tools such as web search, web crawling, and Python execution to support in-depth research. The project builds on work from the open-source community, welcomes community contributions and feedback, and offers flexible features for a range of research needs.

Excel MCP Server

Excel MCP Server is a server that lets you work with Excel files without installing Microsoft Excel: users can create, read, and modify Excel workbooks. Its main advantages are ease of use and flexibility; it supports many Excel features and allows AI agents to operate on files. It suits users who handle Excel files frequently, such as data analysts and finance staff. The tool is open source and written in Python, so it is easy to run locally or on a remote server.

F Lite

F Lite is a 10-billion-parameter diffusion model developed by Freepik and Fal, trained exclusively on copyright-safe, safe-for-work (SFW) content. It was trained on Freepik's internal dataset of about 80 million legally compliant images, which its developers describe as the first time a publicly available model at this scale has focused on legal and safe content. A technical report describes the model in detail, and it is distributed under the CreativeML Open RAIL-M license. The model is intended to promote openness and accessibility in AI.

Image Generation

Step1X-Edit

Step1X-Edit is a practical, general-purpose image editing framework. It uses the image-understanding capabilities of multimodal large language models (MLLMs) to parse editing instructions and generate editing tokens, which a diffusion transformer (DiT) network then decodes into images. Its significance lies in effectively meeting real users' editing needs, making image editing more convenient and flexible.

Wiredoor

Wiredoor is a self-hosted, open-source Ingress-as-a-Service platform that lets users securely expose applications running on private or local networks to the internet. It combines reverse VPN connections over WireGuard with a built-in NGINX reverse proxy for high performance and low latency. Wiredoor gives developers and operators complete control without relying on public cloud solutions. It is free and open source and works in a variety of environments, including Kubernetes and Docker.

Kimi-Audio

Kimi-Audio is an advanced open-source audio foundation model designed to handle a variety of audio processing tasks, such as speech recognition and audio dialogue. The model has been extensively pre-trained on over 13 million hours of diverse audio and text data, giving it strong audio reasoning and language understanding capabilities. Its key advantages include excellent performance and flexibility, making it suitable for researchers and developers to conduct audio-related research and development.

Speech Recognition

deepwiki

devops-exercises is a repository designed to help job seekers prepare for DevOps interviews. It contains exercises on a wide range of technologies and tools, including Docker, Kubernetes, and AWS, to help users sharpen their skills and interview performance. The project is open source, completely free, and suitable for both beginners and experienced professionals looking to grow in the DevOps field, and it encourages community learning.

Flex.2-preview

Flex.2 is billed as the most flexible text-to-image diffusion model currently available, with built-in inpainting and general control capabilities. It is an open-source, community-supported project that aims to democratize artificial intelligence. Flex.2 has 8 billion parameters, supports inputs of up to 512 tokens, and is released under the Apache 2.0 license, an OSI-approved open-source license. It can support a wide range of creative projects, and user feedback helps improve the model over time.

Image Generation

Dia AI

Dia is a text-to-speech (TTS) model developed by Nari Labs with 1.6 billion parameters that can generate highly realistic dialogue directly from text. It supports emotion and intonation control and can produce nonverbal sounds such as laughter and coughing. Its pretrained weights are hosted on Hugging Face and support English generation. The model is intended for research and educational use, helping to advance conversational AI.

Suna

Suna is an open-source AI assistant that helps users complete research, data analysis, and everyday tasks through natural conversation. It pairs powerful features with an intuitive interface to solve complex problems and automate workflows. Its toolkit includes browser automation, file management, website deployment, and integrations with multiple APIs, making it flexible enough for a wide range of needs.

Search-R1

Search-R1 is a reinforcement learning framework for training large language models (LLMs) that can reason and call search engines. Built on veRL, it supports multiple reinforcement learning methods and different LLM architectures, making research and development on tool-augmented reasoning efficient and scalable.

Model Training and Deployment

LeoMoon Wiki-Go

LeoMoon Wiki-Go is a fast, modern flat-file Wiki built using Go. It focuses on simplicity and performance, supports Markdown formatting, is completely database-free, and requires zero maintenance. Suitable for personal knowledge management, team collaboration, and internal documentation.

Knowledge Management

ChatTS-14B

ChatTS-14B is a language model focused on time-series understanding and reasoning, trained with synthetic data to improve its handling of time-series data. It can be applied to data analysis, financial forecasting, and other fields, giving users deeper insight into time series with strong reasoning ability and accuracy.

AI Playground

AI Playground is an open-source project that provides AI image creation, image stylization, and chatbot functionality. It is designed specifically for PCs with Intel Arc GPUs and supports a variety of generative AI libraries and models. Its main advantages are strong image generation capabilities and a user-friendly experience. It suits AI developers, designers, and enthusiasts who want to explore advanced AI technologies, and it lets users freely choose and download models for different use cases.

AI design tools

Wan2.1-FLF2V-14B

Wan2.1-FLF2V-14B is an open-source, large-scale video generation model designed to advance the field of video generation. It performs well on multiple benchmarks, runs on consumer-grade GPUs, and efficiently generates 480P and 720P video. It handles a range of tasks, including text-to-video and image-to-video, and can generate text within video, making it suitable for diverse real-world applications.

Video Production

EaseVoice Trainer

EaseVoice Trainer is a backend project designed to simplify and improve the training process for speech synthesis and voice conversion. It builds on GPT-SoVITS with a focus on user experience and system maintainability. Its design philosophy differs from the original project's: it aims to provide a more modular, customizable solution suitable for everything from small-scale experiments to large-scale production. It helps developers and researchers carry out speech synthesis and voice conversion work more efficiently.

Development & Tools

PureChat

PureChat is a modern chat application built with Vue 3 and Element Plus that integrates OpenAI, Ollama, DeepSeek, and other large language models. Key features include Markdown rendering and screenshots of chat history, which make communication more efficient and pleasant. PureChat also aims to give developers a platform for quickly learning modern technologies.

AI Video and Audio to Text & Graphic Creator

The AI Video and Audio to Text & Graphic Creator is an open-source tool that converts video and audio content into various document formats, helping users review and reflect on the content. Its main advantages are that it is completely open source, requires no registration, and processes audio and video files locally, reducing costs. It is well suited to students, researchers, and content creators who need to turn audio-visual content into text.

automcp

automcp is an open-source tool that simplifies converting agents built with existing frameworks (such as CrewAI and LangGraph) into MCP servers, so developers can access them through a standardized interface. It supports deploying agents from multiple frameworks and is driven by an easy-to-use CLI. It suits developers who need to integrate and deploy AI agents quickly, and it is free for both individual and team use.

Development & Tools

Awesome GPT-4o Images

Awesome GPT-4o Images is a collection of images generated by OpenAI's multimodal model GPT-4o, together with the prompts used to create them. It showcases GPT-4o's text and image understanding and its ability to produce a variety of art styles. It is aimed at designers, artists, and anyone interested in AI art. The project is free and open source and aims to inspire creativity and advance AI art.

AI information platform

Skywork-OR1

Skywork-OR1 is a high-performance math and code reasoning model series developed by the Tiangong (Skywork) team at Kunlun Wanwei. The team reports leading reasoning performance among models of comparable parameter scale, addressing a key bottleneck of large models in logical understanding and complex problem solving. The series includes three models, Skywork-OR1-Math-7B, Skywork-OR1-7B-Preview, and Skywork-OR1-32B-Preview, focused respectively on mathematical reasoning, general reasoning, and high-performance reasoning. The open-source release includes not only the model weights but also the full training dataset and training code, all published on GitHub and Hugging Face, giving the AI community a fully reproducible reference. This comprehensive open-source approach helps advance community-wide research on reasoning ability.

Droidrun

Droidrun is a powerful Android automation tool designed to enable AI agents to seamlessly interact with Android applications. It combines visual understanding and UI structure extraction to provide a robust mobile platform for AI. Droidrun is currently in a waitlist phase, offering different solutions for individual developers, small teams, and enterprises.

Development & Tools

mcp-use

mcp-use is an open-source MCP client library that helps developers connect any large language model (LLM) to MCP tools and build custom agents with tool access, without relying on closed-source or application-specific clients. It offers a simple API and powerful features applicable across many domains.

Development & Tools

Pusa

Pusa introduces an innovative approach to video diffusion modeling through frame-level noise control, enabling high-quality video generation suitable for various tasks (text-to-video, image-to-video, etc.). With its superior motion fidelity and efficient training process, the model offers an open-source solution for convenient video generation.

Video Production

UNO

UNO is a multi-image conditional generation model based on diffusion transformers. By introducing progressive cross-modal alignment and a universal rotary position embedding (UnoPE), it achieves highly consistent image generation. Its main advantage is improved controllability when generating single or multiple subjects, making it suitable for a variety of creative image generation tasks.

Image Generation

Featured AI Tools

Flow AI

Flow is an AI filmmaking tool for creators that uses Google DeepMind's advanced models to let users easily create cinematic clips, scenes, and stories. It provides a seamless creative experience, letting users bring their own assets or generate them within Flow. For pricing, the Google AI Pro and Google AI Ultra plans offer different feature sets for different needs.

Video Production

NoCode

NoCode is a platform that requires no programming experience: users describe their ideas in natural language and quickly get a working application. It aims to lower the barrier to development so more people can realize their ideas. With real-time previews and one-click deployment, it is well suited to non-technical users who want to turn ideas into reality.

Development Platform

ListenHub

ListenHub is a lightweight AI podcast generation tool that supports both Chinese and English. Using recent AI technology, it quickly generates podcast content on topics users care about. Its main advantages are natural dialogue and highly realistic voices, so users can enjoy high-quality audio anytime, anywhere. ListenHub also generates content quickly and works on mobile devices, making it convenient in different settings. It is positioned as an efficient information-acquisition tool for a broad range of listeners.

MiniMax Agent

MiniMax Agent is an intelligent AI companion built on recent multimodal technology. Its MCP-based multi-agent collaboration lets teams of AI agents work together to solve complex problems efficiently. It offers instant answers, visual analysis, and voice interaction, which MiniMax claims can increase productivity tenfold.

Multimodal technology

Tencent Hunyuan Image 2.0

Tencent Hunyuan Image 2.0 is Tencent's newly released AI image generation model, with significant improvements in both generation speed and image quality. Thanks to an ultra-high-compression-ratio codec and a new diffusion architecture, generation can take on the order of milliseconds, eliminating the wait typical of traditional image generation. The model also improves realism and detail by combining reinforcement learning with human aesthetic knowledge, making it suitable for professional users such as designers and creators.

Image Generation

OpenMemory MCP

OpenMemory is an open-source personal memory layer that provides private, portable memory management for large language models (LLMs). It ensures users have full control over their data, maintaining its security when building AI applications. This project supports Docker, Python, and Node.js, making it suitable for developers seeking personalized AI experiences. OpenMemory is particularly suited for users who wish to use AI without revealing personal information.

FastVLM

FastVLM is an efficient vision-language model built around a fast vision encoder. Its hybrid FastViTHD encoder reduces both the time needed to encode high-resolution images and the number of visual tokens it outputs, giving strong results in both speed and accuracy. FastVLM is aimed at developers who need powerful vision-language capabilities across a range of scenarios, and it performs especially well on mobile devices where fast response matters.

Image Processing

LiblibAI

LiblibAI is a leading Chinese AI creative platform offering powerful AI tools that help creators bring their ideas to life. It provides a large library of free AI models that users can search and use to create images, text, and audio, and users can also train their own models on the platform. Focused on creators' diverse needs, LiblibAI aims to make creation accessible to everyone and to serve the creative industry.

AIbase

Empowering the Future, Your AI Solution Knowledge Base

© 2025 AIbase