# Open-Source

Open Multi-Agent Canvas

Open Multi-Agent Canvas is an open-source multi-agent chat interface built using Next.js, LangGraph, and CopilotKit. It allows users to manage multiple agents within a dynamic conversation, primarily for travel planning and research. Leveraging advanced technologies, it provides users with an efficient and flexible multi-agent interaction experience. Its open-source nature allows developers to customize and extend it to meet diverse needs, offering high flexibility and scalability.

Kolors

Kolors is a large-scale text-to-image generation model developed by the Kwai Kolors team, based on latent diffusion and trained on billions of text-image pairs. According to its developers, it shows significant advantages over both open-source and closed-source models in visual quality, semantic accuracy, and rendering of Chinese and English text. Kolors accepts both Chinese and English input and is particularly strong at understanding and generating Chinese-specific content.

AI Image Generation

Scoopika

Scoopika is an open-source developer platform designed to empower developers to build personalized AI agents that can see, speak, hear, learn, and take action. It provides a secure, efficient, and user-friendly platform for the AI era, supporting full edge compatibility and real-time streaming. Built-in visual and voice chat functionality enhances user interaction. Scoopika emphasizes its open-source nature, offering server-side and client-side runtimes, as well as integration modules for React projects, fostering a vibrant and growing developer community.

Development Platform

Fish Speech V1.2

Fish Speech V1.2 is a text-to-speech (TTS) model trained on 300,000 hours of English, Chinese, and Japanese audio data. Representing the forefront of voice synthesis technology, it delivers high-quality voice output across diverse language environments.

AI Speech Synthesis

Rakis

Rakis is a fully browser-based decentralized inference network. Leveraging blockchain technology, it allows nodes to request and share AI model inference results, enabling distributed execution of AI models without relying on servers. By utilizing browsers as nodes and supporting WebGPU compatible platforms, Rakis empowers ordinary users to participate in AI model inference. The project is open-source, emphasizing transparency and verifiability, aiming to address the challenges of determinism, scalability, and security in decentralized AI inference.

Development Platform

Friend

Friend is an open-source AI wearable device that connects to your mobile device and provides real-time transcription, producing automatic, high-quality records of meetings, chats, and voice memos. It combines real-time AI audio processing with Bluetooth Low Energy, and its software is open-source, so users can access and contribute to the code. Portable and practical, Friend suits anyone who needs an efficient way to record and manage conversations.

AI voice assistant

FRIEND NECKLACE

FRIEND NECKLACE is an open-source wearable AI assistant equipped with personalized AI dialogue and feedback functionalities. It is a smart device that integrates multiple functions like AI notes, reminders, and suggestions. The product is fully open-source, and all data is stored on the user's phone, ensuring data privacy and security.

AI voice assistant

Tele-FLM

Tele-FLM (also known as FLM-2) is a 52-billion-parameter open-source multilingual large language model with a stable and efficient pre-training paradigm and enhanced factual judgment capabilities. Based on a decoder-only transformer architecture, it has been trained on approximately 2 trillion tokens. Tele-FLM performs better than models of similar size and sometimes surpasses larger ones. Besides sharing the model weights, its authors also provide the core design, engineering practices, and training details, hoping these will benefit both the academic and industrial communities.

ElevenLabs Texts to Sounds Effects API

ElevenLabs Texts to Sounds Effects API is a programming interface that enables developers to transform text into corresponding sound effects, suitable for various applications like video editing and game development. This API is open-source and its code is available on GitHub, allowing developers to customize and extend its functionality.

MoA

MoA (Mixture of Agents) is an approach that leverages the collective strengths of multiple large language models (LLMs) to improve performance, achieving state-of-the-art results. Using a layered architecture with multiple LLM agents per layer, MoA scores 65.1% on AlpacaEval 2.0 with only open-source models, surpassing the 57.5% achieved by GPT-4 Omni.
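
The layered idea can be illustrated with a minimal sketch. It assumes each layer's agents receive the original question plus the previous layer's responses, and that a final aggregator produces the answer. The stub agents, the prompt format, and the names `make_stub_agent`, `build_prompt`, and `mixture_of_agents` are hypothetical; a real setup would call actual LLM endpoints instead.

```python
from typing import Callable, List

# An "agent" is any function mapping a prompt to a response.
# Real systems would call an LLM here; these stubs are hypothetical.
Agent = Callable[[str], str]


def make_stub_agent(name: str) -> Agent:
    """Create a placeholder agent that echoes the first line of its prompt."""
    def agent(prompt: str) -> str:
        return f"[{name}] answer to: {prompt.splitlines()[0]}"
    return agent


def build_prompt(question: str, previous: List[str]) -> str:
    """Attach the previous layer's responses (if any) to the question."""
    if not previous:
        return question
    refs = "\n".join(f"{i + 1}. {r}" for i, r in enumerate(previous))
    return f"{question}\nResponses from the previous layer:\n{refs}"


def mixture_of_agents(question: str,
                      layers: List[List[Agent]],
                      aggregator: Agent) -> str:
    """Run the question through each layer, then aggregate the last outputs."""
    previous: List[str] = []
    for layer in layers:
        prompt = build_prompt(question, previous)
        # Every agent in the layer sees the same prompt.
        previous = [agent(prompt) for agent in layer]
    return aggregator(build_prompt(question, previous))


if __name__ == "__main__":
    layers = [
        [make_stub_agent("model-a"), make_stub_agent("model-b")],
        [make_stub_agent("model-c"), make_stub_agent("model-d")],
    ]
    print(mixture_of_agents("What is 2+2?", layers, make_stub_agent("final")))
```

Swapping the stubs for real model calls leaves the control flow unchanged, which is the main point of the sketch.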

ChatTTS_Speaker

ChatTTS_Speaker is an experimental project based on the ERes2NetV2 speaker recognition model that provides stability scores and labels for voice timbres. It helps users select timbres that are stable and meet their requirements. The project is open-source and supports online listening and downloading of voice samples.

AI speech recognition

OpenVLA

OpenVLA is a 7-billion-parameter open-source vision-language-action (VLA) model pre-trained on 970k robot episodes from the Open X-Embodiment dataset. Its authors report that it sets a new state of the art for generalist robot manipulation policies, enabling out-of-the-box control of multiple robots and rapid adaptation to new robot setups through parameter-efficient fine-tuning. OpenVLA's checkpoints and PyTorch training pipeline are fully open-source, and the model can be downloaded from HuggingFace and fine-tuned.

NVIDIA RTX Remix

NVIDIA RTX Remix, an open-source modding toolkit launched by NVIDIA, allows creators and game developers to leverage the powerful capabilities of NVIDIA RTX technology to enhance their games and creative projects. Harnessing the power of real-time ray tracing and AI-driven graphics enhancements, RTX Remix delivers stunningly realistic visual experiences to games. Beyond providing a robust platform for creators, RTX Remix fosters innovation in the gaming and creative realms by enabling open API and connector integrations with other applications and services.

AI image generation

FastGPT

FastGPT is an open-source AI knowledge base platform providing data processing, model invocation, RAG retrieval, and visual AI workflow orchestration, helping users build sophisticated AI applications. It supports building domain-specific AI customer service, automates data preprocessing, orchestrates workflows, and offers API integration. FastGPT's advantages include its open-source nature, QA-based data structure, visual workflow, scalability, ease of debugging, and support for multiple models.

Development & Tools

ToonCrafter

ToonCrafter is an open-source research project focused on interpolating between two cartoon images using a pre-trained image-to-video diffusion prior. The project aims to positively impact the AI-driven video generation field by providing users with the freedom to create videos, but requires users to comply with local laws and use it responsibly.

AI video generation

ChatTTS.com

ChatTTS is a voice generation model designed for conversational scenarios, particularly suitable for dialogue tasks of large language model assistants and conversational audio and video introductions. It supports both English and Chinese and showcases high-quality and natural speech synthesis capabilities through training on approximately 100,000 hours of English and Chinese data.

InternLM-Math-Plus

InternLM-Math-Plus is a bilingual (English and Chinese) open-source large language model (LLM) focused on mathematical reasoning. It can solve, prove, verify, and augment mathematical problems. It shows significant performance improvements in both informal mathematical reasoning (such as chain-of-thought reasoning and use of a code interpreter) and formal mathematical reasoning (such as Lean 4 translation and proof).

Pygmalion AI

PygmalionAI is an open-source project dedicated to creating large language models for chat and role-playing. It boasts powerful capabilities and technology, delivering a high-quality chat experience. PygmalionAI's advantages include the accuracy and diversity of its language generation, as well as its scalability and customizability. It has a wide range of applications across various sectors, including entertainment, education, and business.

Explore Careers

Explore Careers is an open-source career exploration platform that leverages artificial intelligence to help users explore suitable careers in seconds based on their skills and interests. This platform is completely free and encourages users to explore alternative career paths to find the best fit for themselves.

AI information platform

Falcon 2

Falcon 2 is a generative AI model released under an open-source license. It offers multilingual and multimodal capabilities, and its image-to-text conversion function is presented as a significant advance.

OpenGlass

OpenGlass is a pair of wearable smart glasses that can record your life and provide helpful summaries and suggestions. It can be customized for different usage scenarios, catering to users who want personalized experiences and life logging. It combines current hardware with custom software to offer a new kind of interactive experience.

AI life assistant

ugly-avatar

ugly-avatar is an open-source avatar generator aimed primarily at individuals and small websites, offering a fun and distinctive avatar generation service. Developed with Vue and JavaScript, it supports customizable configuration and is easy to integrate and use. The project is released under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license, meaning it cannot be used for commercial purposes.

AI avatar generation

Applio

Applio is an open-source ecosystem focused on advanced AI voice cloning technology and dedicated to driving innovation in that field. Public pricing information is not yet available.

Development & Tools

JetMoE-8B

JetMoE-8B is an open-source large language model that achieves performance surpassing Meta AI LLaMA2-7B at a cost of less than $100,000 by utilizing public datasets and optimized training methods. During inference, the model activates only 2.2 billion parameters, significantly reducing computational cost while maintaining excellent performance.
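
To illustrate why a mixture-of-experts model runs only part of its parameters per input, here is a generic top-k routing sketch. The listing does not describe JetMoE-8B's actual router, so the gating scheme, the toy scalar "experts", and the names `top_k_route` and `moe_forward` are all assumptions made for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a short list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_logits: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Pick the k highest-scoring experts and renormalize their weights."""
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)
    top = order[:k]
    weights = softmax([router_logits[i] for i in top])
    return list(zip(top, weights))


def moe_forward(x: float,
                experts: Sequence[Callable[[float], float]],
                router_logits: Sequence[float],
                k: int) -> float:
    """Evaluate only the selected experts; the others are never called."""
    return sum(w * experts[i](x) for i, w in top_k_route(router_logits, k))


if __name__ == "__main__":
    # Four toy experts that scale their input by 1, 2, 3, and 4 (hypothetical).
    experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
    logits = [0.0, 5.0, 5.0, -1.0]
    print(top_k_route(logits, k=2))
    print(moe_forward(1.0, experts, logits, k=2))
```

With `k=2` out of four experts, half of the expert parameters stay idle for this input, which is the same effect, at toy scale, as activating 2.2 billion parameters out of a larger total.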

Open-Source Large Model Cookbook

This project is a comprehensive guide to using open-source large models, with detailed instructions on environment setup, model deployment, and efficient fine-tuning. It simplifies the use of open-source large models so that more ordinary learners can access them, and it targets learners who are interested in these models and want hands-on experience.

DBRX

DBRX is a general-purpose large language model (LLM) built by Databricks' Mosaic research team. According to Databricks, it outperformed established open-source models on standard benchmarks at the time of its release. It uses a Mixture-of-Experts (MoE) architecture with 132 billion total parameters, of which about 36 billion are active for any given input, and it shows strong language understanding, programming, mathematical, and logical reasoning capabilities. DBRX aims to promote the development of high-quality open-source LLMs and to make it easier for enterprises to customize the model with their own data. Databricks lets enterprise users work with DBRX interactively, use its long-context capabilities to build retrieval-augmented systems, and build customized DBRX models on their own data.

Hugging Face

Hugging Face is an AI community platform dedicated to advancing and democratizing AI through open-source and open science. It provides a collaborative environment for the machine learning community to share models, datasets, and applications. Key benefits include:

1. **Collaboration Platform:** Unlimited hosting and sharing of models, datasets, and applications.
2. **Open-Source Stack:** Accelerates the ML development workflow.
3. **Multi-Modal Support:** Text, image, video, audio, 3D, and more.
4. **Build an ML Portfolio:** Showcase your work globally.
5. **Paid Compute & Enterprise Solutions:** Offers optimized inference endpoints, GPU support, and more.

Development Platform

LaVague

LaVague aims to redefine the internet browsing experience by translating natural language instructions into seamless browser interactions. It leverages natural language processing and Selenium integration, enabling users or other AI agents to easily express web workflows and execute them in the browser.

AI Automation Workflow

Llama 3

Meta Llama 3, released by Meta, is a new generation of open-source large language models. It performs strongly on multiple industry benchmarks and supports a wide range of use cases, with improved reasoning capabilities. Meta plans future versions that support multiple languages and multimodality, with longer context windows and better overall performance. In keeping with its open approach, Llama 3 is to be made available on major cloud, hosting, and hardware platforms for developers and the community to use.

Stable Cascade

Stable Cascade is a text-to-image generation model based on the Würstchen architecture. Compared to other models, it uses a smaller latent space for training and inference, resulting in significant improvements in both training and inference speed. The model can run on consumer-grade hardware, lowering the barrier to entry. Stable Cascade has shown outstanding performance in human evaluations, outperforming other models in both prompt alignment and image quality. Overall, it is an efficient, user-friendly, and powerful text-to-image AI model.

AI image generation

Featured AI Tools

Flow AI

Flow is an AI-driven movie-making tool designed for creators, utilizing Google DeepMind's advanced models to allow users to easily create excellent movie clips, scenes, and stories. The tool provides a seamless creative experience, supporting user-defined assets or generating content within Flow. In terms of pricing, the Google AI Pro and Google AI Ultra plans offer different functionalities suitable for various user needs.

Video Production

NoCode

NoCode is a platform that requires no programming experience, allowing users to quickly generate applications by describing their ideas in natural language, aiming to lower development barriers so more people can realize their ideas. The platform provides real-time previews and one-click deployment features, making it very suitable for non-technical users to turn their ideas into reality.

Development Platform

ListenHub

ListenHub is a lightweight AI podcast generation tool that supports both Chinese and English. Based on cutting-edge AI technology, it can quickly generate podcast content of interest to users. Its main advantages include natural dialogue and ultra-realistic voice effects, allowing users to enjoy high-quality auditory experiences anytime and anywhere. ListenHub not only improves the speed of content generation but also offers compatibility with mobile devices, making it convenient for users to use in different settings. The product is positioned as an efficient information acquisition tool, suitable for the needs of a wide range of listeners.

MiniMax Agent

MiniMax Agent is an AI companion built on multimodal technology. Its MCP-based multi-agent collaboration lets teams of AI agents work together on complex problems. It provides features such as instant answers, visual analysis, and voice interaction, which the company claims can increase productivity tenfold.

Multimodal technology

Tencent Hunyuan Image 2.0

Tencent Hunyuan Image 2.0 is Tencent's latest AI image generation model, with significantly improved generation speed and image quality. Using a high-compression-ratio codec and a new diffusion architecture, it can generate images in milliseconds, avoiding the wait typical of traditional generation. The model also improves image realism and detail by combining reinforcement learning algorithms with human aesthetic knowledge, making it suitable for professional users such as designers and creators.

Image Generation

OpenMemory MCP

OpenMemory is an open-source personal memory layer that provides private, portable memory management for large language models (LLMs). It ensures users have full control over their data, maintaining its security when building AI applications. This project supports Docker, Python, and Node.js, making it suitable for developers seeking personalized AI experiences. OpenMemory is particularly suited for users who wish to use AI without revealing personal information.

FastVLM

FastVLM is an efficient visual encoding model designed for vision-language models. It uses the FastViTHD hybrid visual encoder to reduce both the time needed to encode high-resolution images and the number of output tokens, improving speed while maintaining accuracy. FastVLM aims to give developers strong vision-language processing capabilities across a range of scenarios and is particularly well suited to mobile devices that require fast responses.

Image Processing

LiblibAI

LiblibAI is a leading Chinese AI creative platform offering powerful AI creative tools to help creators bring their imagination to life. The platform provides a vast library of free AI creative models, allowing users to search and utilize these models for image, text, and audio creations. Users can also train their own AI models on the platform. Focused on the diverse needs of creators, LiblibAI is committed to creating inclusive conditions and serving the creative industry, ensuring that everyone can enjoy the joy of creation.


© 2025 AIbase