

Mooncake
Overview
Mooncake is the serving platform for Kimi, a leading large language model (LLM) service offered by Moonshot AI. It uses a decoupled architecture centered around KVCache: prefill and decoding run on separate clusters, and the underutilized CPU, DRAM, and SSD resources within the GPU cluster are pooled into a decoupled KVCache store. At the heart of Mooncake is its KVCache-centered scheduler, which maximizes overall effective throughput while meeting latency-related service level objectives (SLOs). Unlike earlier work that assumes every request will be processed, Mooncake must cope with heavily overloaded scenarios, which it handles with a prediction-based early rejection strategy. Experiments show that Mooncake excels in long-context scenarios, achieving up to a 525% throughput increase over baseline methods in certain simulated scenarios while adhering to SLOs. Under real-world workloads, Mooncake's architecture enables Kimi to handle 75% more requests.
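To make the early rejection idea concrete, here is a minimal sketch in Python. It is not Mooncake's actual implementation: the `Request` and `ClusterLoad` types, their fields, and the linear load model are all assumptions made for illustration. It predicts the time to first token (TTFT) and the decoding load at the moment prefill would finish, and rejects a request up front if either prediction would violate the SLO.

```python
from dataclasses import dataclass


@dataclass
class Request:
    prompt_tokens: int  # length of the incoming prompt


@dataclass
class ClusterLoad:
    # All fields are hypothetical, simplified load signals.
    prefill_queue_tokens: int       # tokens already waiting for prefill
    prefill_tokens_per_s: float     # aggregate prefill speed
    decode_active: int              # requests currently decoding
    decode_capacity: int            # max concurrent decodes within the latency SLO
    decode_completions_per_s: float # rate at which decoding requests finish


def predict_ttft(req: Request, load: ClusterLoad) -> float:
    """Predicted seconds until the first token: queued work plus this prompt."""
    return (load.prefill_queue_tokens + req.prompt_tokens) / load.prefill_tokens_per_s


def predict_decode_active(load: ClusterLoad, after_s: float) -> float:
    """Predicted number of decoding requests `after_s` seconds from now."""
    return max(0.0, load.decode_active - load.decode_completions_per_s * after_s)


def admit(req: Request, load: ClusterLoad, ttft_slo_s: float) -> bool:
    """Reject early if the request is predicted to miss its SLOs."""
    ttft = predict_ttft(req, load)
    if ttft > ttft_slo_s:
        return False  # prefill would already be too slow
    # Check decode capacity at the time prefill finishes, not right now.
    return predict_decode_active(load, ttft) < load.decode_capacity
```

Rejecting before prefill starts avoids spending prefill compute on a request that the decoding cluster could not serve in time. Checking the predicted rather than the current decode load lets a request be admitted when the decoding cluster is full now but is expected to have room by the time prefill completes.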
Target Users
Mooncake is designed to serve enterprises and developers who require high-performance, high-throughput services from large language models. Its architecture and scheduling strategies are particularly suited for handling massive datasets and complex queries, meeting the real-time requirements of applications such as intelligent customer service and natural language processing.
Use Cases
An intelligent customer service system leverages Mooncake to process user queries, enhancing response speed and accuracy.
Natural language processing applications utilize Mooncake for text analysis, optimizing information extraction and semantic understanding.
Large-scale data analysis platforms employ Mooncake for data preprocessing and pattern recognition, improving data processing capabilities.
Features
KVCache-centered scheduler, maximizing overall effective throughput while meeting latency SLOs.
Decoupled architecture, separating prefill and decoding clusters, enhancing resource utilization.
Prediction-based early rejection strategy, addressing high-load scenarios.
Excellent performance in long-context scenarios, with up to 525% higher throughput than baseline methods in certain simulated scenarios.
Architecture proven under real-world workloads, enabling Kimi to handle 75% more requests.
Open-source technical report, providing the community with learning and contribution opportunities.
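The KVCache-centered scheduling listed above can be illustrated with a small Python sketch. This is a toy model under stated assumptions, not Mooncake's scheduler: the `PrefillInstance` type, the block size, the chained block hashing, and the linear TTFT estimate are all hypothetical. The idea shown is that a prompt whose prefix is already cached on an instance needs less prefill work there, so the best instance is not always the least loaded one.

```python
from dataclasses import dataclass, field

BLOCK = 4  # tokens per cache block (tiny value, chosen for the demo)


def block_keys(tokens):
    """Chain-hash full blocks so each key identifies the whole prefix up to it."""
    keys, prev = [], ()
    full = len(tokens) - len(tokens) % BLOCK  # ignore a trailing partial block
    for i in range(0, full, BLOCK):
        prev = hash((prev, tuple(tokens[i:i + BLOCK])))
        keys.append(prev)
    return keys


@dataclass
class PrefillInstance:
    name: str
    queued_tokens: int   # prefill work already queued on this instance
    tokens_per_s: float  # prefill speed of this instance
    cached_blocks: set = field(default_factory=set)


def cached_prefix_tokens(inst, keys):
    """Number of leading prompt tokens whose KVCache blocks are already present."""
    hits = 0
    for key in keys:
        if key not in inst.cached_blocks:
            break  # a prefix match must be contiguous from the start
        hits += 1
    return hits * BLOCK


def estimate_ttft(inst, tokens):
    """Queued work plus only the uncached part of the prompt, in seconds."""
    hit = cached_prefix_tokens(inst, block_keys(tokens))
    return (inst.queued_tokens + len(tokens) - hit) / inst.tokens_per_s


def pick_instance(instances, tokens):
    """Choose the prefill instance with the lowest estimated TTFT."""
    return min(instances, key=lambda inst: estimate_ttft(inst, tokens))


prompt = list(range(16))
busy_but_warm = PrefillInstance("a", queued_tokens=8, tokens_per_s=4.0,
                                cached_blocks=set(block_keys(prompt[:12])))
idle_but_cold = PrefillInstance("b", queued_tokens=0, tokens_per_s=4.0)
chosen = pick_instance([busy_but_warm, idle_but_cold], prompt)
print(chosen.name)  # "a": 12 cached tokens outweigh its 8 queued tokens
```

In this example, instance "a" has a longer queue but already holds three of the prompt's four blocks, so its estimated TTFT (3.0 s) beats the idle instance "b" (4.0 s).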
How to Use
1. Visit the Mooncake GitHub page to learn about the project details.
2. Read the technical report to understand Mooncake's architecture and functions.
3. Set up and configure the Mooncake environment according to the project documentation.
4. Integrate Mooncake into your applications using its API or interface.
5. Monitor and optimize Mooncake's performance to meet your business needs.
6. Participate in community discussions to provide feedback and suggestions for Mooncake's development.