Skywork-MoE
Overview:
Skywork-MoE is a high-performance Mixture of Experts (MoE) model with 146 billion total parameters, 16 experts, and 22 billion activated parameters. The model is initialized from the dense checkpoints of the Skywork-13B model and incorporates two innovative techniques: gating logit normalization, which enhances expert diversification, and adaptive auxiliary loss coefficients, which allow the auxiliary loss coefficient to be adjusted per layer. Skywork-MoE demonstrates comparable or better performance than models with more total or activated parameters, such as Grok-1, DBRX, Mixtral 8x22B, and Deepseek-V2.
Target Users:
The Skywork-MoE model is suited to researchers and developers who need to train and run inference with large-scale language models. Its high parameter count and expert diversification techniques let it perform well on complex language tasks, while the adaptive auxiliary loss coefficients allow the load-balancing loss to be tuned per layer, improving model performance and efficiency.
Use Cases
Evaluations on popular benchmark datasets like C-Eval, MMLU, and CMMLU
Inference with the Skywork-MoE-Base model via Hugging Face Transformers (see the sketch after this list)
Fast deployment of the Skywork-MoE-Base model with vLLM
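A minimal sketch of the Hugging Face inference path, assuming the publicly released Skywork/Skywork-MoE-Base repository id and that the custom modeling code is loaded with trust_remote_code=True; the dtype and device mapping below are illustrative and should match your hardware (the base model targets 8xA100/A800-class GPUs):

```python
# Minimal sketch: Hugging Face Transformers inference with Skywork-MoE-Base.
# The repo id, dtype, and device_map are assumptions; adjust to your setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Skywork/Skywork-MoE-Base"  # assumed repository id

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # illustrative precision choice
    device_map="auto",            # shard across the available GPUs
    trust_remote_code=True,
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```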
Features
Gating Logit Normalization technique, enhancing expert diversification (see the first sketch after this list)
Adaptive Auxiliary Loss Coefficients technique, allowing layer-specific adjustment of the auxiliary loss coefficient (see the second sketch after this list)
Compatibility with platforms like Hugging Face, ModelScope, and Wisemodel
Support for inference on 8xA100/A800 or higher GPU hardware configurations
Provides a fast deployment method for vLLM model inference
Supports fp8 precision, enabling the Skywork-MoE-Base model to run on 8x RTX 4090 GPUs
Offers detailed technical documentation and a community license agreement
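The gating logit normalization feature above can be made concrete with a short sketch. This is one plausible PyTorch formulation, under the assumption that the router standardizes its logits per token (zero mean, unit variance) and rescales them by a temperature-like hyperparameter lambda before the softmax; the exact form used in Skywork-MoE may differ:

```python
# Sketch of gating logit normalization for an MoE router (assumed formulation).
# Standardizing the logits per token and scaling by lambda controls the sharpness
# of the gate distribution, which encourages expert diversification.
import torch
import torch.nn.functional as F

def normalized_gate(logits: torch.Tensor, lam: float = 1.0) -> torch.Tensor:
    """logits: [num_tokens, num_experts] raw router outputs."""
    mean = logits.mean(dim=-1, keepdim=True)
    std = logits.std(dim=-1, keepdim=True)
    normalized = (logits - mean) / (std + 1e-6)   # zero mean, unit variance per token
    return F.softmax(lam * normalized, dim=-1)    # lambda controls gate sharpness

gates = normalized_gate(torch.randn(4, 16), lam=1.0)  # 16 experts, as in Skywork-MoE
topk_vals, topk_idx = gates.topk(2, dim=-1)           # route each token to its top experts
```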
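Similarly, the adaptive auxiliary loss coefficients can be sketched as a per-layer controller. The update rule below is an assumption for illustration only: each MoE layer keeps its own load-balancing coefficient, nudging it up when routing looks imbalanced (e.g., a high token drop rate) and relaxing it otherwise; the actual schedule in Skywork-MoE may use a different signal or smoothing.

```python
# Sketch of adaptive, layer-specific auxiliary (load-balancing) loss coefficients.
# Hypothetical update rule: strengthen a layer's coefficient when its routing is
# imbalanced, decay it when routing is already balanced.
class AdaptiveAuxCoefficient:
    def __init__(self, init_coef: float = 0.01, target_drop_rate: float = 0.01,
                 step: float = 1.2, smoothing: float = 0.99):
        self.coef = init_coef
        self.target = target_drop_rate
        self.step = step
        self.smoothing = smoothing
        self.ema_drop_rate = 0.0  # exponential moving average of the observed drop rate

    def update(self, observed_drop_rate: float) -> float:
        self.ema_drop_rate = (self.smoothing * self.ema_drop_rate
                              + (1.0 - self.smoothing) * observed_drop_rate)
        if self.ema_drop_rate > self.target:
            self.coef *= self.step        # imbalanced: strengthen the auxiliary loss
        else:
            self.coef /= self.step        # balanced: relax it
        return self.coef

# One controller per MoE layer; total loss = lm_loss + sum(coef_l * aux_loss_l).
```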
How to Use
Install the necessary dependencies, including a PyTorch nightly build and vllm-flash-attn
Clone the Skywork-provided vllm source code
Configure and build vllm for your local environment
Run vllm with Docker, setting the model path and working directory
Perform text generation with vLLM's LLM and SamplingParams classes (see the sketch after these steps)
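A minimal generation sketch using vLLM's offline Python API. The checkpoint path shown here (/path/to/Skywork-MoE-Base) is a hypothetical placeholder for your locally downloaded model, and tensor_parallel_size should match the number of GPUs you have (e.g., 8 for an 8xA100/A800 setup); it also assumes you are running inside the Skywork-provided vLLM build described above.

```python
# Minimal sketch: offline text generation with vLLM's LLM and SamplingParams classes.
# The model path and tensor_parallel_size are illustrative; point them at your local
# checkpoint and GPU count.
from vllm import LLM, SamplingParams

llm = LLM(
    model="/path/to/Skywork-MoE-Base",  # hypothetical local path to the checkpoint
    tensor_parallel_size=8,             # shard the model across 8 GPUs
    trust_remote_code=True,
)

sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)
prompts = ["The mixture-of-experts architecture works by"]

outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(output.outputs[0].text)
```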