Skywork-MoE
Overview:
Skywork-MoE is a high-performance Mixture of Experts (MoE) model with 146 billion total parameters, 16 experts, and 22 billion activated parameters. The model is initialized from the dense checkpoints of the Skywork-13B model and incorporates two innovative techniques: gating logit normalization, which enhances expert diversification, and adaptive auxiliary loss coefficients, which allow the auxiliary loss coefficient to be adjusted per layer. Skywork-MoE demonstrates comparable or better performance than models with more total or activated parameters, such as Grok-1, DBRX, Mixtral 8x22B, and Deepseek-V2.
Target Users:
The Skywork-MoE model is suited to researchers and developers who need to train and run inference with large-scale language models. Its high parameter count and expert diversification techniques let it perform well on complex language tasks, while the adaptive auxiliary loss coefficients allow the load-balancing loss to be tuned per layer, improving model performance and efficiency.
Use Cases
Evaluations on popular benchmark datasets like C-Eval, MMLU, and CMMLU
Inference with the Skywork-MoE-Base model via Hugging Face Transformers (see the sketch after this list)
Fast deployment of the Skywork-MoE-Base model with vLLM
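A minimal sketch of the Hugging Face inference path, assuming the publicly released Skywork/Skywork-MoE-Base repository id and that the custom modeling code is loaded with trust_remote_code=True; the dtype and device mapping below are illustrative and should match your hardware (the base model targets 8xA100/A800-class GPUs):

```python
# Minimal sketch: Hugging Face Transformers inference with Skywork-MoE-Base.
# The repo id, dtype, and device_map are assumptions; adjust to your setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Skywork/Skywork-MoE-Base"  # assumed repository id

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # illustrative precision choice
    device_map="auto",            # shard across the available GPUs
    trust_remote_code=True,
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```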
Features
Gating Logit Normalization technique, enhancing expert diversification (see the first sketch after this list)
Adaptive Auxiliary Loss Coefficients technique, allowing layer-specific adjustment of the auxiliary loss coefficient (see the second sketch after this list)
Compatibility with platforms like Hugging Face, ModelScope, and Wisemodel
Support for inference on 8xA100/A800 or higher GPU hardware configurations
Provides a fast deployment method for vLLM model inference
Supports fp8 precision, enabling the Skywork-MoE-Base model to run on 8x RTX 4090 GPUs
Offers detailed technical documentation and a community license agreement
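The gating logit normalization feature above can be made concrete with a short sketch. This is one plausible PyTorch formulation, under the assumption that the router standardizes its logits per token (zero mean, unit variance) and rescales them by a temperature-like hyperparameter lambda before the softmax; the exact form used in Skywork-MoE may differ:

```python
# Sketch of gating logit normalization for an MoE router (assumed formulation).
# Standardizing the logits per token and scaling by lambda controls the sharpness
# of the gate distribution, which encourages expert diversification.
import torch
import torch.nn.functional as F

def normalized_gate(logits: torch.Tensor, lam: float = 1.0) -> torch.Tensor:
    """logits: [num_tokens, num_experts] raw router outputs."""
    mean = logits.mean(dim=-1, keepdim=True)
    std = logits.std(dim=-1, keepdim=True)
    normalized = (logits - mean) / (std + 1e-6)   # zero mean, unit variance per token
    return F.softmax(lam * normalized, dim=-1)    # lambda controls gate sharpness

gates = normalized_gate(torch.randn(4, 16), lam=1.0)  # 16 experts, as in Skywork-MoE
topk_vals, topk_idx = gates.topk(2, dim=-1)           # route each token to its top experts
```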
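Similarly, the adaptive auxiliary loss coefficients can be sketched as a per-layer controller. The update rule below is an assumption for illustration only: each MoE layer keeps its own load-balancing coefficient, nudging it up when routing looks imbalanced (e.g., a high token drop rate) and relaxing it otherwise; the actual schedule in Skywork-MoE may use a different signal or smoothing.

```python
# Sketch of adaptive, layer-specific auxiliary (load-balancing) loss coefficients.
# Hypothetical update rule: strengthen a layer's coefficient when its routing is
# imbalanced, decay it when routing is already balanced.
class AdaptiveAuxCoefficient:
    def __init__(self, init_coef: float = 0.01, target_drop_rate: float = 0.01,
                 step: float = 1.2, smoothing: float = 0.99):
        self.coef = init_coef
        self.target = target_drop_rate
        self.step = step
        self.smoothing = smoothing
        self.ema_drop_rate = 0.0  # exponential moving average of the observed drop rate

    def update(self, observed_drop_rate: float) -> float:
        self.ema_drop_rate = (self.smoothing * self.ema_drop_rate
                              + (1.0 - self.smoothing) * observed_drop_rate)
        if self.ema_drop_rate > self.target:
            self.coef *= self.step        # imbalanced: strengthen the auxiliary loss
        else:
            self.coef /= self.step        # balanced: relax it
        return self.coef

# One controller per MoE layer; total loss = lm_loss + sum(coef_l * aux_loss_l).
```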
How to Use
Install the necessary dependencies, including a PyTorch nightly build and vllm-flash-attn
Clone the Skywork-provided vllm source code
Configure and build vllm for your local environment
Run vllm with Docker, setting the model path and working directory
Perform text generation with vLLM's LLM and SamplingParams classes (see the sketch after these steps)
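A minimal generation sketch using vLLM's offline Python API. The checkpoint path shown here (/path/to/Skywork-MoE-Base) is a hypothetical placeholder for your locally downloaded model, and tensor_parallel_size should match the number of GPUs you have (e.g., 8 for an 8xA100/A800 setup); it also assumes you are running inside the Skywork-provided vLLM build described above.

```python
# Minimal sketch: offline text generation with vLLM's LLM and SamplingParams classes.
# The model path and tensor_parallel_size are illustrative; point them at your local
# checkpoint and GPU count.
from vllm import LLM, SamplingParams

llm = LLM(
    model="/path/to/Skywork-MoE-Base",  # hypothetical local path to the checkpoint
    tensor_parallel_size=8,             # shard the model across 8 GPUs
    trust_remote_code=True,
)

sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)
prompts = ["The mixture-of-experts architecture works by"]

outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(output.outputs[0].text)
```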