Skywork-MoE-Base
Overview:
Skywork-MoE-Base is a high-performance Mixture-of-Experts (MoE) language model with 146 billion total parameters, comprising 16 experts and activating 22 billion parameters per token. The model is initialized from the dense checkpoint of Skywork-13B and introduces two innovative techniques: gating logit normalization, which enhances expert diversity, and adaptive auxiliary loss coefficients, which allow layer-specific tuning of the auxiliary loss. On a range of popular benchmarks, Skywork-MoE performs comparably to or better than models with more total or activated parameters.
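As a rough illustration of the gating logit normalization idea, here is a minimal sketch based on the public description of Skywork-MoE: the gating logits are standardized per token and rescaled before the softmax. The function name, the `lambda_scale` hyperparameter, and the `top_k=2` routing here are illustrative assumptions, not the model's actual configuration.

```python
import torch
import torch.nn.functional as F

def normalized_gating(hidden: torch.Tensor, w_gate: torch.Tensor,
                      lambda_scale: float = 1.0, top_k: int = 2):
    """Sketch of MoE gating with logit normalization.

    hidden:  [batch, hidden_dim] token representations
    w_gate:  [hidden_dim, num_experts] gating projection
    lambda_scale: assumed scaling factor applied after standardizing
                  the logits (controls how sharp the routing is)
    """
    logits = hidden @ w_gate                      # [batch, num_experts]
    # Standardize the gating logits per token, then rescale: this is the
    # "gating logit normalization" step meant to encourage expert diversity.
    mean = logits.mean(dim=-1, keepdim=True)
    std = logits.std(dim=-1, keepdim=True)
    logits = lambda_scale * (logits - mean) / (std + 1e-6)
    probs = F.softmax(logits, dim=-1)
    # Route each token to its top-k experts, as in a standard MoE layer.
    topk_probs, topk_idx = probs.topk(top_k, dim=-1)
    return topk_probs, topk_idx
```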
Target Users:
Skywork-MoE-Base is designed for developers and researchers who need to run inference with large-scale language models. Its strong performance and innovative training techniques make it well suited to complex text generation and analysis tasks.
Total Visits: 29.7M
Top Region: US (17.94%)
Website Views: 50.8K
Use Cases
Generate detailed descriptions of the capitals of Chinese provinces
Conduct multi-turn dialogue generation, such as asking consecutive questions about provincial capitals
Rapidly deploy for research and development of new language model applications
Features
A large-scale Mixture-of-Experts (MoE) model with 146 billion total parameters
16 experts, with 22 billion parameters activated per token
Introduces two innovative techniques: gating logit normalization and adaptive auxiliary loss coefficients
Delivers competitive or superior results on multiple popular benchmarks
Supports inference with Hugging Face Transformers (see the sketch after this list)
Provides a fast deployment path based on vLLM
Supports both local environments and Docker deployment
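A minimal Hugging Face inference sketch for the feature above. The repository id `Skywork/Skywork-MoE-Base`, the prompt, and the generation settings are assumptions for illustration; a model of this size requires multiple high-memory GPUs, and the exact device placement will depend on your hardware.

```python
# Minimal sketch of Hugging Face Transformers inference.
# The repo id and generation settings below are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Skywork/Skywork-MoE-Base"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",          # shard across the available GPUs
    trust_remote_code=True,     # custom MoE architecture code
)

prompt = "The capital of Zhejiang province is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```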
How to Use
Step 1: Install the necessary dependencies
Step 2: Clone the Skywork-provided vllm code repository
Step 3: Compile and install vllm
Step 4: Choose between a local environment and Docker deployment as needed
Step 5: Set the model path and working directory
Step 6: Run the Skywork MoE model for text generation with vLLM (see the sketch after these steps)
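A hedged sketch of step 6 using vLLM's offline Python API, assuming Skywork's vLLM fork has already been installed per steps 1-3 and that `MODEL_PATH` points to the downloaded checkpoint in your working directory. The tensor-parallel size and sampling parameters are illustrative assumptions, not required values.

```python
# Sketch of text generation with vLLM (assumes Skywork's vLLM fork is
# installed and MODEL_PATH points to the local Skywork-MoE-Base weights).
from vllm import LLM, SamplingParams

MODEL_PATH = "/path/to/Skywork-MoE-Base"   # set to your working directory

llm = LLM(
    model=MODEL_PATH,
    trust_remote_code=True,      # custom MoE model code
    tensor_parallel_size=8,      # adjust to the number of GPUs available
    dtype="bfloat16",
)

sampling = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=128)
prompts = ["Please describe the capital of Guangdong province."]
for output in llm.generate(prompts, sampling):
    print(output.outputs[0].text)
```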