Skywork-MoE-Base-FP8
Overview
Skywork-MoE is a high-performance Mixture of Experts (MoE) model with 146 billion total parameters, 16 experts, and 22 billion activated parameters, initialized from the dense checkpoint of the Skywork-13B model. It introduces two innovative techniques: gating logit normalization, which enhances expert diversity, and adaptive auxiliary loss coefficients, which allow the auxiliary loss coefficient to be adjusted per layer. Skywork-MoE matches or exceeds the performance of models with more total or activated parameters on popular benchmarks such as C-Eval, MMLU, CMMLU, GSM8K, MATH, and HumanEval.
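To make the first technique concrete, here is a minimal sketch of gating logit normalization inside a top-2 router, based on the technique's published description; the function name, the lambda value, and the epsilon are illustrative assumptions, not Skywork's exact implementation.

```python
import torch
import torch.nn.functional as F

def normalized_top2_gate(logits: torch.Tensor, lam: float = 1.0):
    """logits: (num_tokens, num_experts) raw gating scores."""
    # Standardize the logits across the expert dimension, then rescale
    # by lambda: a larger lambda sharpens the softmax distribution,
    # encouraging more distinct (diverse) expert selections.
    mean = logits.mean(dim=-1, keepdim=True)
    std = logits.std(dim=-1, keepdim=True)
    normed = lam * (logits - mean) / (std + 1e-6)  # epsilon avoids divide-by-zero
    probs = F.softmax(normed, dim=-1)
    # Route each token to its two highest-probability experts and
    # renormalize the two routing weights so they sum to 1.
    top2_probs, top2_experts = probs.topk(2, dim=-1)
    top2_probs = top2_probs / top2_probs.sum(dim=-1, keepdim=True)
    return top2_probs, top2_experts

# Example: route 4 tokens over 16 experts.
weights, experts = normalized_top2_gate(torch.randn(4, 16))
```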
Target Users
The Skywork-MoE model is suitable for researchers and developers working on large-scale language model training and inference. It offers efficient parameter utilization and strong computational performance, which is particularly beneficial in resource-constrained environments or scenarios that require rapid inference.
Use Cases
Researchers use Skywork-MoE to train and evaluate models for natural language processing tasks.
Companies leverage the Skywork-MoE model for automatic generation of product documentation and for chatbot development.
Educational institutions adopt the Skywork-MoE model to help generate teaching materials and to automate the grading of student assignments.
Features
A large-scale MoE model with 146 billion parameters
16 experts with 22 billion activated parameters
Gating logit normalization technique for greater expert diversity
Adaptive, layer-specific auxiliary loss coefficients (see the sketch after this list)
Excellent performance in multiple benchmark tests
Supports FP8-precision inference, optimizing resource utilization
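The adaptive auxiliary loss coefficient can be pictured as a per-layer controller that strengthens the load-balancing loss when a layer drops too many tokens and relaxes it once routing is balanced. Below is a simplified, hypothetical sketch of that idea; the target drop rate, update factors, bounds, and smoothing constant are illustrative assumptions, not Skywork's published values.

```python
class AdaptiveAuxCoefficient:
    """Per-layer controller for the auxiliary (load-balancing) loss weight."""

    def __init__(self, init_coeff=0.01, target_drop_rate=0.01, beta=0.9):
        self.coeff = init_coeff          # current auxiliary loss coefficient
        self.target = target_drop_rate   # acceptable fraction of dropped tokens
        self.beta = beta                 # EMA smoothing factor
        self.ema_drop = 0.0              # smoothed token-drop rate for this layer

    def update(self, observed_drop_rate: float) -> float:
        # Smooth the noisy per-batch drop rate with an exponential moving average.
        self.ema_drop = self.beta * self.ema_drop + (1 - self.beta) * observed_drop_rate
        # Strengthen load balancing when too many tokens are dropped;
        # relax it when routing is already well balanced.
        if self.ema_drop > self.target:
            self.coeff = min(self.coeff * 1.05, 0.1)
        else:
            self.coeff = max(self.coeff * 0.95, 1e-4)
        return self.coeff
```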
How to Use
Install the necessary dependencies, including compatible versions of PyTorch and vllm.
Clone the vllm codebase provided by Skywork and compile it.
Alternatively, set up a Docker environment and run vllm directly from the Docker image provided by Skywork.
Configure the model path and working directory, then use the Skywork-MoE model for tasks such as text generation, as in the sketch below.
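As a concrete starting point, here is a minimal sketch that loads the model through vllm's offline Python API and generates text. The Hugging Face model id, parallelism degree, and sampling settings are assumptions; consult Skywork's vllm fork for the exact supported options.

```python
from vllm import LLM, SamplingParams

# Assumes Skywork's vllm fork is installed and the FP8 checkpoint is
# available under this model id (an assumption; adjust to your local path).
llm = LLM(
    model="Skywork/Skywork-MoE-Base-FP8",
    trust_remote_code=True,     # the MoE model code ships with the checkpoint
    tensor_parallel_size=8,     # adjust to the number of available GPUs
)

params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=128)
outputs = llm.generate(["Mixture-of-Experts models scale efficiently because"], params)
print(outputs[0].outputs[0].text)
```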