

Yuan2.0-M32-hf-int8
Overview
Yuan2.0-M32-hf-int8 is a mixture-of-experts (MoE) language model with 32 experts, of which 2 are active per token. By adopting a new routing network, the attention router, it makes expert selection more effective, yielding an accuracy improvement of 3.8% over models that use a classical routing network. Yuan2.0-M32 was trained from scratch on 2 trillion (2000B) tokens, and its training computation demand is only 9.25% of that of a dense model at the same parameter scale. The model is competitive in programming, mathematics, and various specialized fields while activating only 3.7 billion of its 40 billion total parameters. Forward computation per token requires only 7.4 GFLOPs, about 1/19th of what Llama3-70B demands. Yuan2.0-M32 outperforms Llama3-70B on the MATH and ARC-Challenge benchmarks, reaching accuracies of 55.9% and 95.8%, respectively.
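As a quick sanity check on these figures, a dense transformer forward pass costs roughly 2 FLOPs per active parameter per token (one multiply and one add per weight); applying that rule of thumb to the 3.7 billion active parameters reproduces the quoted ~7.4 GFLOPs. The snippet below is only a back-of-the-envelope illustration of that relationship, not a measured number.

```python
# Rough check: forward FLOPs per token ≈ 2 × active parameters.
# The 2× factor is a common rule of thumb, not a figure from the Yuan2.0-M32 report.
active_params = 3.7e9                      # active parameters per token
flops_per_token = 2 * active_params        # ≈ 7.4e9 FLOPs
print(f"~{flops_per_token / 1e9:.1f} GFLOPs per token")  # prints "~7.4 GFLOPs per token"
```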
Target Users
The Yuan2.0-M32-hf-int8 model is designed for developers and researchers who need to handle large volumes of data and complex tasks, particularly in programming, mathematics, and specialized fields. Its high efficiency and accuracy make it an ideal choice in these areas.
Use Cases
Developing complex programming projects with more accurate code generation.
Precise calculation and reasoning for solving mathematical problems.
Knowledge acquisition and text generation in specialized professional fields.
Features
Only 2 of 32 experts are active per token, keeping per-token compute low.
Uses an attention router for expert selection, improving accuracy by 3.8% over a classical routing network (see the conceptual sketch after this list).
Trained from scratch on 2 trillion (2000B) tokens.
Training computation cost is only 9.25% of that of a dense model at the same parameter scale.
Competitive performance in programming, mathematics, and other fields.
Excels in MATH and ARC-Challenge benchmark tests.
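For intuition only, the sketch below shows one way an attention-style router can score experts: a query derived from the token's hidden state attends over learnable expert embeddings, and the two highest-scoring experts are selected. The class name, dimensions, and projection layout here are illustrative assumptions, not the actual Yuan2.0-M32 attention router; refer to the paper and repository for the real design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionStyleRouter(nn.Module):
    """Toy attention-style router: token queries attend over learnable expert keys."""
    def __init__(self, hidden_dim: int, num_experts: int = 32, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.query_proj = nn.Linear(hidden_dim, hidden_dim)                     # token -> query
        self.expert_keys = nn.Parameter(torch.randn(num_experts, hidden_dim))   # one embedding per expert

    def forward(self, hidden_states: torch.Tensor):
        # hidden_states: (batch, hidden_dim)
        q = self.query_proj(hidden_states)                                      # (batch, hidden_dim)
        scores = q @ self.expert_keys.t() / self.expert_keys.shape[-1] ** 0.5   # (batch, num_experts)
        probs = F.softmax(scores, dim=-1)
        top_probs, top_idx = probs.topk(self.top_k, dim=-1)                     # keep the 2 best experts
        gates = top_probs / top_probs.sum(dim=-1, keepdim=True)                 # renormalize gate weights
        return top_idx, gates

router = AttentionStyleRouter(hidden_dim=2048)
expert_ids, gate_weights = router(torch.randn(4, 2048))
print(expert_ids.shape, gate_weights.shape)  # torch.Size([4, 2]) torch.Size([4, 2])
```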
How to Use
1. Set up the environment and start the Yuan2.0 container using the recommended Docker image.
2. Perform data preprocessing according to the provided scripts.
3. Use example scripts for model pre-training.
4. Follow the vLLM documentation for detailed deployment to provide inference services (a minimal sketch follows this list).
5. Visit the GitHub repository for more information.
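As a starting point for step 4, the snippet below sketches offline inference with vLLM. The repository id `IEITYuan/Yuan2-M32-hf-int8`, the `max_model_len` value, and the need for `trust_remote_code` are assumptions; verify them against the Yuan2.0 vLLM deployment guide before use.

```python
# Minimal vLLM offline-inference sketch (assumed model id and settings;
# check the official Yuan2.0 vLLM deployment docs for the exact configuration).
from vllm import LLM, SamplingParams

llm = LLM(
    model="IEITYuan/Yuan2-M32-hf-int8",  # assumed Hugging Face repo id
    trust_remote_code=True,              # Yuan models ship custom modeling code
    max_model_len=4096,                  # assumed context length for this demo
)

sampling = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=256)
outputs = llm.generate(
    ["Write a Python function that checks whether a number is prime."],
    sampling,
)
print(outputs[0].outputs[0].text)
```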