rStar-Math: Showcases research results demonstrating how small language models can master mathematical reasoning through self-evolution and deep thinking.

rStar-Math

Overview:

rStar-Math is a study demonstrating that small language models (SLMs) can rival or even surpass the mathematical reasoning capability of OpenAI's o1 without distillation from superior models. It achieves "deep thinking" through Monte Carlo Tree Search (MCTS), in which a math policy SLM performs test-time search guided by an SLM-based process reward model. rStar-Math introduces three innovations to tackle the challenges of training these two SLMs, and through four rounds of self-evolution with millions of synthesized solutions it raises their mathematical reasoning to state-of-the-art levels. The resulting models substantially improve performance on the MATH benchmark and perform strongly on AIME competition problems.
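To make the search idea concrete, here is a toy Monte Carlo Tree Search in Python. It is a minimal sketch under heavy simplifying assumptions, not rStar-Math's code: the "problem" is reaching a target number with arithmetic steps, and `policy_propose` and `reward_model` are hypothetical stand-ins for the policy SLM and the SLM-based reward model. Finished trajectories are scored by checking the final answer, while unfinished ones are scored directly by the reward model rather than by rollouts.

```python
import math
from dataclasses import dataclass, field
from typing import Optional

START, TARGET, MAX_STEPS = 1, 10, 4
# Toy "reasoning steps" (hypothetical): each step transforms the current partial solution.
STEPS = {"+1": lambda v: v + 1, "*2": lambda v: v * 2, "+3": lambda v: v + 3}


def policy_propose(value: int) -> list[str]:
    """Hypothetical stand-in for the policy SLM: proposes candidate next steps."""
    return list(STEPS)


def reward_model(value: int) -> float:
    """Hypothetical stand-in for the SLM-based reward model: scores a partial solution in [0, 1]."""
    return max(0.0, 1.0 - abs(TARGET - value) / TARGET)


@dataclass(eq=False)
class Node:
    value: int
    depth: int
    path: tuple = ()
    parent: Optional["Node"] = None
    children: list = field(default_factory=list)
    visits: int = 0
    total: float = 0.0

    def is_terminal(self) -> bool:
        return self.value == TARGET or self.depth == MAX_STEPS


def uct(child: Node, parent_visits: int, c: float = 1.4) -> float:
    """Upper Confidence bound for Trees: average reward plus an exploration bonus."""
    if child.visits == 0:
        return float("inf")  # always try unvisited steps first
    exploit = child.total / child.visits
    explore = c * math.sqrt(math.log(parent_visits) / child.visits)
    return exploit + explore


def evaluate(node: Node) -> float:
    if node.is_terminal():
        return 1.0 if node.value == TARGET else 0.0  # check the final answer
    return reward_model(node.value)  # reward model scores intermediate steps


def mcts(iterations: int = 200) -> Node:
    root = Node(START, 0)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend by UCT until reaching a leaf.
        while node.children:
            parent = node
            node = max(parent.children, key=lambda ch: uct(ch, parent.visits))
        # 2. Expansion: expand a previously visited, non-terminal leaf.
        if node.visits > 0 and not node.is_terminal():
            for step in policy_propose(node.value):
                child = Node(STEPS[step](node.value), node.depth + 1, node.path + (step,), node)
                node.children.append(child)
            node = node.children[0]
        # 3. Evaluation, then 4. Backpropagation up to the root.
        reward = evaluate(node)
        while node is not None:
            node.visits += 1
            node.total += reward
            node = node.parent
    return root


def best_path(root: Node) -> tuple:
    """Follow the most-visited child at each level and return (steps, final value)."""
    node = root
    while node.children:
        node = max(node.children, key=lambda ch: ch.visits)
    return node.path, node.value


if __name__ == "__main__":
    print(best_path(mcts()))
```

Each iteration runs the four classic MCTS phases (selection, expansion, evaluation, backpropagation); the exploration constant `c` and the iteration budget are illustrative values, not settings from the paper.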

Target Users:

The target audience includes researchers, developers, and professionals from academia and industry who are interested in enhancing the mathematical reasoning capabilities of small language models. This model is suitable for scenarios that require efficient mathematical reasoning and problem-solving abilities, such as intelligent tutoring systems in the education sector and training tools for math competitions.

Use Cases

On the MATH benchmark, improved the accuracy of Qwen2.5-Math-7B from 58.8% to 90.0% and of Phi3-mini-3.8B from 41.4% to 86.4%.

On AIME (American Invitational Mathematics Examination) problems, solved an average of 53.3% (8/15), ranking among the top 20% of high school math competitors.

Iteratively refined the policy model and process preference model through self-evolution, improving the ability to solve complex math problems.
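The reported figures can be sanity-checked with simple arithmetic. The snippet below only recomputes the numbers quoted above (absolute gains in percentage points and the AIME solve rate); it does not reproduce any evaluation.

```python
# Accuracies (%) on MATH as quoted above: (before, after) for each base model.
math_results = {
    "Qwen2.5-Math-7B": (58.8, 90.0),
    "Phi3-mini-3.8B": (41.4, 86.4),
}


def gain_pp(before: float, after: float) -> float:
    """Absolute improvement in percentage points, rounded to one decimal."""
    return round(after - before, 1)


for model, (before, after) in math_results.items():
    print(f"{model}: +{gain_pp(before, after)} pp")
# Qwen2.5-Math-7B: +31.2 pp
# Phi3-mini-3.8B: +45.0 pp

aime_solved, aime_total = 8, 15
aime_rate = round(100 * aime_solved / aime_total, 1)
print(f"AIME: {aime_solved}/{aime_total} = {aime_rate}%")
# AIME: 8/15 = 53.3%
```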

Features

Employ Monte Carlo Tree Search (MCTS) for deep thinking, searching over reasoning steps at test time.

Propose a novel code-augmented Chain-of-Thought (CoT) data synthesis method that generates reasoning trajectories and verifies each step by executing its accompanying code.

Develop a new process reward model training method that avoids naive step-level score annotation, yielding a more effective process preference model (PPM).

Implement a self-evolution recipe in which the policy SLM and PPM are built from scratch and iteratively evolved to improve reasoning capabilities.

Deliver strong results on multiple math benchmarks, raising the mathematical reasoning level of small language models.
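Two of the ideas above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not rStar-Math's implementation: `keep_verified_steps` is a hypothetical filter that keeps steps only while each step's Python snippet executes without error (the idea behind code-augmented CoT), and `pairwise_preference_loss` is a Bradley-Terry style ranking loss, -log(sigmoid(r_pos - r_neg)), of the kind used to train a preference model from better/worse step pairs instead of absolute per-step scores. The trajectory and scores are made up.

```python
import math


def step_code_runs(code: str, namespace: dict) -> bool:
    """Return True if a step's Python snippet executes without raising."""
    try:
        # Shared namespace lets later steps reuse earlier variables.
        # A real system would sandbox this; exec on untrusted code is unsafe.
        exec(code, namespace)
        return True
    except Exception:
        return False


def keep_verified_steps(steps: list[tuple[str, str]]) -> list[str]:
    """Hypothetical filter: keep step texts up to the first step whose code fails."""
    namespace: dict = {}
    kept = []
    for text, code in steps:
        if not step_code_runs(code, namespace):
            break
        kept.append(text)
    return kept


def pairwise_preference_loss(r_pos: float, r_neg: float) -> float:
    """Bradley-Terry style ranking loss -log(sigmoid(r_pos - r_neg)), computed stably."""
    margin = r_pos - r_neg
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Made-up trajectory: each step pairs natural-language text with checking code.
trajectory = [
    ("3 apples plus 4 apples gives 7 apples.", "x = 3 + 4\nassert x == 7"),
    ("Doubling gives 14 apples.", "y = 2 * x\nassert y == 14"),
    ("Splitting among 3 friends gives 5 each.", "z = y / 3\nassert z == 5"),  # wrong step
]
print(keep_verified_steps(trajectory))  # the third step's assertion fails, so it is dropped
print(pairwise_preference_loss(2.0, 0.5))  # small loss: preferred step already scores higher
```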

How to Use

1. Visit the rStar-Math page on Hugging Face to learn about the model details.

2. Read the research paper and related materials to understand the model's architecture and how it works.

3. Download and install the necessary dependencies and tools to prepare the runtime environment.

4. Use the provided code and data to load the pre-trained policy SLM and PPM.

5. For a given math problem, run MCTS-based reasoning and search to obtain a solution.

6. Adjust model parameters and search strategies as needed to optimize performance.

7. Deploy the model in practical applications, such as educational software and online tutoring platforms, to provide users with mathematical reasoning support.
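Steps 4 to 6 can be pictured as "propose candidates, score them with a reward model, keep the best, and tune the search settings". The sketch below is a hypothetical, simplified stand-in: it uses best-of-N reranking, a simpler technique than the MCTS search rStar-Math performs, and `SearchConfig`, `solve`, `toy_propose`, and `toy_score` are illustrative names, not part of the project's code.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class SearchConfig:
    """Hypothetical search settings of the kind tuned in step 6."""
    num_candidates: int = 8  # how many candidate solutions to sample per problem
    min_score: float = 0.0   # discard candidates the scorer rates below this


def solve(problem: str,
          propose: Callable[[str, int], list[str]],
          score: Callable[[str, str], float],
          config: SearchConfig = SearchConfig()) -> Optional[str]:
    """Sample candidates, score each with a reward model, and return the best one."""
    candidates = propose(problem, config.num_candidates)
    scored = [(score(problem, c), c) for c in candidates]
    scored = [(s, c) for s, c in scored if s >= config.min_score]
    if not scored:
        return None
    return max(scored, key=lambda sc: sc[0])[1]


def toy_propose(problem: str, n: int) -> list[str]:
    """Hypothetical policy stand-in: enumerates the guesses 0..n-1."""
    return [str(i) for i in range(n)]


def toy_score(problem: str, answer: str) -> float:
    """Hypothetical reward-model stand-in: prefers answers close to 6 (for "2 * 3")."""
    return 1.0 / (1.0 + abs(int(answer) - 6))


print(solve("2 * 3 = ?", toy_propose, toy_score))                               # 6
print(solve("2 * 3 = ?", toy_propose, toy_score, SearchConfig(num_candidates=4)))  # 3
```

The second call shows why step 6 matters: with too small a candidate budget, the correct answer is never proposed, so even a perfect scorer cannot select it.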