ReFT
Overview:
ReFT (Reinforced Fine-Tuning) is a simple yet effective method for enhancing the reasoning capabilities of large language models (LLMs). It first warms up the model with supervised fine-tuning (SFT), then continues fine-tuning with online reinforcement learning, specifically the PPO algorithm presented in the paper. ReFT significantly outperforms SFT by automatically sampling many reasoning paths for each problem and deriving rewards directly from the ground-truth answers. Its performance can be improved further by combining it with inference-time strategies such as majority voting and re-ranking. Notably, ReFT achieves these gains by learning from the same training questions as SFT, without relying on additional or augmented training questions, which demonstrates its stronger generalization ability.
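The reward signal described above can be sketched in a few lines of Python. The snippet below is an illustrative sketch only: `extract_final_answer`, `sample_reasoning_paths`, and the toy policy are hypothetical names chosen for this example, and the binary correctness reward follows the description above rather than any released ReFT implementation.

```python
# Sketch of a ReFT-style reward: the reward comes only from whether a sampled
# reasoning path ends in the ground-truth answer (no learned reward model).
# All helper names here are placeholders for illustration.

import random
from typing import Callable, List


def extract_final_answer(reasoning_path: str) -> str:
    """Pull the final answer out of a reasoning path.

    Assumes the path ends with a line like 'Answer: 42'.
    """
    for line in reversed(reasoning_path.strip().splitlines()):
        if line.startswith("Answer:"):
            return line.removeprefix("Answer:").strip()
    return ""


def reward(reasoning_path: str, gold_answer: str) -> float:
    """Binary reward derived from the ground-truth answer."""
    return 1.0 if extract_final_answer(reasoning_path) == gold_answer else 0.0


def sample_reasoning_paths(policy: Callable[[str], str], question: str, n: int) -> List[str]:
    """Sample n reasoning paths for one question from the current policy."""
    return [policy(question) for _ in range(n)]


if __name__ == "__main__":
    # Toy policy standing in for the SFT-warmed-up LLM.
    def toy_policy(question: str) -> str:
        guess = random.choice(["4", "5"])
        return f"2 + 2 = {guess}\nAnswer: {guess}"

    paths = sample_reasoning_paths(toy_policy, "What is 2 + 2?", n=8)
    rewards = [reward(p, gold_answer="4") for p in paths]
    print(f"mean reward over {len(paths)} sampled paths: {sum(rewards) / len(rewards):.2f}")
```

In the full method these per-path rewards would feed a PPO update of the policy; the sketch only shows where the reward comes from.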
Target Users:
Researchers and practitioners who want to enhance the reasoning capabilities of large language models, especially for tasks such as mathematical problem solving.
Features
Supervised Fine-tuning (SFT)
Online Reinforcement Learning
PPO Algorithm
Reasoning Path Sampling
Performance Optimization Strategies (majority voting and re-ranking; see the sketch below)
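As a minimal sketch of one such inference-time strategy, the snippet below implements majority voting over sampled reasoning paths; `extract_final_answer` is the same hypothetical placeholder as in the earlier sketch, and re-ranking would replace the vote with scores from a separate reward model.

```python
# Sketch of majority voting over sampled reasoning paths (illustrative only).

from collections import Counter
from typing import List


def extract_final_answer(reasoning_path: str) -> str:
    """Assumes each path ends with a line like 'Answer: 42'."""
    for line in reversed(reasoning_path.strip().splitlines()):
        if line.startswith("Answer:"):
            return line.removeprefix("Answer:").strip()
    return ""


def majority_vote(reasoning_paths: List[str]) -> str:
    """Return the most frequent final answer across the sampled paths."""
    answers = [a for a in (extract_final_answer(p) for p in reasoning_paths) if a]
    if not answers:
        return ""
    return Counter(answers).most_common(1)[0][0]


if __name__ == "__main__":
    paths = [
        "3 * 4 = 12\nAnswer: 12",
        "3 * 4 = 12\nAnswer: 12",
        "3 + 4 = 7\nAnswer: 7",
    ]
    print(majority_vote(paths))  # -> "12"
```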