d1
Overview:
This model improves the reasoning capabilities of diffusion large language models by combining masked self-supervised fine-tuning on high-quality reasoning trajectories with reinforcement learning. The approach optimizes the model's reasoning process and reduces computational cost while keeping the learning dynamics stable. It is suited to users who want to work more efficiently on writing and reasoning tasks.
Target Users:
Suitable for researchers and developers who want to leverage reinforcement learning to optimize the reasoning capabilities of language models and improve application efficiency.
Use Cases
Use this model to improve the reasoning ability of chatbots on complex problems.
In educational applications, help students solve logical reasoning problems.
Provide intelligent writing assistance for content creators, improving creative efficiency.
Features
High-quality reasoning trajectories: Fine-tuned using a curated set of 1000 reasoning problems.
Effective policy gradient algorithm: Introduces diffu-GRPO, a policy gradient method adapted to masked diffusion large language models.
Log probability estimation: Uses a mean-field approximation to estimate log probabilities efficiently (see the sketch after this list).
Stochastic masking: Creates perturbed views, enhancing the regularization effect of policy optimization.
Stable learning dynamics: Increases the number of inner updates per batch, reducing how many outer batch iterations are needed.
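The log-probability estimation and stochastic masking features can be pictured with a minimal sketch. This is an illustrative PyTorch snippet, not the model's actual code: the model interface, the mask token id, the masking ratio, and the function name masked_logprob_estimate are all assumptions.

```python
import torch
import torch.nn.functional as F

def masked_logprob_estimate(model, prompt_ids, completion_ids,
                            mask_token_id, mask_prob=0.15):
    # Build one perturbed view: randomly mask some prompt positions and
    # fully mask the completion so the model predicts it in a single pass.
    prompt_mask = torch.rand(prompt_ids.shape, device=prompt_ids.device) < mask_prob
    masked_prompt = torch.where(
        prompt_mask, torch.full_like(prompt_ids, mask_token_id), prompt_ids)
    masked_completion = torch.full_like(completion_ids, mask_token_id)
    inputs = torch.cat([masked_prompt, masked_completion], dim=-1)

    # One forward pass over the masked view (assumes an HF-style model
    # that returns an object with a .logits field).
    logits = model(inputs).logits

    # Mean-field approximation: treat completion tokens as conditionally
    # independent and sum their per-token log-probabilities.
    completion_logits = logits[..., prompt_ids.shape[-1]:, :]
    log_probs = F.log_softmax(completion_logits, dim=-1)
    token_logp = log_probs.gather(-1, completion_ids.unsqueeze(-1)).squeeze(-1)
    return token_logp.sum(dim=-1)
```

Because each call draws a fresh random mask, repeated calls on the same prompt and completion yield slightly different views, which is the regularization effect the stochastic masking feature refers to.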
How to Use
Download and install the model software.
Prepare a high-quality dataset of reasoning problems.
Perform masked self-supervised fine-tuning.
Apply diffu-GRPO for policy optimization (see the sketch after these steps).
Evaluate the model's performance in practical applications and make adjustments.
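To illustrate how the steps fit together, here is a hypothetical training-loop sketch. Every callable it takes (the SFT loss, policy loss, generator, and reward function) is an assumed placeholder supplied by the caller, and the default hyperparameters are illustrative rather than the model's published settings.

```python
import random
import torch

def train_pipeline(model, optimizer, sft_batches, rl_prompts, reward_fn,
                   sft_loss_fn, policy_loss_fn, generate_fn,
                   rl_steps=100, group_size=8, inner_updates=4):
    # Step 3: masked self-supervised fine-tuning on the curated
    # reasoning-trajectory dataset.
    for batch in sft_batches:
        optimizer.zero_grad()
        sft_loss_fn(model, batch).backward()
        optimizer.step()

    # Step 4: policy optimization in the spirit of diffu-GRPO.
    for _ in range(rl_steps):
        prompt = random.choice(rl_prompts)
        # Sample a group of completions for the prompt and score them.
        completions = [generate_fn(model, prompt) for _ in range(group_size)]
        rewards = torch.tensor([reward_fn(prompt, c) for c in completions],
                               dtype=torch.float32)
        # Group-relative advantages, as in GRPO-style objectives.
        advantages = (rewards - rewards.mean()) / (rewards.std() + 1e-6)

        # Several inner updates reuse the same generations; each update can
        # see a fresh stochastically masked view inside policy_loss_fn.
        for _ in range(inner_updates):
            optimizer.zero_grad()
            policy_loss_fn(model, prompt, completions, advantages).backward()
            optimizer.step()
    return model
```

Reusing each group of generations for several inner updates is what the stable-learning-dynamics feature above refers to: new completions are sampled less often, while each update works on a different masked view.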