Gen-Verse
Revolutionary RL framework for diffusion language models
Summary
Gen-Verse/dLLM-RL introduces TraceRL, a reinforcement learning framework for diffusion language models (DLMs), together with TraDo, a suite of DLMs (including TraDo-8B) trained with it. Aimed at researchers and practitioners, the project reports state-of-the-art performance on complex reasoning tasks such as mathematics and coding, and sets out to improve how DLMs are trained and optimized for generative tasks.
How It Works
The framework's core is TraceRL, a trajectory-aware reinforcement learning method that incorporates a diffusion-based value model. According to the project, this combination reduces variance and improves stability during optimization, a key challenge in DLM training. The TraDo model series built on TraceRL (e.g., TraDo-4B-Instruct, TraDo-8B-Instruct, TraDo-8B-Thinking) is reported to achieve state-of-the-art results on math and coding reasoning benchmarks, positioning the diffusion-based approach as a direct challenger to traditional autoregressive models.
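The variance-reduction role of a value model can be illustrated with a toy calculation. The sketch below is not TraceRL's implementation: the function names, the constant value predictions, and the four toy trajectories are all assumptions made for illustration. It only shows the generic idea of subtracting a value baseline from the return at each trajectory step, and how that shrinks the mean squared learning signal that scales a policy-gradient update.

```python
def discounted_returns(rewards, gamma=1.0):
    """Return-to-go G_t = r_t + gamma * G_{t+1} for every trajectory step."""
    returns = []
    running = 0.0
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def step_advantages(rewards, values, gamma=1.0):
    """Per-step advantage A_t = G_t - V_t (return minus value baseline)."""
    if len(rewards) != len(values):
        raise ValueError("rewards and values must have the same length")
    returns = discounted_returns(rewards, gamma)
    return [g - v for g, v in zip(returns, values)]


def mean_square(xs):
    """Mean of squared entries: a rough proxy for update-signal variance."""
    return sum(x * x for x in xs) / len(xs)


# Hypothetical data: four sampled trajectories of three generation steps,
# each with a single verifiable reward at the end (1.0 = correct answer).
trajectories = [
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0],
]

# Hypothetical value-model output: a constant 0.75 (the toy success rate).
# A learned value model would predict a different value at each step.
values = [0.75, 0.75, 0.75]

raw_signal = mean_square([discounted_returns(t)[0] for t in trajectories])
baselined_signal = mean_square([step_advantages(t, values)[0] for t in trajectories])

print(raw_signal)        # mean squared return at the first step
print(baselined_signal)  # mean squared advantage at the first step
```

In this toy setting the baselined signal is smaller than the raw one because the baseline removes the part of the return that every trajectory shares. A value model that conditions on the partially generated sequence could tighten it further, which is the general motivation for pairing a value model with a trajectory-level RL objective.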
Quick Start & Requirements
Create a conda environment (conda create --name dllm-rl python=3.10, source activate dllm-rl), followed by pip installations for PyTorch (torch==2.6.0) and a specific FlashAttention wheel (flash_attn-2.7.4.post1+cu12torch2.6cxx11abiFALSE-cp310-cp310-linux_x86_64.whl), and then pip install -r requirements.txt.
Download data with python download_data.py --dataset <dataset_name> (e.g., MATH500).
./configs directory.
Highlighted Details
Maintenance & Community
Information regarding community channels (e.g., Discord, Slack), project roadmaps, or notable contributors is not present in the provided documentation.
Licensing & Compatibility
The software license and details on compatibility for commercial use or integration with closed-source projects are not specified.
Limitations & Caveats
The provided documentation does not explicitly detail any limitations, known bugs, alpha status, or unsupported platforms.