dLLM-RL by Gen-Verse

Reinforcement learning framework for diffusion language models

Created 2 months ago
295 stars

Top 89.7% on SourcePulse

View on GitHub
Project Summary

Summary

Gen-Verse/dLLM-RL introduces TraceRL, a reinforcement learning framework for diffusion language models (DLMs), together with TraDo-8B and a suite of companion models trained with it. Aimed at researchers and practitioners, the project reports state-of-the-art performance on complex reasoning tasks such as mathematics and coding, making a case for RL-trained DLMs as an alternative to autoregressive models for generative tasks.

How It Works

The framework's core is TraceRL, a trajectory-aware reinforcement learning method that pairs the policy with a diffusion-based value model. The value model reduces gradient variance and improves optimization stability, a key challenge in DLM training. Trained with TraceRL, the TraDo model series (e.g., TraDo-4B-Instruct, TraDo-8B-Instruct, TraDo-8B-Thinking) achieves state-of-the-art results on math and coding reasoning benchmarks, making diffusion-based models directly competitive with traditional autoregressive models.
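To make the mechanism concrete, here is a minimal sketch of a trajectory-aware policy-gradient step with a value-model baseline. It is an illustrative PPO-style surrogate under our own assumptions (per-step log-probabilities collected along the diffusion decoding trajectory); the function name, signature, and exact objective are hypothetical, not the repo's API, and the actual TraceRL loss may differ.

```python
import torch
import torch.nn.functional as F

def tracerl_step_loss(logp_new, logp_old, rewards, values, clip_eps=0.2):
    """PPO-style surrogate over diffusion decoding steps (illustrative only).

    logp_new / logp_old: (T,) summed log-probs of the tokens unmasked at each
    of the T decoding steps, under the current and behavior policies.
    rewards: (T,) per-step returns (e.g., a terminal verifier reward broadcast
    back along the trajectory).
    values:  (T,) predictions from a diffusion-based value model.
    """
    advantages = rewards - values.detach()        # baseline cuts gradient variance
    ratio = torch.exp(logp_new - logp_old)        # per-step importance ratio
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    policy_loss = -torch.min(unclipped, clipped).mean()
    value_loss = F.mse_loss(values, rewards)      # also fit the value model
    return policy_loss + 0.5 * value_loss
```

The detached value baseline is what cuts variance: the policy gradient sees rewards minus predicted values rather than raw rewards, while the MSE term trains the value model itself.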

Quick Start & Requirements

  • Installation: Create a Conda environment (conda create --name dllm-rl python=3.10, then source activate dllm-rl), install PyTorch (torch==2.6.0) and the specific FlashAttention wheel (flash_attn-2.7.4.post1+cu12torch2.6cxx11abiFALSE-cp310-cp310-linux_x86_64.whl), then run pip install -r requirements.txt; a consolidated script follows this list.
  • Prerequisites: Python 3.10, PyTorch 2.6.0, and CUDA 12 are required. Multi-node setups may have additional hardware dependencies.
  • Data: Datasets can be downloaded using python download_data.py --dataset <dataset_name> (e.g., MATH500).
  • Configuration: Users must select or create configuration files in the ./configs directory.
  • Links: The documentation provides no direct links to external docs, demos, or quick-start guides.
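
Consolidated, the setup and data steps above look like this (commands as given in the docs; the FlashAttention wheel must first be downloaded for your Python/CUDA/PyTorch combination):

```bash
# Create and activate the environment (Python 3.10 required).
conda create --name dllm-rl python=3.10
source activate dllm-rl

# Install PyTorch 2.6.0 and the matching FlashAttention wheel (local file).
pip install torch==2.6.0
pip install flash_attn-2.7.4.post1+cu12torch2.6cxx11abiFALSE-cp310-cp310-linux_x86_64.whl

# Remaining dependencies.
pip install -r requirements.txt

# Fetch a benchmark dataset, e.g. MATH500.
python download_data.py --dataset MATH500
```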

Highlighted Details

  • Broad model support: Compatible with diverse DLM architectures including full attention, adapted, and block attention models (e.g., TraDo, SDAR, Dream, LLaDA, MMaDA, Diffu-Coder).
  • Inference acceleration: Features include optimized KV-cache, jetengine (based on nano-vllm), various sampling strategies, and robust multi-node inference capabilities.
  • Advanced RL training: Implements TraceRL (with optional diffusion value model), coupled RL, and random masking RL, all benefiting from KV-cache acceleration.
  • Flexible SFT: Supports Block SFT, semi-AR SFT, and random masking SFT, with multi-node and long-CoT finetuning options.
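
As an illustration of the random-masking objective that both the SFT and RL variants build on, here is a minimal LLaDA-style masked-diffusion loss. The `model` interface, tensor shapes, and 1/t reweighting are assumptions made for the sketch, not this repository's implementation:

```python
import torch
import torch.nn.functional as F

def random_masking_sft_loss(model, input_ids, mask_token_id):
    """Sketch of one random-masking SFT step for a masked diffusion LM.

    Assumes `model(ids)` returns logits of shape (B, L, V); hypothetical
    interface, shown only to illustrate the objective.
    """
    B, L = input_ids.shape
    # Sample a masking ratio t ~ U(0, 1] per sequence, mask tokens i.i.d.
    t = torch.rand(B, 1, device=input_ids.device).clamp_min(1e-3)
    mask = torch.rand(B, L, device=input_ids.device) < t
    noisy = torch.where(mask, torch.full_like(input_ids, mask_token_id), input_ids)
    logits = model(noisy)
    # Cross-entropy on masked positions only, reweighted by 1/t.
    loss_tok = F.cross_entropy(
        logits.view(-1, logits.size(-1)), input_ids.view(-1), reduction="none"
    ).view(B, L)
    return (loss_tok * mask / t).sum() / (B * L)
```

Masking each sequence at a randomly drawn ratio t and reweighting the masked-token cross-entropy by 1/t gives the standard masked-diffusion training bound, which is why random-masking SFT can reuse an ordinary cross-entropy pipeline.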

Maintenance & Community

Information regarding community channels (e.g., Discord, Slack), project roadmaps, or notable contributors is not present in the provided documentation.

Licensing & Compatibility

The software license and details on compatibility for commercial use or integration with closed-source projects are not specified.

Limitations & Caveats

The provided documentation does not explicitly detail any limitations, known bugs, alpha status, or unsupported platforms.

Health Check
Last Commit

2 weeks ago

Responsiveness

Inactive

Pull Requests (30d)
1
Issues (30d)
27
Star History
63 stars in the last 30 days

Explore Similar Projects

Starred by Jeff Huber (Cofounder of Chroma), Omar Khattab (Coauthor of DSPy, ColBERT; Professor at MIT), and 1 more.

arbor by Ziems

26.8%
264
Framework for optimizing DSPy programs with RL
Created 8 months ago
Updated 1 day ago
Starred by Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), and 8 more.

EAGLE by SafeAILab

0.5%
2k
Speculative decoding research paper for faster LLM inference
Created 1 year ago
Updated 3 weeks ago