QeRL by NVlabs

Efficient RL for large language models on single GPUs

Created 1 month ago
390 stars

Top 73.5% on SourcePulse

Project Summary

Quantization-enhanced Reinforcement Learning (QeRL) addresses the significant computational demands of applying reinforcement learning to large language models (LLMs). It enables the training of up to 32B parameter LLMs on a single H100 GPU, offering a low-cost and efficient alternative for researchers and engineers. QeRL accelerates RL training, improves exploration, and achieves performance comparable to full-parameter fine-tuning.

How It Works

QeRL integrates NVFP4 quantization with Low-Rank Adaptation (LoRA) to drastically reduce memory overhead and speed up the rollout phase of RL training. A key insight is that quantization noise inherently increases policy entropy, which enhances exploration during RL training, leading to the discovery of better strategies. The framework further optimizes this with an Adaptive Quantization Noise (AQN) mechanism that dynamically adjusts noise levels. This approach yields over 1.5x speedup in rollouts and enables training of larger models on constrained hardware.
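The entropy effect is easy to demonstrate in isolation. The sketch below is illustrative only, not QeRL's implementation: it perturbs a sharply peaked policy's logits with Gaussian noise as a stand-in for quantization noise, and anneals the noise scale the way an AQN-style schedule might (the schedule values are hypothetical).

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)

# A sharply peaked policy: low entropy, so RL exploration stalls.
logits = [5.0, 0.0, 0.0, 0.0]
clean_entropy = entropy(softmax(logits))

random.seed(0)

def mean_noisy_entropy(sigma, trials=2000):
    """Average policy entropy after perturbing the logits with Gaussian
    noise of scale sigma (a crude stand-in for quantization noise)."""
    total = 0.0
    for _ in range(trials):
        noisy = [x + random.gauss(0.0, sigma) for x in logits]
        total += entropy(softmax(noisy))
    return total / trials

# A hypothetical AQN-like schedule: start noisy, anneal the scale down.
schedule = [1.0, 0.5, 0.1]
entropies = [mean_noisy_entropy(s) for s in schedule]
print(f"clean entropy: {clean_entropy:.3f}")
print("noisy entropies:", ", ".join(f"{h:.3f}" for h in entropies))
```

Noise flattens the softmax, raising entropy and hence exploration early in training; annealing the scale lets the policy sharpen again as training converges.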

Quick Start & Requirements

  • Primary install/run commands: clone the repository; create and activate a conda environment (conda create -n qerl python=3.10 -y, conda activate qerl); install CUDA 12.4.1 (conda install nvidia/label/cuda-12.4.1::cuda, conda install -c nvidia/label/cuda-12.4.1 cudatoolkit); then run sh setup_env.sh. A separate environment for quantization (llmcompressor) requires Python 3.12.
  • Prerequisites: an NVIDIA GPU with NVFP4 support (e.g., RTX 5090, H100, B100), Linux, 64 GB RAM, and CUDA >= 12.4.1.
  • Links: GitHub Repository

Highlighted Details

  • Enables RL training of a 32B LLM on a single H100 80GB GPU.
  • Achieves over 1.5x speedup in the RL rollout phase.
  • Outperforms vanilla LoRA and QLoRA in reward growth and final accuracy.
  • Matches full-parameter fine-tuning performance on GSM8K (90.8%) and MATH 500 (77.4%) for 7B models.

Maintenance & Community

No specific details on maintainers, community channels (like Discord/Slack), or roadmaps were found in the provided README.

Licensing & Compatibility

The QeRL code is released under the Apache 2.0 License, which is permissive for commercial use and integration into closed-source projects.

Limitations & Caveats

The README notes that hardware setups other than the tested ones might work but have not been verified. Additionally, prefill logits computation currently requires dequantization, which is identified as an area for future optimization.

Health Check

  • Last Commit: 2 weeks ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 2
  • Issues (30d): 8
  • Star History: 403 stars in the last 30 days

Explore Similar Projects

Starred by Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), Zack Li (Cofounder of Nexa AI), and 4 more.

smoothquant by mit-han-lab

0.4% · 2k stars
Post-training quantization research paper for large language models
Created 3 years ago · Updated 1 year ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Sebastian Raschka (Author of "Build a Large Language Model (From Scratch)"), and 2 more.

SimpleTuner by bghira

0.3% · 3k stars
Fine-tuning kit for diffusion models
Created 2 years ago · Updated 14 hours ago
Starred by Tobi Lutke (Cofounder of Shopify), Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), and 40 more.

unsloth by unslothai

0.6% · 48k stars
Finetuning tool for LLMs, targeting speed and memory efficiency
Created 1 year ago · Updated 10 hours ago