NVlabs/QeRL: Efficient RL for large language models on single GPUs
Quantization-enhanced Reinforcement Learning (QeRL) addresses the significant computational demands of applying reinforcement learning to large language models (LLMs). It enables the training of up to 32B parameter LLMs on a single H100 GPU, offering a low-cost and efficient alternative for researchers and engineers. QeRL accelerates RL training, improves exploration, and achieves performance comparable to full-parameter fine-tuning.
How It Works
QeRL integrates NVFP4 quantization with Low-Rank Adaptation (LoRA) to drastically reduce memory overhead and speed up the rollout phase of RL training. A key insight is that quantization noise inherently increases policy entropy, which enhances exploration during RL training, leading to the discovery of better strategies. The framework further optimizes this with an Adaptive Quantization Noise (AQN) mechanism that dynamically adjusts noise levels. This approach yields over 1.5x speedup in rollouts and enables training of larger models on constrained hardware.
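As a rough illustration of the entropy argument (not QeRL's actual implementation), the sketch below models quantization noise as an additive Gaussian perturbation on the logits of an already confident policy, with an assumed exponentially decaying schedule standing in for AQN; the function names, schedule, and constants are all hypothetical.

```python
import torch

def policy_entropy(logits: torch.Tensor) -> torch.Tensor:
    """Mean per-token entropy of the categorical policy defined by logits."""
    logp = torch.log_softmax(logits, dim=-1)
    return -(logp.exp() * logp).sum(dim=-1).mean()

def aqn_like_sigma(step: int, total_steps: int,
                   start: float = 1.0, end: float = 0.05) -> float:
    """Assumed AQN-style schedule: noise decays exponentially over training."""
    t = step / max(total_steps - 1, 1)
    return start * (end / start) ** t

torch.manual_seed(0)
vocab, batch = 1024, 64
# A confident ("trained") toy policy: one strongly dominant token per position.
logits = torch.randn(batch, vocab)
logits[torch.arange(batch), torch.randint(vocab, (batch,))] += 10.0

sigma = aqn_like_sigma(step=0, total_steps=1000)  # early training: large noise
noisy_entropy = torch.stack([
    policy_entropy(logits + sigma * torch.randn_like(logits))
    for _ in range(50)
]).mean()

# For a peaked policy, the noise-averaged entropy is typically higher than the
# noiseless one, i.e. the perturbed policy explores more.
print(f"entropy: {policy_entropy(logits).item():.3f} -> {noisy_entropy.item():.3f} with noise")
```

In QeRL itself the noise comes from the NVFP4 representation of the weights, with AQN adjusting its level over training; the sketch only shows why such noise tends to flatten a peaked policy and thereby encourage exploration.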
Quick Start & Requirements
Create a conda environment (conda create -n qerl python=3.10 -y, then conda activate qerl), install CUDA 12.4 (conda install nvidia/label/cuda-12.4.1::cuda and conda install -c nvidia/label/cuda-12.4.1 cudatoolkit), and run sh setup_env.sh. A separate environment for quantization (llmcompressor) requires Python 3.12.
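The same setup, written out as a shell session. The first block repeats the commands above; the last line, which creates the separate Python 3.12 environment for llmcompressor, is an assumed concretization of the stated requirement and may differ from the repository's actual instructions.

```bash
# Training environment (Python 3.10) and CUDA 12.4 toolkit
conda create -n qerl python=3.10 -y
conda activate qerl
conda install nvidia/label/cuda-12.4.1::cuda
conda install -c nvidia/label/cuda-12.4.1 cudatoolkit
sh setup_env.sh

# Separate quantization environment (llmcompressor); creation command is an
# assumption based on the stated Python 3.12 requirement.
conda create -n llmcompressor python=3.12 -y
```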
Maintenance & Community
No specific details on maintainers, community channels (like Discord/Slack), or roadmaps were found in the provided README.
Licensing & Compatibility
The QeRL code is released under the Apache 2.0 License, which is permissive for commercial use and integration into closed-source projects.
Limitations & Caveats
The README notes that hardware setups other than the tested ones might work but have not been verified. Additionally, prefill logits computation currently requires dequantization, which is identified as an area for future optimization.