LLM reasoning enhanced with RL entropy control
This repository addresses the "entropy collapse" issue in reinforcement learning (RL) for large language models (LLMs), which hinders reasoning capabilities by causing overconfidence and performance saturation. It targets researchers and practitioners working with LLMs and RL, offering methods to improve model exploration and performance.
How It Works
The project identifies a negative exponential relationship between policy entropy and downstream performance, suggesting that entropy exhaustion bottlenecks LLM reasoning. It theoretically attributes the entropy decline to the covariance between a token's action probability and its logit update, which is typically positive and therefore drives entropy down. To counter this, the proposed Clip-Cov and KL-Cov methods restrict updates for high-covariance tokens, preventing entropy collapse and improving performance.
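The exact covariance estimator, selection fractions, and penalty form live in the repository's verl-based training code; the following is only a minimal PyTorch sketch of the mechanism, assuming flat per-token tensors of log-probabilities and advantages. The function names, the 0.002 selection fraction, and the quadratic KL surrogate are illustrative assumptions, not the project's implementation.

import torch

def per_token_cov(logprobs: torch.Tensor, advantages: torch.Tensor) -> torch.Tensor:
    # Per-token contribution to Cov(log pi(a|s), A) over the batch: the quantity
    # the project links to entropy decline (assumed estimator).
    return (logprobs - logprobs.mean()) * (advantages - advantages.mean())

def clip_cov_loss(logprobs, old_logprobs, advantages, clip_frac=0.002):
    # Clip-Cov-style sketch: detach the policy-gradient term for the small fraction
    # of tokens with the highest covariance, so they no longer push probability
    # mass (and entropy) further in the same direction.
    cov = per_token_cov(logprobs.detach(), advantages)
    ratio = torch.exp(logprobs - old_logprobs)       # importance-sampling ratio
    pg_loss = -(ratio * advantages)                  # standard surrogate, minimized
    k = max(1, int(clip_frac * cov.numel()))
    drop = torch.zeros_like(cov, dtype=torch.bool)
    drop[torch.topk(cov, k).indices] = True
    return torch.where(drop, pg_loss.detach(), pg_loss).mean()

def kl_cov_loss(logprobs, old_logprobs, advantages, k_frac=0.002, kl_coef=1.0):
    # KL-Cov-style sketch: keep the policy-gradient term everywhere but add a
    # penalty that pulls the highest-covariance tokens back toward the rollout policy.
    cov = per_token_cov(logprobs.detach(), advantages)
    ratio = torch.exp(logprobs - old_logprobs)
    pg_loss = -(ratio * advantages)
    kl = 0.5 * (logprobs - old_logprobs) ** 2        # quadratic per-token KL surrogate
    k = max(1, int(k_frac * cov.numel()))
    top = torch.zeros_like(cov, dtype=torch.bool)
    top[torch.topk(cov, k).indices] = True
    return torch.where(top, pg_loss + kl_coef * kl, pg_loss).mean()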
Quick Start & Requirements
Create the environment with conda env create -n entropy -f environment.yaml. Then run bash recipe/dapo/7b_kl_cov.sh for Qwen2.5-7B on a single node, or bash recipe/dapo/32b_kl_cov.sh for Qwen2.5-32B.
Highlighted Details
The code is based on verl and built on the dapo recipe.
Maintenance & Community
Maintenance and community support are tied to the upstream verl project. The repository was last updated about two months ago and is currently marked inactive.
Licensing & Compatibility
Limitations & Caveats
The project is research code and may be in an alpha or experimental stage. Specific dataset configurations ("data_source") are hardcoded in certain training scripts, so running on a different dataset requires editing them. Multi-node training may also require additional environment-variable configuration.
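As a concrete illustration of the data_source caveat: verl-style recipes typically read parquet files whose rows carry a data_source field used to route prompts to the matching reward function, so adapting the scripts usually means preparing data with the expected value or editing the hardcoded one. The snippet below is hypothetical; the actual column names and the value a given recipe script expects should be checked against the repository.

import pandas as pd

# Hypothetical example of preparing training data with an explicit "data_source"
# field; the value must match whatever the chosen recipe script has hardcoded.
rows = [
    {
        "data_source": "my_math_dataset",   # placeholder; check the recipe script
        "prompt": [{"role": "user", "content": "Compute 17 * 23."}],
        "ability": "math",
        "reward_model": {"style": "rule", "ground_truth": "391"},
    }
]
pd.DataFrame(rows).to_parquet("train.parquet")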