Research code for long chain-of-thought reasoning in LLMs
This repository provides code and experimental setups for investigating how Large Language Models (LLMs) learn and generate long Chain-of-Thought (CoT) reasoning. It targets researchers and practitioners aiming to improve LLM reasoning capabilities, particularly in complex domains like mathematics, by enabling longer, more structured reasoning processes.
How It Works
The project is a fork of OpenRLHF, modified to support rule-based reward functions (e.g., a Cosine Reward for length control) and multiple reward types with separate discount factors for PPO and REINFORCE++. It also integrates an "LLM-as-a-judge" component for reference-guided answer verification and uses MinHash to identify long-CoT reasoning patterns in pre-training data. Together, these pieces aim to systematically analyze and replicate the long CoT generation observed in advanced reasoning models.
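As a concrete illustration, a length-aware cosine reward can be expressed as a pure function of answer correctness and generation length. The sketch below is a minimal rendering of that idea; the function name, interpolation endpoints, and penalty constants are illustrative assumptions, not the repository's actual defaults.

```python
import math

def cosine_length_reward(correct: bool, gen_len: int, max_len: int,
                         r_correct=(2.0, 1.0),   # reward at len 0 -> max_len when correct
                         r_wrong=(-10.0, 0.0),   # reward at len 0 -> max_len when wrong
                         r_exceed=-10.0) -> float:
    """Cosine-interpolated reward over generation length (illustrative values).

    Correct answers earn more when shorter; wrong answers are penalized
    less when longer, nudging the model to keep reasoning rather than
    commit early to a short wrong answer.
    """
    if gen_len >= max_len:
        return r_exceed  # flat penalty for truncated / over-length generations
    r_start, r_end = r_correct if correct else r_wrong
    # cos goes from 1 to -1 as gen_len goes from 0 to max_len, so the
    # reward moves smoothly from r_start down (or up) to r_end.
    return r_end + 0.5 * (r_start - r_end) * (1.0 + math.cos(math.pi * gen_len / max_len))
```

In a PPO or REINFORCE++ loop, a scalar like this would be computed per rollout and combined with, or substituted for, a learned reward model's score.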
Quick Start & Requirements
First remove conflicting packages, then install from PyPI: `sudo pip uninstall xgboost transformer_engine flash_attn -y`, followed by `pip install openrlhf`. For vLLM acceleration, use `pip install openrlhf[vllm]` or `pip install openrlhf[vllm_latest]`. Alternatively, clone the repository and run `pip install -e .`
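A quick post-install sanity check can confirm the packages resolved. This is a minimal sketch; it assumes only the package names from the commands above.

```python
from importlib.metadata import PackageNotFoundError, version

# vllm is only present if one of the [vllm] extras above was installed.
for pkg in ("openrlhf", "vllm"):
    try:
        print(f"{pkg}: {version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: not installed")
```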
Highlighted Details
Maintenance & Community
The project builds on OpenRLHF and acknowledges contributions from several other LLM projects. Open TODOs for action-prompting code and additional run scripts indicate ongoing development.
Licensing & Compatibility
The repository is a fork of OpenRLHF, which is Apache 2.0 licensed. The README does not explicitly state a license for the modifications, but the code builds on Apache 2.0 licensed projects. Compatibility for commercial use is likely inherited from the base project, but verification is recommended.
Limitations & Caveats
The README notes that run scripts require minor fixes for file paths and API keys. Some dependencies may vary based on the environment. The project is presented alongside a research paper, suggesting it's primarily for experimental reproduction and further research rather than a production-ready library.