demystify-long-cot by eddycmu

Research code for long chain-of-thought reasoning in LLMs

created 6 months ago
308 stars

Top 88.2% on sourcepulse

View on GitHub
1 Expert Loves This Project
Project Summary

This repository provides code and experimental setups for investigating how Large Language Models (LLMs) learn and generate long Chain-of-Thought (CoT) reasoning. It targets researchers and practitioners aiming to improve LLM reasoning capabilities, particularly in complex domains like mathematics, by enabling longer, more structured reasoning processes.

How It Works

The project forks OpenRLHF, introducing modifications to support rule-based reward functions (e.g., Cosine Reward for length control) and multiple reward types with different discount factors for PPO and Reinforce++. It also integrates an "LLM-as-a-judge" component for reference-guided verification and includes MinHash for identifying reasoning patterns in pre-training data. This approach aims to systematically understand and replicate the long CoT generation observed in advanced models.
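The cosine length-scaling reward mentioned above can be sketched as follows. This is an illustrative reimplementation of the general idea, with placeholder reward bounds; it is not the repository's exact code or the paper's hyperparameters:

```python
import math

def cosine_reward(is_correct: bool, gen_len: int, max_len: int,
                  r_correct_short: float = 2.0, r_correct_long: float = 1.0,
                  r_wrong_short: float = -10.0, r_wrong_long: float = 0.0) -> float:
    """Cosine-interpolated reward between a 'short' and a 'long' bound.

    Correct answers earn more when shorter; wrong answers are penalized
    less when longer, nudging the model to keep reasoning when it is not
    yet correct. Bound values here are illustrative placeholders.
    """
    if is_correct:
        r_short, r_long = r_correct_short, r_correct_long
    else:
        r_short, r_long = r_wrong_short, r_wrong_long
    # Progress through the length budget, clamped to [0, 1].
    t = min(gen_len, max_len) / max_len
    # cos(0) = 1 yields r_short at t = 0; cos(pi) = -1 yields r_long at t = 1.
    return r_long + 0.5 * (r_short - r_long) * (1.0 + math.cos(t * math.pi))
```

With these placeholder bounds, a correct short answer scores higher than a correct long one, while a wrong answer's penalty shrinks smoothly as the generation approaches the length budget.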

Quick Start & Requirements

  • Installation: Recommended via Docker. Inside the container, run sudo pip uninstall xgboost transformer_engine flash_attn -y, then pip install openrlhf. For vLLM acceleration, use pip install openrlhf[vllm] or pip install openrlhf[vllm_latest]. Alternatively, clone the repo and run pip install -e . from its root.
  • Prerequisites: NVIDIA GPU with CUDA, Docker, Python. vLLM 0.6.4 or higher is recommended.
  • Resources: Requires setting up an OpenRLHF environment. Specific resource needs depend on the experiments run.
  • Links: OpenRLHF Docs (base project), Paper

Highlighted Details

  • Implements rule-based reward functions for stabilizing and controlling CoT length.
  • Integrates "LLM-as-a-judge" as a verifier compatible with rule-based rewards.
  • Includes MinHash for searching pre-training data for long CoT reasoning patterns.
  • Supports multiple reward types with different discount factors for PPO and Reinforce++.
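As a rough illustration of the MinHash technique used for searching reasoning patterns, the following is a minimal, generic sketch (helper names and parameters are hypothetical; the repository's actual pipeline is more involved):

```python
import hashlib

def _shingles(text: str, k: int = 5) -> set:
    """Character k-grams of the text (a common shingling choice)."""
    return {text[i:i + k] for i in range(max(1, len(text) - k + 1))}

def minhash_signature(text: str, num_hashes: int = 64) -> list:
    """Build a signature from the minimum of num_hashes seeded hashes.

    Two texts' signatures agree in a given slot with probability equal
    to the Jaccard similarity of their shingle sets.
    """
    shingle_set = _shingles(text)
    return [
        min(int(hashlib.md5(f"{seed}:{s}".encode()).hexdigest(), 16)
            for s in shingle_set)
        for seed in range(num_hashes)
    ]

def estimated_jaccard(sig_a: list, sig_b: list) -> float:
    """Fraction of matching signature slots estimates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)
```

Comparing fixed-size signatures instead of full documents is what makes scanning large pre-training corpora for near-duplicate reasoning patterns tractable.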

Maintenance & Community

The project is based on OpenRLHF and acknowledges contributions from various LLM projects. Further development is indicated by TODOs for action prompting code and additional run scripts.

Licensing & Compatibility

The repository is a fork of OpenRLHF, which is Apache 2.0 licensed. The README does not explicitly state a license for the modifications, but the project builds upon Apache 2.0 licensed code. Commercial-use compatibility is likely inherited from the base project, but verifying the license terms directly is recommended.

Limitations & Caveats

The README notes that run scripts require minor fixes for file paths and API keys. Some dependencies may vary based on the environment. The project is presented alongside a research paper, suggesting it's primarily for experimental reproduction and further research rather than a production-ready library.

Health Check

  • Last commit: 2 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 24 stars in the last 90 days

Starred by Ross Taylor (Cofounder of General Reasoning; Creator of Papers with Code), Daniel Han (Cofounder of Unsloth), and 4 more.

Explore Similar Projects

open-instruct by allenai

Top 0.2% on sourcepulse
3k stars
Training codebase for instruction-following language models
created 2 years ago
updated 18 hours ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Jeff Hammerbacher (Cofounder of Cloudera), and 10 more.

open-r1 by huggingface

Top 0.2% on sourcepulse
25k stars
SDK for reproducing DeepSeek-R1
created 6 months ago
updated 3 days ago