demystify-long-cot by eddycmu

Research code for long chain-of-thought reasoning in LLMs

Created 7 months ago
316 stars

Top 85.4% on SourcePulse

View on GitHub
Project Summary

This repository provides code and experimental setups for investigating how Large Language Models (LLMs) learn and generate long Chain-of-Thought (CoT) reasoning. It targets researchers and practitioners aiming to improve LLM reasoning capabilities, particularly in complex domains like mathematics, by enabling longer, more structured reasoning processes.

How It Works

The project forks OpenRLHF, introducing modifications to support rule-based reward functions (e.g., Cosine Reward for length control) and multiple reward types with different discount factors for PPO and Reinforce++. It also integrates an "LLM-as-a-judge" component for reference-guided verification and includes MinHash for identifying reasoning patterns in pre-training data. This approach aims to systematically understand and replicate the long CoT generation observed in advanced models.
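A cosine length-scaling reward of the kind described above can be sketched in a few lines (a minimal illustration in plain Python; the function names, reward values, and max_len are assumptions for the sketch, not taken from the repository's code):

```python
import math

def cosine_reward(t: int, T: int, r_start: float, r_end: float) -> float:
    """Cosine interpolation: returns r_start at t=0, r_end at t=T."""
    t = min(t, T)
    return r_end + 0.5 * (r_start - r_end) * (1 + math.cos(math.pi * t / T))

def length_shaped_reward(correct: bool, gen_len: int, max_len: int = 4096) -> float:
    """Length-aware shaping: correct answers earn less reward as the CoT
    grows (discouraging padding), while wrong answers are penalized less
    for longer, more exploratory CoTs."""
    if correct:
        return cosine_reward(gen_len, max_len, r_start=1.0, r_end=0.5)
    return cosine_reward(gen_len, max_len, r_start=-1.0, r_end=-0.5)
```

The cosine schedule keeps the reward gradient smooth near both length extremes, which is one plausible reason such shaping stabilizes CoT length during RL.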

Quick Start & Requirements

  • Installation: Docker is recommended. Inside the container, run sudo pip uninstall xgboost transformer_engine flash_attn -y, then pip install openrlhf. For vLLM acceleration, use pip install openrlhf[vllm] or pip install openrlhf[vllm_latest]. Alternatively, clone the repo and run pip install -e ..
  • Prerequisites: NVIDIA GPU with CUDA, Docker, Python. vLLM 0.6.4 or higher is recommended.
  • Resources: Requires setting up an OpenRLHF environment. Specific resource needs depend on the experiments run.
  • Links: OpenRLHF Docs (base project), Paper

Highlighted Details

  • Implements rule-based reward functions for stabilizing and controlling CoT length.
  • Integrates "LLM-as-a-judge" as a verifier compatible with rule-based rewards.
  • Includes MinHash for searching pre-training data for long CoT reasoning patterns.
  • Supports multiple reward types with different discount factors for PPO and Reinforce++.
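The MinHash search mentioned above can be illustrated with a small from-scratch sketch (pure Python, with salted SHA-1 hashes standing in for random permutations; the shingle size and signature length are arbitrary choices for the example, not the repository's settings):

```python
import hashlib

def shingles(text: str, n: int = 5) -> set:
    """Overlapping n-token shingles of a document."""
    tokens = text.split()
    return {" ".join(tokens[i:i + n]) for i in range(max(1, len(tokens) - n + 1))}

def minhash_signature(text: str, num_perm: int = 64) -> list:
    """One minimum per seeded hash function over the document's shingles."""
    doc = shingles(text)
    return [
        min(int.from_bytes(hashlib.sha1(f"{seed}:{s}".encode()).digest()[:8], "big")
            for s in doc)
        for seed in range(num_perm)
    ]

def estimated_jaccard(sig_a: list, sig_b: list) -> float:
    """The fraction of matching signature slots estimates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)
```

Documents whose signatures agree on most slots are likely near-duplicates, so candidate long-CoT reasoning patterns can be matched against a pre-training corpus without exact pairwise comparison.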

Maintenance & Community

The project is based on OpenRLHF and acknowledges contributions from various LLM projects. Further development is indicated by TODOs for action prompting code and additional run scripts.

Licensing & Compatibility

The repository is a fork of OpenRLHF, which is Apache 2.0 licensed. The README does not state a separate license for the modifications, so the Apache 2.0 terms of the base project presumably apply; verify before commercial use.

Limitations & Caveats

The README notes that run scripts require minor fixes for file paths and API keys. Some dependencies may vary based on the environment. The project is presented alongside a research paper, suggesting it's primarily for experimental reproduction and further research rather than a production-ready library.

Health Check

  • Last Commit: 3 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 8 stars in the last 30 days

Explore Similar Projects

Starred by Jason Knight (Director AI Compilers at NVIDIA; Cofounder of OctoML), Tim J. Baek (Founder of Open WebUI), and 6 more.

awesome-o1 by srush
0% · 1k stars
Bibliography for OpenAI's o1 project
Created 11 months ago
Updated 10 months ago
Starred by Vincent Weisser (Cofounder of Prime Intellect), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 6 more.

self-rewarding-lm-pytorch by lucidrains
0.1% · 1k stars
Training framework for self-rewarding language models
Created 1 year ago
Updated 1 year ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems") and Edward Sun (Research Scientist at Meta Superintelligence Lab).

Eureka by eureka-research
0.2% · 3k stars
LLM-based reward design for reinforcement learning
Created 2 years ago
Updated 1 year ago