Logic-RL by Unakar

LLM reasoning via rule-based reinforcement learning, research paper

created 6 months ago
2,383 stars

Top 19.7% on sourcepulse

Project Summary

This repository provides Logic-RL, a framework for improving Large Language Model (LLM) reasoning on logic puzzles through rule-based reinforcement learning. It targets researchers and developers working on complex, rule-bound tasks, and reports substantial accuracy gains over strong baselines such as GPT-4o on Knights-and-Knaves (K&K) puzzles.

How It Works

Logic-RL trains LLMs with reinforcement learning driven by a rule-based reward rather than a learned reward model: responses are rewarded for following the required output format and for producing final answers that satisfy the puzzle's logical constraints, steering the policy toward accurate, verifiable solutions. The training stack builds on TinyZero and Verl for efficient training and deployment.
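
The repository contains the actual reward code; purely as an illustration (the function name, tag format, and reward values below are assumptions, not the project's API), a rule-based reward of this kind boils down to deterministic checks:

```python
import re

def rule_based_reward(response: str, ground_truth: dict[str, str]) -> float:
    """Hypothetical sketch of a rule-based reward: a format check plus an
    answer check, with no learned reward model anywhere in the loop."""
    # Format rule: reasoning must sit inside <think> tags and the final
    # answer inside <answer> tags.
    match = re.search(
        r"<think>.*?</think>\s*<answer>(.*?)</answer>", response, re.DOTALL
    )
    if match is None:
        return -1.0  # malformed outputs are penalized outright

    answer = match.group(1)
    # Answer rule: every character's predicted role (knight or knave)
    # must agree with the puzzle's ground truth.
    correct = all(
        re.search(rf"\b{re.escape(name)}\b.*\b{role}\b", answer, re.IGNORECASE)
        for name, role in ground_truth.items()
    )
    # Reward values are illustrative, not the paper's exact constants.
    return 2.0 if correct else -0.5
```

For example, rule_based_reward("<think>Bob lies, so...</think><answer>Alice is a knight.</answer>", {"Alice": "knight"}) returns 2.0 here, while a response missing the tags scores -1.0. Because every component is a deterministic rule, the reward is cheap to compute and cannot be gamed the way a learned reward model can be.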

Quick Start & Requirements

  • Installation:
    • conda create -n logic python=3.9
    • conda activate logic
    • pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu121
    • pip3 install vllm==0.6.3 ray
    • pip3 install flash-attn --no-build-isolation
    • pip install -e .
  • Prerequisites: Python 3.9, PyTorch 2.4.0 with CUDA 12.1, vLLM 0.6.3, Ray, FlashAttention.
  • Training: launch with bash main_grpo.sh; the README notes this requires 4x A100 80GB GPUs.
  • Resources: data preprocessing and model execution steps are detailed in the README.
  • Links: arXiv, Verl, TinyZero, K&K Puzzles
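
The README does not list a post-install verification step; as a hedged sanity check for the pinned vLLM install, something like the following works (the model name is an assumption drawn from the project's Qwen2.5-7B results, not a README instruction):

```python
from vllm import LLM, SamplingParams

# Not a README step: a minimal smoke test for the pinned vllm==0.6.3 install.
# The model name is an assumption; any locally available HF model works.
llm = LLM(model="Qwen/Qwen2.5-7B", dtype="bfloat16")
params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(
    ["A special island is inhabited only by knights, who always tell the "
     "truth, and knaves, who always lie. Alice says: 'Bob is a knave.' "
     "Bob says: 'We are both knights.' Who is a knight and who is a knave?"],
    params,
)
print(outputs[0].outputs[0].text)
```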

Highlighted Details

  • Achieves state-of-the-art performance on logic puzzles, outperforming models like GPT-4o and DeepSeek-Math-7B.
  • Demonstrates strong results with a fine-tuned Qwen2.5-7B model, reaching 99% accuracy on 2-person puzzles.
  • Uses a rule-based reward function in place of a learned reward model for RL training.
  • Supports data preprocessing for both base and instruct models.
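
To make the last point concrete: "base vs. instruct" preprocessing typically differs only in how the puzzle is wrapped into a prompt. A minimal sketch, with the repository's own scripts as the authoritative reference (prompt text, tags, and model name here are assumptions):

```python
from transformers import AutoTokenizer

# Illustrative only: the system instruction, puzzle text, and model name
# below are assumptions, not taken from the repository's code.
puzzle = ("Alice says: 'Bob is a knave.' Bob says: 'We are both knights.' "
          "Who is a knight and who is a knave?")
system = "Think step by step inside <think> tags, then answer inside <answer> tags."

# Instruct models: wrap the puzzle in the chat template the tokenizer ships with.
tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")
instruct_prompt = tok.apply_chat_template(
    [{"role": "system", "content": system},
     {"role": "user", "content": puzzle}],
    tokenize=False,
    add_generation_prompt=True,
)

# Base models: inline the same instructions as plain text, optionally
# seeding the completion with an opening <think> tag.
base_prompt = f"{system}\n\nPuzzle: {puzzle}\n<think>"
```

Instruct variants rely on the tokenizer's built-in chat template, while base variants need the instructions and any tag scaffolding inlined by hand.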

Maintenance & Community

The project's authors are affiliated with institutions including Tsinghua University. No dedicated community channels are listed in the README.

Licensing & Compatibility

The repository does not state a license, so reuse and redistribution terms are unclear; confirm licensing before production use. The dependency stack (PyTorch 2.4.0 with CUDA 12.1, vLLM, Ray) targets standard GPU-based ML development environments.

Limitations & Caveats

Training is resource-intensive, requiring multiple high-end GPUs (4x A100 80GB). The project is recent, with primary results published in March 2025, so interfaces and results may still change.

Health Check

  • Last commit: 4 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 3
  • Star History: 78 stars in the last 90 days
