Logic-RL by Unakar

LLM reasoning via rule-based reinforcement learning, research paper

created 6 months ago
2,383 stars

Top 19.7% on sourcepulse

Project Summary

This repository provides Logic-RL, a framework for improving Large Language Model (LLM) reasoning on logic puzzles through rule-based reinforcement learning. It targets researchers and developers working on complex, rule-bound tasks, and reports substantial accuracy gains over strong baselines such as GPT-4o on Knights-and-Knaves (K&K) puzzles.

How It Works

Logic-RL trains LLMs with reinforcement learning driven by a rule-based reward rather than a learned reward model: responses are rewarded for following the required output format and for producing final answers that satisfy the puzzle's logical constraints, steering the policy toward accurate, verifiable solutions. The training stack builds on TinyZero and Verl for efficient training and deployment.
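
The repository contains the actual reward code; purely as an illustration (the function name, tag format, and reward values below are assumptions, not the project's API), a rule-based reward of this kind boils down to deterministic checks:

```python
import re

def rule_based_reward(response: str, ground_truth: dict[str, str]) -> float:
    """Hypothetical sketch of a rule-based reward: a format check plus an
    answer check, with no learned reward model anywhere in the loop."""
    # Format rule: reasoning must sit inside <think> tags and the final
    # answer inside <answer> tags.
    match = re.search(
        r"<think>.*?</think>\s*<answer>(.*?)</answer>", response, re.DOTALL
    )
    if match is None:
        return -1.0  # malformed outputs are penalized outright

    answer = match.group(1)
    # Answer rule: every character's predicted role (knight or knave)
    # must agree with the puzzle's ground truth.
    correct = all(
        re.search(rf"\b{re.escape(name)}\b.*\b{role}\b", answer, re.IGNORECASE)
        for name, role in ground_truth.items()
    )
    # Reward values are illustrative, not the paper's exact constants.
    return 2.0 if correct else -0.5
```

For example, rule_based_reward("<think>Bob lies, so...</think><answer>Alice is a knight.</answer>", {"Alice": "knight"}) returns 2.0 here, while a response missing the tags scores -1.0. Because every component is a deterministic rule, the reward is cheap to compute and cannot be gamed the way a learned reward model can be.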

Quick Start & Requirements

  • Installation:
    • conda create -n logic python=3.9
    • conda activate logic
    • pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu121
    • pip3 install vllm==0.6.3 ray
    • pip3 install flash-attn --no-build-isolation
    • pip install -e .
  • Prerequisites: Python 3.9, PyTorch 2.4.0 with CUDA 12.1, vLLM 0.6.3, Ray, FlashAttention.
  • Training: launch with bash main_grpo.sh; the README notes this requires 4x A100 80GB GPUs.
  • Resources: data preprocessing and model execution steps are detailed in the README.
  • Links: arXiv, Verl, TinyZero, K&K Puzzles
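
The README does not list a post-install verification step; as a hedged sanity check for the pinned vLLM install, something like the following works (the model name is an assumption drawn from the project's Qwen2.5-7B results, not a README instruction):

```python
from vllm import LLM, SamplingParams

# Not a README step: a minimal smoke test for the pinned vllm==0.6.3 install.
# The model name is an assumption; any locally available HF model works.
llm = LLM(model="Qwen/Qwen2.5-7B", dtype="bfloat16")
params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(
    ["A special island is inhabited only by knights, who always tell the "
     "truth, and knaves, who always lie. Alice says: 'Bob is a knave.' "
     "Bob says: 'We are both knights.' Who is a knight and who is a knave?"],
    params,
)
print(outputs[0].outputs[0].text)
```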

Highlighted Details

  • Achieves state-of-the-art performance on logic puzzles, outperforming models like GPT-4o and DeepSeek-Math-7B.
  • Demonstrates strong results with a fine-tuned Qwen2.5-7B model, reaching 99% accuracy on 2-person puzzles.
  • Uses a rule-based reward function in place of a learned reward model for RL training.
  • Supports data preprocessing for both base and instruct models.
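
To make the last point concrete: "base vs. instruct" preprocessing typically differs only in how the puzzle is wrapped into a prompt. A minimal sketch, with the repository's own scripts as the authoritative reference (prompt text, tags, and model name here are assumptions):

```python
from transformers import AutoTokenizer

# Illustrative only: the system instruction, puzzle text, and model name
# below are assumptions, not taken from the repository's code.
puzzle = ("Alice says: 'Bob is a knave.' Bob says: 'We are both knights.' "
          "Who is a knight and who is a knave?")
system = "Think step by step inside <think> tags, then answer inside <answer> tags."

# Instruct models: wrap the puzzle in the chat template the tokenizer ships with.
tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")
instruct_prompt = tok.apply_chat_template(
    [{"role": "system", "content": system},
     {"role": "user", "content": puzzle}],
    tokenize=False,
    add_generation_prompt=True,
)

# Base models: inline the same instructions as plain text, optionally
# seeding the completion with an opening <think> tag.
base_prompt = f"{system}\n\nPuzzle: {puzzle}\n<think>"
```

Instruct variants rely on the tokenizer's built-in chat template, while base variants need the instructions and any tag scaffolding inlined by hand.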

Maintenance & Community

The project's authors are affiliated with institutions including Tsinghua University. No dedicated community channels are listed in the README.

Licensing & Compatibility

The repository does not state a license, so reuse and redistribution terms are unclear; confirm licensing before production use. The dependency stack (PyTorch 2.4.0 with CUDA 12.1, vLLM, Ray) targets standard GPU-based ML development environments.

Limitations & Caveats

Training is resource-intensive, requiring multiple high-end GPUs (4x A100 80GB). The project is recent, with primary results published in March 2025, so interfaces and results may still change.

Health Check

  • Last commit: 4 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 3
  • Star History: 78 stars in the last 90 days
