LLM fine-tuning via internal confidence
Top 82.8% on sourcepulse
Intuitor is a reinforcement learning framework for fine-tuning large language models (LLMs) using self-certainty as the sole reward signal, eliminating the need for external labels or verifiers. It targets researchers and developers who want to train LLMs in settings where labeled data or external verification is scarce or expensive, enabling scalable, domain-agnostic model improvement.
How It Works
Intuitor implements Reinforcement Learning from Internal Feedback (RLIF), a paradigm in which LLMs optimize intrinsic signals such as their own confidence. Self-certainty is measured as the KL divergence between the model's next-token distribution and a uniform distribution over the vocabulary, and this score serves as the reward within the GRPO policy-optimization algorithm. Because it requires neither human feedback nor verifiable supervision, the approach is suitable for diverse and challenging domains.
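As a rough illustration, the sketch below shows how a self-certainty reward of this kind could be computed from a model's next-token logits. The function name, tensor shapes, and exact normalization are assumptions for illustration only; the precise definition used by Intuitor may differ, so treat this as a sketch rather than the framework's implementation.

```python
import torch
import torch.nn.functional as F

def self_certainty_reward(logits: torch.Tensor) -> torch.Tensor:
    """Sketch of a self-certainty score (assumed formulation): the KL divergence
    between a uniform distribution over the vocabulary and the model's next-token
    distribution, averaged over the generated response tokens.

    logits: [response_len, vocab_size] next-token logits for the response.
    Returns a scalar that grows as the model's distributions move away from
    uniform, i.e. as the model becomes more confident in its own continuations.
    """
    vocab_size = logits.size(-1)
    log_probs = F.log_softmax(logits, dim=-1)  # log p(v | prefix) per position
    # KL(U || p) per position = -log(V) - (1/V) * sum_v log p(v)
    kl_per_token = -torch.log(torch.tensor(float(vocab_size))) - log_probs.mean(dim=-1)
    return kl_per_token.mean()

# In a GRPO-style update (sketch): sample a group of responses per prompt, score
# each with self_certainty_reward, and normalize rewards within the group to get
# advantages -- no external verifier or labeled answer is needed.
```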
Quick Start & Requirements
Navigate to the open-r1-intuitor or verl-intuitor directory and follow the variant-specific README instructions.
open-r1-intuitor: bash run_intuitor.sh (requires WANDB_KEY).
verl-intuitor: python examples/data_preprocess/math_dataset_ours.py --model Qwen2.5-3B, followed by bash math_intuitor.sh (requires WANDB_KEY and the MATH dataset).
Highlighted Details
Maintenance & Community
The project is associated with UCB and appears to be actively maintained, with recent model releases in June 2025. Links to model collections and Hugging Face repositories are provided.
Licensing & Compatibility
Licensing and compatibility follow the respective open-r1-intuitor and verl-intuitor implementations.
Limitations & Caveats
Performance can be sensitive to prompt design; alternative prompts may be necessary for effective learning. The framework is primarily demonstrated on math and code generation tasks.