Intuitor by sunblaze-ucb

LLM fine-tuning via internal confidence

created 2 months ago
337 stars

Top 82.8% on sourcepulse

Project Summary

Intuitor is a reinforcement learning framework for fine-tuning large language models (LLMs) using self-certainty as the sole reward signal, eliminating the need for external labels or verifiers. It targets researchers and developers who want to train LLMs in domains where labeled data or verifiers are scarce or expensive, enabling scalable, domain-agnostic model improvement.

How It Works

Intuitor implements Reinforcement Learning from Internal Feedback (RLIF), a paradigm in which LLMs optimize intrinsic signals such as self-certainty. Self-certainty is measured as the KL divergence between the model's token-level output distribution and a uniform distribution, and it serves as the reward within the GRPO policy optimization algorithm. This allows training without human feedback or verifiable supervision, making the approach suitable for diverse and challenging domains.
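The reward above can be sketched in a few lines. This is an illustrative approximation only, not the repository's actual code: the function names (`self_certainty`, `grpo_advantages`), the per-token averaging, and the exact GRPO normalization constants are assumptions.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def self_certainty(logits):
    """Mean KL(p || U) over token positions of a sampled response.

    logits: (seq_len, vocab) next-token logits.
    KL(p || U) = sum_v p_v * log(p_v * V) = log(V) - H(p), so the
    reward is 0 for a uniform distribution and grows as the model
    becomes more confident (lower entropy).
    """
    p = softmax(logits)
    vocab = logits.shape[-1]
    kl = np.sum(p * (np.log(p + 1e-12) + np.log(vocab)), axis=-1)
    return kl.mean()

def grpo_advantages(rewards):
    """GRPO-style group-relative advantages: normalize the rewards of
    the responses sampled for the same prompt by the group mean/std."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + 1e-8)
```

Under this sketch, a response whose token distributions are sharply peaked scores higher than one whose distributions are near-uniform, and GRPO then reinforces the relatively more confident responses within each sampled group.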

Quick Start & Requirements

  • Installation: Navigate to the open-r1-intuitor or verl-intuitor directory and follow variant-specific README instructions.
  • Training:
    • open-r1-intuitor: bash run_intuitor.sh (requires WANDB_KEY).
    • verl-intuitor: python examples/data_preprocess/math_dataset_ours.py --model Qwen2.5-3B followed by bash math_intuitor.sh (requires WANDB_KEY and MATH dataset).
  • Dependencies: Python, Hugging Face Open-R1, VERL library, Weights & Biases (WANDB) for tracking. Specific model checkpoints are available on Hugging Face.
  • Resources: Training requires significant computational resources, typical for LLM fine-tuning.

Highlighted Details

  • Offers four model checkpoints (1.5B to 14B parameters) trained on the MATH dataset.
  • Achieves comparable performance to GRPO on math reasoning tasks (GSM8K, MATH500).
  • Demonstrates superior generalization to code generation (LiveCodeBench, CRUXEval).
  • Improves instruction following without gold labels.

Maintenance & Community

The project is associated with UC Berkeley (the sunblaze-ucb group) and appears to be actively maintained, with model releases as recent as June 2025. Links to model collections and Hugging Face repositories are provided.

Licensing & Compatibility

  • License: Apache License 2.0 for both open-r1-intuitor and verl-intuitor implementations.
  • Compatibility: Permissive license allows for commercial use and integration with closed-source projects.

Limitations & Caveats

Performance can be sensitive to prompt design; alternative prompts may be necessary for effective learning. The framework is primarily demonstrated on math and code generation tasks.

Health Check

  • Last commit: 3 weeks ago
  • Responsiveness: inactive
  • Pull requests (30d): 0
  • Issues (30d): 2
  • Star history: 342 stars in the last 90 days
