LLM fine-tuning via internal confidence
Top 82.8% on sourcepulse
Intuitor is a reinforcement learning framework for fine-tuning large language models (LLMs) using self-certainty as the sole reward signal, eliminating the need for external labels or verifiers. It targets researchers and developers who want to train LLMs in settings where labeled data or external verification is scarce or expensive, enabling scalable, domain-agnostic model improvement.
How It Works
Intuitor implements Reinforcement Learning from Internal Feedback (RLIF), a paradigm in which LLMs optimize intrinsic signals such as their own confidence. Self-certainty is measured as the KL divergence between the model's next-token distribution and a uniform distribution over the vocabulary, and this score serves as the reward within the GRPO policy-optimization algorithm. Because it requires neither human feedback nor verifiable supervision, the approach is suitable for diverse and challenging domains.
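As a rough illustration, the sketch below shows how a self-certainty reward of this kind could be computed from a model's next-token logits. The function name, tensor shapes, and exact normalization are assumptions for illustration only; the precise definition used by Intuitor may differ, so treat this as a sketch rather than the framework's implementation.

```python
import torch
import torch.nn.functional as F

def self_certainty_reward(logits: torch.Tensor) -> torch.Tensor:
    """Sketch of a self-certainty score (assumed formulation): the KL divergence
    between a uniform distribution over the vocabulary and the model's next-token
    distribution, averaged over the generated response tokens.

    logits: [response_len, vocab_size] next-token logits for the response.
    Returns a scalar that grows as the model's distributions move away from
    uniform, i.e. as the model becomes more confident in its own continuations.
    """
    vocab_size = logits.size(-1)
    log_probs = F.log_softmax(logits, dim=-1)  # log p(v | prefix) per position
    # KL(U || p) per position = -log(V) - (1/V) * sum_v log p(v)
    kl_per_token = -torch.log(torch.tensor(float(vocab_size))) - log_probs.mean(dim=-1)
    return kl_per_token.mean()

# In a GRPO-style update (sketch): sample a group of responses per prompt, score
# each with self_certainty_reward, and normalize rewards within the group to get
# advantages -- no external verifier or labeled answer is needed.
```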
Quick Start & Requirements
Navigate to the open-r1-intuitor or verl-intuitor directory and follow the variant-specific README instructions.
open-r1-intuitor: bash run_intuitor.sh (requires WANDB_KEY).
verl-intuitor: python examples/data_preprocess/math_dataset_ours.py --model Qwen2.5-3B, followed by bash math_intuitor.sh (requires WANDB_KEY and the MATH dataset).
Highlighted Details
Maintenance & Community
The project is associated with UCB and appears to be actively maintained, with recent model releases in June 2025. Links to model collections and Hugging Face repositories are provided.
Licensing & Compatibility
Licensing and compatibility follow the respective open-r1-intuitor and verl-intuitor implementations.
Limitations & Caveats
Performance can be sensitive to prompt design; alternative prompts may be necessary for effective learning. The framework is primarily demonstrated on math and code generation tasks.