Tina by shangshang-wang

LoRA reasoning models

Created 5 months ago
282 stars

Top 92.5% on SourcePulse

Project Summary

This repository provides code for Tina, a project focused on enhancing reasoning abilities in small language models using Low-Rank Adaptation (LoRA) during reinforcement learning. It targets researchers and practitioners interested in cost-effective methods for improving LLM reasoning, demonstrating competitive performance with significantly reduced computational costs.

How It Works

Tina applies LoRA during reinforcement learning (specifically GRPO) to post-train small language models. Rather than updating all weights, LoRA freezes the base model and trains small low-rank adapter matrices alongside it, so only a small number of parameters are learned. This yields substantial savings in training cost and time compared to full-parameter fine-tuning. The method is evaluated on challenging reasoning benchmarks such as AIME24, where it shows significant performance gains.
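To make the parameter savings concrete, here is a minimal sketch of the standard LoRA formulation, y = W x + (alpha / r) * B (A x), in plain Python. It is not taken from the Tina codebase: the function names, the layer size (1536 x 1536), and the rank (16) are illustrative assumptions chosen only to show how few parameters the adapters add to a single weight matrix.

```python
def lora_param_counts(d_in, d_out, rank):
    """Return (full, lora) trainable-parameter counts for one weight matrix."""
    full = d_in * d_out            # every entry of W is trained
    lora = rank * (d_in + d_out)   # A is rank x d_in, B is d_out x rank
    return full, lora


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, alpha, rank):
    """Compute W x + (alpha / rank) * B (A x); W is frozen, A and B are trained."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / rank
    return [b + scale * u for b, u in zip(base, update)]


if __name__ == "__main__":
    # Hypothetical layer size and rank, for illustration only.
    full, lora = lora_param_counts(d_in=1536, d_out=1536, rank=16)
    print(f"full fine-tuning: {full} params, LoRA: {lora} params "
          f"({lora / full:.2%} of full)")
```

With B initialized to zeros, as is conventional for LoRA, `lora_forward` returns exactly `W x`, so training starts from the unmodified base model.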

Quick Start & Requirements

  • Installation: Requires conda for environment setup. Two environments (tina and tina_eval) are created using mamba.
  • Prerequisites: Python 3.10 and 3.11, conda, mamba. Pre-trained models need to be downloaded to a specified CKPT_DIR.
  • Setup: Configure the environments by running the provided shell scripts (set_vars.sh, set_env.sh, set_env_eval.sh).
  • Resources: Reproducing the experiments costs ~$526; the best checkpoint alone costs ~$9.
  • Links: Notion, Hugging Face Collection, Weights and Biases.

Highlighted Details

  • The best checkpoint achieves a >20% increase in reasoning performance and 43.33% Pass@1 accuracy on AIME24.
  • Demonstrates cost-effective training with LoRA, costing only $9 for the best checkpoint.
  • Leverages the open-r1 project as its codebase foundation.
  • Utilizes multiple open-source reasoning datasets for training.
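
As a rough illustration of the 43.33% figure, the sketch below computes Pass@1 as the fraction of problems answered correctly with one attempt each. It assumes AIME24 consists of the 30 problems from the 2024 AIME I and II, in which case 43.33% corresponds to 13 correct answers. The problem count, the `pass_at_1` helper, and the made-up result list are assumptions for illustration, and the project's evaluation may average over several samples per problem instead.

```python
def pass_at_1(correct_flags):
    """Fraction of problems solved when each gets exactly one attempt."""
    flags = list(correct_flags)
    if not flags:
        raise ValueError("need at least one problem")
    return sum(flags) / len(flags)


if __name__ == "__main__":
    # Hypothetical outcome: 13 of 30 problems solved (not real evaluation logs).
    results = [True] * 13 + [False] * 17
    print(f"Pass@1 = {pass_at_1(results):.2%}")  # prints "Pass@1 = 43.33%"
```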

Maintenance & Community

The project is associated with Shangshang Wang and researchers from USC. It cites multiple open-source datasets and the open-r1 project.

Licensing & Compatibility

The repository does not explicitly state a license in the README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

Without a stated license, commercial usability cannot be determined. Setup involves several environment and path configurations, which may need careful adaptation to the user's environment.

Health Check

  • Last Commit: 1 month ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 9 stars in the last 30 days

Explore Similar Projects

Starred by Vincent Weisser (Cofounder of Prime Intellect), Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), and 4 more.

simpleRL-reason by hkust-nlp

RL recipe for reasoning ability in models

  • Stars: 4k
  • SourcePulse rank: 0.1%
  • Created: 7 months ago
  • Updated: 1 month ago

Starred by Michael Han (Cofounder of Unsloth), Sebastian Raschka (Author of "Build a Large Language Model (From Scratch)"), and 19 more.

DeepSeek-R1 by deepseek-ai

Reasoning models research paper

  • Stars: 91k
  • SourcePulse rank: 0.1%
  • Created: 8 months ago
  • Updated: 2 months ago