Tina by shangshang-wang

LoRA reasoning models

created 3 months ago
273 stars

Top 95.3% on sourcepulse

Project Summary

This repository provides code for Tina, a project that improves the reasoning abilities of small language models by applying Low-Rank Adaptation (LoRA) during reinforcement learning. It targets researchers and practitioners interested in cost-effective methods for improving LLM reasoning, and it reports competitive performance at significantly reduced computational cost.

How It Works

Tina applies LoRA during reinforcement learning (specifically GRPO) to fine-tune small language models. Instead of updating every weight, LoRA trains only a small number of low-rank adapter parameters while the base model stays frozen, which yields substantial savings in training cost and time compared to full-parameter fine-tuning. The method is validated on challenging reasoning benchmarks such as AIME24, where it shows significant performance gains.
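
The two ingredients named above can be sketched in a few lines of Python. This is a minimal illustration, not code from the Tina repository: the function names, the 4096 x 4096 layer size, and the rank of 16 are hypothetical, and real GRPO implementations differ in details such as how the standard deviation is computed or whether a small epsilon is added.

```python
import statistics


def lora_trainable_fraction(d_out: int, d_in: int, rank: int) -> float:
    """Fraction of a d_out x d_in weight matrix's parameter count that LoRA trains.

    LoRA keeps the original matrix W frozen and learns an update B @ A,
    where B has shape (d_out, rank) and A has shape (rank, d_in).
    """
    full_params = d_out * d_in
    lora_params = rank * (d_out + d_in)
    return lora_params / full_params


def group_relative_advantages(rewards):
    """GRPO-style advantages for one group of sampled answers to the same prompt.

    Each reward is standardized against the group's own mean and (population)
    standard deviation, so no separate value network is needed.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0:
        # All samples scored the same: no sample is preferred over another.
        return [0.0 for _ in rewards]
    return [(r - mean) / std for r in rewards]


# Hypothetical layer size and rank, chosen only for illustration.
print(lora_trainable_fraction(4096, 4096, 16))      # 0.0078125
# Four sampled answers, two judged correct (reward 1) and two incorrect (reward 0).
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))  # [1.0, -1.0, -1.0, 1.0]
```

With these assumed sizes, the adapter holds under 1% as many parameters as the matrix it modifies, which is the source of the cost savings described above.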

Quick Start & Requirements

  • Installation: Requires conda for environment setup. Two environments (tina and tina_eval) are created using mamba.
  • Prerequisites: Python 3.10 and 3.11, conda, mamba. Pre-trained models need to be downloaded to a specified CKPT_DIR.
  • Setup: Environment setup involves running shell scripts (set_vars.sh, set_env.sh, set_env_eval.sh).
  • Resources: Reproducing experiments costs ~$526, with the best checkpoint costing ~$9.
  • Links: Notion, Hugging Face Collection, Weights and Biases.

Highlighted Details

  • Achieves >20% performance increase and 43.33% Pass@1 accuracy on AIME24.
  • Demonstrates cost-effective training with LoRA, costing only $9 for the best checkpoint.
  • Leverages the open-r1 project as its codebase foundation.
  • Utilizes multiple open-source reasoning datasets for training.
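
For readers unfamiliar with the metric, Pass@1 with a single sampled answer per problem is simply the fraction of problems answered correctly. The sketch below is illustrative only: the function name is hypothetical, and the 13-of-30 split is an assumption chosen because AIME 2024 has 30 problems and that split matches the reported 43.33%; the source does not state how the figure was computed or whether it was averaged over runs.

```python
def pass_at_1(correct_flags):
    """Pass@1 with one sample per problem: share of problems solved on the first try."""
    if not correct_flags:
        raise ValueError("need at least one problem")
    return sum(correct_flags) / len(correct_flags)


# Assumed outcome for illustration: 13 of 30 problems solved.
flags = [True] * 13 + [False] * 17
score = pass_at_1(flags)
print(f"{100 * score:.2f}%")  # 43.33%
```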

Maintenance & Community

The project is associated with Shangshang Wang and researchers from USC. It cites multiple open-source datasets and the open-r1 project.

Licensing & Compatibility

The repository does not explicitly state a license in the README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

Because no license is stated, commercial usability cannot be determined. Setup involves multiple environment and path configurations, which may require careful adaptation to the user's environment.

Health Check

  • Last commit: 2 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 112 stars in the last 90 days
