Tina by shangshang-wang

LoRA reasoning models

Created 9 months ago

313 stars

Top 86.3% on SourcePulse

1 Expert Loves This Project

Yaowei Zheng

Author of LLaMA-Factory

Project Summary

This repository provides the code for Tina, a project that improves the reasoning ability of small language models by applying Low-Rank Adaptation (LoRA) during reinforcement learning. It targets researchers and practitioners who want cost-effective ways to improve LLM reasoning, and it reports competitive performance at significantly reduced computational cost.

How It Works

Tina applies LoRA during reinforcement learning (specifically GRPO, Group Relative Policy Optimization) to fine-tune small language models. Only a small set of added low-rank parameters is trained while the base model weights stay frozen, which cuts training cost and time substantially compared to full-parameter fine-tuning. The method is validated on challenging reasoning benchmarks such as AIME24, where it shows significant performance gains.
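The two ideas in this section can be illustrated with a minimal sketch. It is not taken from the Tina codebase: the layer dimensions, the rank, the reward values, and both function names are hypothetical, and the reward normalization uses the population standard deviation, which is an assumption about one common way GRPO-style advantages are computed.

```python
import statistics


def lora_param_counts(d_in, d_out, rank):
    """Return (full, lora) trainable-parameter counts for one linear layer."""
    full = d_in * d_out           # dense weight W with shape (d_out, d_in)
    lora = rank * (d_in + d_out)  # low-rank factors A: (rank, d_in), B: (d_out, rank)
    return full, lora


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of completions sampled for the same prompt."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # assumption: population standard deviation
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical layer size and rank, chosen only for illustration.
full, lora = lora_param_counts(d_in=1536, d_out=1536, rank=16)
print(f"LoRA trains {lora / full:.2%} of this layer's weight count")

# Hypothetical binary rewards for four sampled completions.
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

The first function shows why the trainable parameter count stays small: it grows with `rank * (d_in + d_out)` rather than `d_in * d_out`. The second shows the group-relative part of GRPO: each completion is scored against the other completions for the same prompt rather than against a learned value function.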

Quick Start & Requirements

Installation: Requires conda for environment setup. Two environments (tina and tina_eval) are created using mamba.
Prerequisites: Python 3.10 and 3.11, conda, mamba. Pre-trained models need to be downloaded to a specified CKPT_DIR.
Setup: Environment setup involves running shell scripts (set_vars.sh, set_env.sh, set_env_eval.sh).
Resources: Reproducing experiments costs ~$526, with the best checkpoint costing ~$9.
Links: Notion, Hugging Face Collection, Weights and Biases.
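Since pre-trained models are expected under CKPT_DIR, a small preflight check can catch a missing path before a training run starts. The sketch below assumes CKPT_DIR is exposed as an environment variable and that each model lives in a subdirectory named after it; the function name, the directory layout, and the example model name are hypothetical and not part of the repository's scripts.

```python
import os
from pathlib import Path


def resolve_checkpoint(model_name, env=None):
    """Build the expected checkpoint path for a model under CKPT_DIR."""
    env = os.environ if env is None else env
    ckpt_dir = env.get("CKPT_DIR")
    if not ckpt_dir:
        # Fail early with a clear message instead of deep inside a training run.
        raise RuntimeError("CKPT_DIR is not set; define it before launching training")
    return Path(ckpt_dir) / model_name


# Usage with an explicit mapping, so the example does not depend on the real environment.
path = resolve_checkpoint("example-base-model", env={"CKPT_DIR": "/tmp/ckpts"})
print(path)
```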

Highlighted Details

Reports a >20% increase in reasoning performance and 43.33% Pass@1 accuracy on AIME24.
Demonstrates cost-effective training with LoRA, costing only $9 for the best checkpoint.
Leverages the open-r1 project as its codebase foundation.
Utilizes multiple open-source reasoning datasets for training.
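To make the Pass@1 figure above concrete: AIME 2024 comprises 30 problems, so 43.33% is consistent with 13 correct answers. The sketch below assumes a single sampled answer per problem; the 13-of-30 split is inferred from the percentage rather than reported here, and the function name is hypothetical.

```python
def pass_at_1(correct_flags):
    """Fraction of problems whose single sampled answer is correct."""
    return sum(correct_flags) / len(correct_flags)


# Assumed split: 13 correct and 17 incorrect out of 30 problems.
flags = [True] * 13 + [False] * 17
print(f"{pass_at_1(flags):.2%}")  # prints 43.33%
```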

Maintenance & Community

The project is associated with Shangshang Wang and researchers from USC. It cites multiple open-source datasets and the open-r1 project.

Licensing & Compatibility

The repository does not explicitly state a license in the README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

Without a stated license, commercial usability cannot be determined. Setup involves several environment and path configurations that may need careful adaptation to the user's environment.

Health Check

Last Commit: 3 months ago
Responsiveness: 1 day

Star History

4 stars in the last 30 days