This repository provides code for Tina, a project focused on enhancing reasoning abilities in small language models using Low-Rank Adaptation (LoRA) during reinforcement learning. It targets researchers and practitioners interested in cost-effective methods for improving LLM reasoning, demonstrating competitive performance with significantly reduced computational costs.
How It Works
Tina employs LoRA during reinforcement learning (specifically GRPO) to fine-tune small language models. This approach injects reasoning capabilities by adapting only a small subset of model parameters, leading to substantial savings in training cost and time compared to full-parameter fine-tuning. The method is validated on challenging reasoning benchmarks like AIME24, showing significant performance gains.
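The scale of the savings can be illustrated with a rough parameter count for a single adapted weight matrix. The dimensions and rank below are illustrative assumptions, not Tina's actual configuration:

```python
# Rough illustration of LoRA's parameter savings (all numbers are
# illustrative assumptions, not Tina's actual configuration).
d, k = 2048, 2048          # shape of one weight matrix W in the base model
r = 16                     # LoRA rank: W is adapted as W + B @ A

full_params = d * k        # parameters touched by full fine-tuning of W
lora_params = r * (d + k)  # parameters in low-rank factors B (d x r), A (r x k)

print(f"full fine-tuning: {full_params:,} params")
print(f"LoRA (r={r}): {lora_params:,} params "
      f"({100 * lora_params / full_params:.2f}% of full)")
```

At these assumed dimensions, LoRA updates under 2% of the parameters that full fine-tuning would touch, which is where the training cost and time savings come from.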
Quick Start & Requirements
The repository uses conda for environment setup. Two environments (tina and tina_eval) are created using mamba. Dependencies include conda and mamba. Pre-trained models need to be downloaded to a specified CKPT_DIR. Setup is driven by configuration scripts (set_vars.sh, set_env.sh, set_env_eval.sh).
Highlighted Details
Maintenance & Community
The project is associated with Shangshang Wang and researchers from USC. It cites multiple open-source datasets and the open-r1 project.
Licensing & Compatibility
The repository does not explicitly state a license in the README. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
The absence of a stated license makes commercial usability unclear. The setup also involves multiple environment and path configurations, which may require careful adaptation to each user's system.