rho by microsoft

LLM pre-training research using Selective Language Modeling (SLM)

created 1 year ago
428 stars

Top 70.3% on sourcepulse

Project Summary

Rho-1 introduces Selective Language Modeling (SLM) for efficient LLM pre-training, targeting researchers and developers aiming to improve model performance on complex tasks like mathematics with significantly fewer training tokens. The project offers pre-trained models and evaluation scripts, demonstrating substantial gains in few-shot accuracy on benchmarks like GSM8k and MATH.

How It Works

SLM optimizes LLM pre-training by focusing on high-quality, "useful" tokens. The process involves training a reference model, scoring tokens in a corpus based on their loss relative to the reference, and then selectively training the target LLM only on tokens exhibiting higher "excess loss." This approach aims to filter out noisy or less informative tokens, leading to faster convergence and improved performance with reduced computational cost.
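The sketch below illustrates the idea of a selective training step under the assumptions stated here; it is not the repository's actual training code (which the README does not include). It assumes two Hugging Face causal language models, a trained reference model and the target model being pre-trained, and a hypothetical keep_ratio hyperparameter controlling the fraction of tokens retained.

```python
# Hypothetical sketch of a Selective Language Modeling (SLM) step, not the
# repo's training code. Assumes Hugging Face-style causal LMs whose forward
# pass returns `.logits`.
import torch
import torch.nn.functional as F


def per_token_loss(logits, labels):
    """Cross-entropy of each next-token prediction; shape (batch, seq_len - 1)."""
    shift_logits = logits[:, :-1, :]   # prediction for position t+1
    shift_labels = labels[:, 1:]
    loss = F.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        shift_labels.reshape(-1),
        reduction="none",
    )
    return loss.view(shift_labels.shape)


def slm_step(target_model, ref_model, input_ids, keep_ratio=0.6):
    """Back-propagate only through the tokens with the highest excess loss."""
    # Reference model scores tokens without gradients.
    with torch.no_grad():
        ref_loss = per_token_loss(ref_model(input_ids).logits, input_ids)

    tgt_loss = per_token_loss(target_model(input_ids).logits, input_ids)

    # Excess loss = target loss minus reference loss; tokens where the target
    # model lags the reference most are kept, the rest are masked out.
    excess = tgt_loss - ref_loss
    k = max(1, int(keep_ratio * excess.numel()))
    threshold = excess.flatten().topk(k).values.min()
    mask = (excess >= threshold).float()

    # Average the training loss over the selected tokens only.
    loss = (tgt_loss * mask).sum() / mask.sum()
    loss.backward()
    return loss.item()
```

The keep_ratio value and the top-k selection rule are illustrative; the paper's actual selection strategy and ratio may differ.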

Quick Start & Requirements

  • Evaluation: Navigate to rho-1/math-evaluation-harness and run bash scripts/run_eval.sh cot microsoft/rho-math-7b-v0.1 for base model evaluation or bash scripts/run_eval.sh tora microsoft/rho-math-7b-interpreter-v0.1 for code interpreter model evaluation.
  • Dependencies: Requires Python and standard ML libraries. Specific versions are not detailed, but the evaluation commands take Hugging Face model IDs, implying compatibility with the Transformers ecosystem.
  • Resources: Pre-trained models are available on Hugging Face. Evaluation requires computational resources for running inference.
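For using the released checkpoints outside the evaluation harness, a minimal inference sketch with Hugging Face Transformers might look like the following. The model ID is taken from the evaluation command above; the prompt and generation settings are illustrative assumptions, not prescribed by the repo.

```python
# Minimal inference sketch; prompt and generation parameters are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/rho-math-7b-v0.1"  # model ID from the evaluation command above
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Question: What is 15% of 240?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```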

Highlighted Details

  • Rho-Math-1B and Rho-Math-7B models achieve competitive few-shot accuracy on MATH with up to 97% fewer pretraining tokens compared to baselines.
  • Rho-Math-1B-Interpreter is the first 1B LLM to exceed 40% accuracy on MATH.
  • Rho-Math-7B-Interpreter achieves 52% on MATH with only 69k fine-tuning samples.
  • The project provides evaluation scripts for both few-shot CoT and tool-integrated reasoning (Code Interpreter) scenarios.

Maintenance & Community

The project is associated with Microsoft Research. Contributions are welcomed, requiring agreement to a Contributor License Agreement (CLA).

Licensing & Compatibility

The code is released under the MIT License, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

The README focuses on evaluation and the released pre-trained models; it does not fully explain how to run SLM pre-training from scratch or the specifics of the token-scoring and filtering pipeline.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 22 stars in the last 90 days
