relora by Guitaricet

PEFT pretraining code for ReLoRA research paper

Created 2 years ago

473 stars

Top 64.4% on SourcePulse

View on GitHub

3 Experts Love This Project

Yaowei Zheng

Author of LLaMA-Factory

Wing Lian

Founder of Axolotl AI

Benjamin Bolte

Cofounder of K-Scale Labs

Project Summary

This repository provides the official implementation for ReLoRA, a technique designed to enable high-rank training of large language models through low-rank updates. It is intended for researchers and practitioners looking to improve model training efficiency and performance by effectively integrating low-rank adaptations.

How It Works

ReLoRA integrates existing LoRA parameters back into the main network weights and then resets them. This approach aims to be more flexible than standard LoRA by allowing for more frequent updates and potentially higher effective rank. Key parameters include the reset frequency (--relora), optimizer reset behavior (--reset_optimizer_on_relora, --optimizer_magnitude_pruning), and a cyclical learning rate scheduler (cosine_restarts) with --cycle_length.

Quick Start & Requirements

Install via pip install -e . and pip install flash-attn.
Requires Python 3.10+ and PyTorch 2.0+.
Flash Attention is recommended for performance but not strictly required by the requirements.txt.
See README for detailed usage examples and configuration.

Highlighted Details

Supports distributed training via PyTorch DDP (torchrun).
Allows warm-starting from partially trained checkpoints.
Offers options for optimizer state management during resets, including magnitude pruning.
Uses a cosine_restarts learning rate scheduler for cyclical training.

Maintenance & Community

The project is associated with the paper "Stack More Layers Differently: High-Rank Training Through Low-Rank Updates".
No specific community links (Discord/Slack) or active maintenance signals are provided in the README.

Licensing & Compatibility

The repository does not explicitly state a license.
Compatibility for commercial use or closed-source linking is undetermined without a specified license.

Limitations & Caveats

The project is presented as the official code for a research paper, and its current state of maintenance and long-term support is not detailed. The README mentions that main.py will be deleted, recommending torchrun for single-GPU training.

Health Check

Last Commit

1 year ago

Responsiveness

Inactive

Pull Requests (30d)

Issues (30d)

Star History

4 stars in the last 30 days