relora by Guitaricet

PEFT pretraining code for ReLoRA research paper

Created 2 years ago · 463 stars · Top 65.5% on SourcePulse

View on GitHub
Project Summary

This repository provides the official implementation of ReLoRA, a technique for achieving high-rank training of large language models through a sequence of low-rank updates. It is aimed at researchers and practitioners who want the memory and compute efficiency of low-rank (LoRA-style) training without giving up the expressiveness of full-rank updates.

How It Works

ReLoRA periodically merges the trained LoRA parameters back into the main network weights and then re-initializes them, so training continues with a fresh low-rank adapter. Because each merge folds another low-rank update into the base weights, the cumulative update can reach a much higher effective rank than a single LoRA adapter. Key parameters include the reset frequency (--relora), optimizer reset behavior (--reset_optimizer_on_relora, --optimizer_magnitude_pruning), and a cyclical learning rate scheduler (cosine_restarts) with --cycle_length. A sketch of one reset step follows.
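
To make the merge-and-reset concrete, here is a minimal PyTorch sketch of one reset step. The module attributes (lora_A, lora_B), the re-initialization choices, and the pruning details are assumptions for illustration; the repository's actual implementation may differ.

    import math
    import torch

    @torch.no_grad()
    def relora_reset(model, optimizer, prune_fraction=0.99):
        """One ReLoRA reset: merge each LoRA pair into its base weight,
        re-initialize the low-rank factors, and magnitude-prune the
        optimizer state. Names are illustrative, not the repo's API."""
        for module in model.modules():
            # Assumes LoRA-style modules exposing weight, lora_A, lora_B.
            if hasattr(module, "lora_A") and hasattr(module, "lora_B"):
                # Fold the low-rank update B @ A into the base weight.
                module.weight += module.lora_B.weight @ module.lora_A.weight
                # Restart the factors: A random, B zero, so the network's
                # function is unchanged at the moment of the reset.
                torch.nn.init.kaiming_uniform_(module.lora_A.weight, a=math.sqrt(5))
                torch.nn.init.zeros_(module.lora_B.weight)
        # Optimizer "reset" by magnitude pruning: zero all but the
        # largest-magnitude entries of each state tensor (one of the
        # --optimizer_magnitude_pruning behaviors described above).
        for state in optimizer.state.values():
            for value in state.values():
                if torch.is_tensor(value) and value.is_floating_point() and value.numel() > 1:
                    flat = value.abs().flatten().float()
                    # torch.quantile caps input size, so threshold on a sample.
                    idx = torch.randint(flat.numel(), (min(flat.numel(), 1 << 16),), device=flat.device)
                    threshold = torch.quantile(flat[idx], prune_fraction)
                    value.mul_((value.abs() >= threshold).to(value.dtype))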

Quick Start & Requirements

  • Install with pip install -e .; optionally add pip install flash-attn.
  • Requires Python 3.10+ and PyTorch 2.0+.
  • Flash Attention is recommended for speed but is not listed in requirements.txt, so it is optional.
  • See the README for detailed usage examples and configuration; a hypothetical launch command follows this list.
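
As a hypothetical example of the flags above, a single-GPU launch might look like torchrun --nproc_per_node 1 torchrun_main.py --relora 5000 --cycle_length 5000 (the script name and values here are illustrative; consult the README for the exact invocation).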

Highlighted Details

  • Supports distributed training via PyTorch DDP (torchrun).
  • Allows warm-starting from partially trained checkpoints.
  • Offers options for optimizer state management during resets, including magnitude pruning.
  • Uses a cosine_restarts learning rate scheduler for cyclical training; a minimal sketch follows this list.
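
A minimal sketch of such a cyclical schedule, built on PyTorch's LambdaLR. The single linear warmup and the min_lr_ratio floor are assumptions; the repository's exact schedule (for example, re-warmup after each restart) may differ.

    import math
    from torch.optim.lr_scheduler import LambdaLR

    def cosine_restarts(optimizer, warmup_steps, cycle_length, min_lr_ratio=0.1):
        """Cosine decay that restarts every `cycle_length` steps, so each
        ReLoRA reset begins near the peak learning rate."""
        def lr_lambda(step):
            if step < warmup_steps:
                return step / max(1, warmup_steps)  # linear warmup
            progress = ((step - warmup_steps) % cycle_length) / cycle_length
            cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
            return min_lr_ratio + (1.0 - min_lr_ratio) * cosine
        return LambdaLR(optimizer, lr_lambda)

Aligning --cycle_length with the --relora reset frequency keeps each learning-rate cycle matched to one merge-and-reset period.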

Maintenance & Community

  • The project is associated with the paper "Stack More Layers Differently: High-Rank Training Through Low-Rank Updates".
  • No specific community links (Discord/Slack) or active maintenance signals are provided in the README.

Licensing & Compatibility

  • The repository does not explicitly state a license.
  • Compatibility for commercial use or closed-source linking is undetermined without a specified license.

Limitations & Caveats

The project is presented as the official code for a research paper, and its maintenance status and long-term support are not detailed. The README notes that main.py is slated for deletion and recommends torchrun even for single-GPU training.

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 2 stars in the last 30 days

Starred by Victor Taelin (Author of Bend, Kind, HVM), Sebastian Raschka (Author of "Build a Large Language Model (From Scratch)"), and 2 more.

Explore Similar Projects

nanoT5 by PiotrNawrot

Top 0.2% on SourcePulse · 1k stars
PyTorch code for T5 pre-training and fine-tuning on a single GPU
Created 2 years ago · Updated 1 year ago
Starred by Clement Delangue (Cofounder of Hugging Face), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 20 more.

accelerate by huggingface

Top 0.3% on SourcePulse · 9k stars
PyTorch training helper for distributed execution
Created 4 years ago · Updated 1 day ago