C-Optim by kyleliang919

Improving transformer training with a single line of code

created 1 year ago
344 stars

Top 81.6% on sourcepulse

View on GitHub
Project Summary

This repository introduces Cautious Optimizers (C-Optim), a novel modification to momentum-based optimizers that enhances training speed and stability in deep learning models. It targets researchers and engineers working on large-scale model pretraining and fine-tuning, offering a simple, one-line code change to improve performance.

How It Works

C-Optim applies a single-line modification to existing momentum-based PyTorch optimizers such as AdamW and Lion, producing variants like C-AdamW and C-Lion. The change masks out update components whose sign disagrees with the current gradient, so each step only moves parameters in directions the gradient currently agrees with. The paper shows this modification preserves Adam's Hamiltonian function and does not break its convergence guarantees under Lyapunov analysis. The result is a new family of cautious optimizers, with even the simplest variant delivering significant speed-ups.
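The precise one-liner lives in the repository's optimizer implementations; the sketch below only illustrates the masking idea. The helper name, the rescaling factor, and the AdamW step shown in the comments are assumptions for illustration, not the repository's code.

```python
import torch

def cautious(update: torch.Tensor, grad: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    # Keep only the update components whose sign agrees with the current
    # gradient, then rescale so the surviving components carry the full
    # update "mass". (The exact rescaling in the repository may differ.)
    mask = (update * grad > 0).to(update.dtype)
    mask = mask * (mask.numel() / (mask.sum() + eps))
    return update * mask

# Inside an AdamW-style step (exp_avg = first-moment buffer, denom = sqrt of
# the bias-corrected second moment plus eps), the usual parameter update
#     p.add_(exp_avg / denom, alpha=-lr)
# would become
#     p.add_(cautious(exp_avg, grad) / denom, alpha=-lr)
```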

Quick Start & Requirements

  • Install dependencies: pip install -r requirements.txt
  • Requires PyTorch (a minimal usage sketch follows this list).
  • Examples provided for Llama, MAE, Qwen2.5, and PPO training.
  • Links: Paper, Hugging Face integration
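
A minimal usage sketch, assuming the repository exposes a cautious AdamW variant as a drop-in replacement for torch.optim.AdamW; the module and class names below are assumptions, so check the repository for the actual imports.

```python
import torch
import torch.nn as nn

# Assumption: the cautious AdamW variant is importable roughly like this;
# the actual module/class names may differ -- see the repository.
from c_adamw import AdamW as CautiousAdamW

model = nn.Linear(128, 10)
optimizer = CautiousAdamW(model.parameters(), lr=1e-3, weight_decay=0.01)

x = torch.randn(32, 128)
y = torch.randint(0, 10, (32,))

loss = nn.functional.cross_entropy(model(x), y)
loss.backward()
optimizer.step()       # cautious masking is applied inside step()
optimizer.zero_grad()
```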

Highlighted Details

  • Achieves up to 1.47x speed-up on Llama and MAE pretraining.
  • Integrated into Hugging Face's pytorch-image-models.
  • Supports PPO for reinforcement learning tasks.
  • Post-training experiments on Qwen2.5 models are available.

Maintenance & Community

  • Official implementation released November 2024.
  • Paper available on arXiv.
  • Active development with recent updates in January 2025.

Licensing & Compatibility

  • The repository does not explicitly state a license.

Limitations & Caveats

The project is an early-stage research release accompanying a paper published in late 2024, so APIs and reported results may still change. The README does not list specific limitations or unsupported platforms.

Health Check

  • Last commit: 4 days ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 122 stars in the last 90 days
