C-Optim by kyleliang919

Improving transformer training with a single line of code

Created 1 year ago
369 stars

Top 76.5% on SourcePulse

View on GitHub
Project Summary

This repository introduces Cautious Optimizers (C-Optim), a novel modification to momentum-based optimizers that enhances training speed and stability in deep learning models. It targets researchers and engineers working on large-scale model pretraining and fine-tuning, offering a simple, one-line code change to improve performance.

How It Works

C-Optim applies a single-line modification to existing momentum-based PyTorch optimizers such as AdamW and Lion, producing variants like C-AdamW and C-Lion. The change masks out any update coordinate whose sign disagrees with the current gradient, so the optimizer never moves a parameter against the instantaneous descent direction. This modification is theoretically shown to preserve Adam's Hamiltonian function and, under Lyapunov analysis, its convergence guarantees. The approach yields a whole family of cautious optimizers, with even the simplest variant demonstrating significant speed-ups.
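
To make the mechanism concrete, here is a minimal sketch of one cautious Adam-style step. This is an illustration of the masking idea described in the paper, not the repository's implementation: the function name is invented, weight decay is omitted, and the repo's code may apply the mask to a different quantity or normalize it differently.

```python
import torch

def cautious_adamw_step(param, grad, exp_avg, exp_avg_sq, step,
                        lr=1e-3, betas=(0.9, 0.999), eps=1e-8):
    """One cautious Adam-style step (illustrative; weight decay omitted)."""
    beta1, beta2 = betas
    # Standard Adam moment updates.
    exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1)
    exp_avg_sq.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)

    # Bias-corrected update direction, as in plain Adam.
    update = (exp_avg / (1 - beta1 ** step)) / (
        (exp_avg_sq / (1 - beta2 ** step)).sqrt() + eps)

    # The cautious "one line": zero out coordinates whose update sign
    # disagrees with the current gradient, then rescale so the mean mask
    # stays near 1 and the overall step magnitude remains comparable.
    mask = (update * grad > 0).to(grad.dtype)
    mask = mask / mask.mean().clamp(min=1e-3)

    param.add_(update * mask, alpha=-lr)

# Toy usage: optimizer state starts at zero, step counts from 1.
w, g = torch.randn(10), torch.randn(10)
m, v = torch.zeros_like(w), torch.zeros_like(w)
cautious_adamw_step(w, g, m, v, step=1)
```

Rescaling by the mask's mean is one way to keep the effective step size close to the unmasked update, which is what lets the change drop into an existing optimizer; the exact normalization used in the repository may differ.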

Quick Start & Requirements

  • Install dependencies: pip install -r requirements.txt
  • Requires PyTorch.
  • Examples are provided for Llama, MAE, Qwen2.5, and PPO training; a minimal usage sketch follows this list.
  • Links: Paper, Hugging Face integration
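
For orientation, a drop-in swap in an existing training loop might look like the following. The import path `c_adamw` and class name `AdamW` are assumptions based on the repository's naming, so check the repo's examples for the exact module layout:

```python
import torch

# Hypothetical import; the actual module/class names in the repo may differ.
from c_adamw import AdamW  # cautious AdamW (C-AdamW)

model = torch.nn.Linear(128, 10)
# Constructed exactly like torch.optim.AdamW; only this line changes.
optimizer = AdamW(model.parameters(), lr=1e-3, weight_decay=0.01)

x = torch.randn(32, 128)
loss = model(x).pow(2).mean()
loss.backward()
optimizer.step()
optimizer.zero_grad()
```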

Highlighted Details

  • Achieves up to 1.47x speed-up on Llama and MAE pretraining.
  • Integrated into Hugging Face's pytorch-image-models.
  • Supports PPO for reinforcement learning tasks.
  • Post-training experiments on Qwen2.5 models are available.

Maintenance & Community

  • Official implementation released November 2024.
  • Paper available on arXiv.
  • Active development with recent updates in January 2025.

Licensing & Compatibility

  • The repository does not explicitly state a license.

Limitations & Caveats

The project is a research release accompanying a paper published in late 2024, so the code is still evolving and APIs and results may change. The README does not call out specific limitations or unsupported platforms.

Health Check

  • Last Commit: 3 weeks ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 3
  • Star History: 12 stars in the last 30 days

Starred by Wing Lian (Founder of Axolotl AI) and Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake).

Explore Similar Projects

fms-fsdp by foundation-model-stack

Efficiently train foundation models with PyTorch. 265 stars. Created 1 year ago; updated 1 month ago.