COAT by NVlabs

FP8 training framework for memory efficiency

Created 1 year ago
250 stars

Top 100.0% on SourcePulse

Project Summary

This project addresses the memory bottleneck in training large AI models by introducing COAT, a novel FP8 quantization method for optimizer states and activations. It targets researchers and engineers seeking to train larger models on limited hardware, offering significant memory reduction and speedup while maintaining accuracy.

How It Works

COAT improves FP8 training efficiency through two core techniques. First, Dynamic Range Expansion remaps optimizer states so that their value distribution better fills the representable range of FP8, minimizing quantization error. Second, Mixed-Granularity Activation Quantization reduces activation memory by applying per-tensor quantization to linear layers and finer-grained group quantization (in the spirit of VS-Quant) to non-linear layers. Together, these reduce the memory footprint and accelerate training without compromising model accuracy.
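The sketch below illustrates both ideas in plain PyTorch. It is a simplified, conceptual rendering, not COAT's actual kernels or API: the expansion exponent k, the group size of 128, and the synthetic optimizer-state and activation tensors are all illustrative assumptions.

```python
# Simplified illustration of the two ideas above, not COAT's implementation.
# Assumptions: fixed expansion exponent k, group size of 128, synthetic tensors.
# Requires PyTorch >= 2.1 for the float8 dtypes.
import torch

FP8_MAX = 448.0  # largest finite magnitude of torch.float8_e4m3fn


def fake_quantize_fp8(x, group_size=None):
    """Scale into the FP8 (E4M3) range, cast to float8, and cast back.
    group_size=None -> one scale for the whole tensor (per-tensor);
    otherwise one scale per contiguous group of elements (finer granularity)."""
    if group_size is None:
        amax = x.abs().amax().clamp(min=1e-12)
        scale = FP8_MAX / amax
        q = (x * scale).clamp(-FP8_MAX, FP8_MAX).to(torch.float8_e4m3fn)
        return q.to(torch.float32) / scale
    groups = x.reshape(-1, group_size)
    amax = groups.abs().amax(dim=-1, keepdim=True).clamp(min=1e-12)
    scale = FP8_MAX / amax
    q = (groups * scale).clamp(-FP8_MAX, FP8_MAX).to(torch.float8_e4m3fn)
    return (q.to(torch.float32) / scale).reshape(x.shape)


def expand_then_quantize(state, k=2.0):
    """Dynamic-range-expansion sketch: remap values with sign(x) * |x|**k so a
    narrow band of optimizer-state magnitudes spreads over more of the FP8
    exponent range, quantize, then invert the mapping with |.|**(1/k)."""
    expanded = state.sign() * state.abs().pow(k)
    deq = fake_quantize_fp8(expanded)
    return deq.sign() * deq.abs().pow(1.0 / k)


if __name__ == "__main__":
    torch.manual_seed(0)

    # Synthetic Adam-style second moment: positive, narrow band of magnitudes.
    second_moment = torch.rand(4096) * 9e-5 + 1e-5
    plain = fake_quantize_fp8(second_moment)
    dre = expand_then_quantize(second_moment, k=2.0)
    print("direct FP8 error:           ", (plain - second_moment).abs().mean().item())
    print("expand-then-FP8 error:      ", (dre - second_moment).abs().mean().item())

    # Mixed granularity for activations: per-tensor for linear layers,
    # per-group for non-linear layers.
    act = torch.randn(8, 1024)
    per_tensor = fake_quantize_fp8(act)
    per_group = fake_quantize_fp8(act, group_size=128)
    print("per-tensor activation error:", (per_tensor - act).abs().mean().item())
    print("per-group activation error: ", (per_group - act).abs().mean().item())
```

Running the script prints the mean absolute quantization error for each variant; on this synthetic data, range expansion and per-group scaling should each show lower error than their plain per-tensor counterparts.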

Quick Start & Requirements

Installation is available via pip (pip install fp8-coat) or from source by cloning the repository and running the provided environment_setup.sh script, which creates a Conda environment. COAT supports Llama 2/3 models and integrates with Hugging Face's transformers Trainer.
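As a sketch of what that Trainer integration typically looks like, the snippet below sets up a standard Hugging Face fine-tuning run for a Llama-family checkpoint. The model name, dataset, and hyperparameters are illustrative, and because the repository's exact entry points are not quoted here, the spot where COAT's FP8 quantization would be enabled is only marked with a placeholder comment.

```python
# Standard Hugging Face transformers Trainer skeleton for a Llama-family model.
# The README says COAT plugs into this Trainer workflow; the exact COAT call is
# repo-specific and therefore only marked as a placeholder below.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

model_name = "meta-llama/Llama-2-7b-hf"  # illustrative; any supported Llama 2/3 checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Tiny illustrative dataset; replace with your own corpus.
dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="train[:1%]")
dataset = dataset.map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
    batched=True, remove_columns=dataset.column_names)

args = TrainingArguments(
    output_dir="coat-finetune",
    per_device_train_batch_size=2,  # COAT's memory savings are what allow larger batches
    num_train_epochs=1,
    bf16=True,
)

# --- COAT integration point (placeholder) ---
# Per the README, COAT's FP8 optimizer-state and activation quantization wraps
# this setup; see the repository for the actual import and configuration.

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```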

Highlighted Details

  • Achieves a 1.54x reduction in end-to-end memory footprint and a 1.43x training speedup compared to BF16.
  • Enables doubling batch sizes, facilitating the training of larger models on fewer GPUs.
  • Demonstrates nearly lossless accuracy on tasks including LLM pretraining/fine-tuning and Vision Language Model training.
  • Specific benchmarks show a 26% speedup for Llama-2-7B fine-tuning on 8x H100 GPUs.

Maintenance & Community

The project includes a "To-Do List" indicating ongoing development, with plans for TorchTitan and FSDP2 support. No explicit community channels (e.g., Discord, Slack) or detailed contributor information is prominently featured in the README.

Licensing & Compatibility

The repository's README does not explicitly state a software license, so licensing should be verified before commercial use or integration into closed-source projects.

Limitations & Caveats

Several planned features, such as TorchTitan and FSDP2 support, are still under development. The absence of a clearly stated license is a significant caveat for adoption.

Health Check

  • Last commit: 3 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 5 stars in the last 30 days

Explore Similar Projects

Starred by Tobi Lutke (Cofounder of Shopify), Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), and 41 more.

unsloth by unslothai

Finetuning tool for LLMs, targeting speed and memory efficiency. Top 0.5% on SourcePulse, 49k stars. Created 2 years ago; updated 1 day ago.