COAT by NVlabs

FP8 training framework for memory efficiency

Created 1 year ago
250 stars

Top 100.0% on SourcePulse

Project Summary

This project addresses the memory bottleneck in training large AI models by introducing COAT, a novel FP8 quantization method for optimizer states and activations. It targets researchers and engineers seeking to train larger models on limited hardware, offering significant memory reduction and speedup while maintaining accuracy.

How It Works

COAT improves FP8 training efficiency through two core techniques. First, Dynamic Range Expansion remaps optimizer states so that their value distribution better fills the representable range of FP8, minimizing quantization error. Second, Mixed-Granularity Activation Quantization reduces activation memory by applying per-tensor quantization to linear layers and finer-grained group quantization (in the spirit of VS-Quant) to non-linear layers. Together, these reduce the memory footprint and accelerate training without compromising model accuracy.
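The sketch below illustrates both ideas in plain PyTorch. It is a simplified, conceptual rendering, not COAT's actual kernels or API: the expansion exponent k, the group size of 128, and the synthetic optimizer-state and activation tensors are all illustrative assumptions.

```python
# Simplified illustration of the two ideas above, not COAT's implementation.
# Assumptions: fixed expansion exponent k, group size of 128, synthetic tensors.
# Requires PyTorch >= 2.1 for the float8 dtypes.
import torch

FP8_MAX = 448.0  # largest finite magnitude of torch.float8_e4m3fn


def fake_quantize_fp8(x, group_size=None):
    """Scale into the FP8 (E4M3) range, cast to float8, and cast back.
    group_size=None -> one scale for the whole tensor (per-tensor);
    otherwise one scale per contiguous group of elements (finer granularity)."""
    if group_size is None:
        amax = x.abs().amax().clamp(min=1e-12)
        scale = FP8_MAX / amax
        q = (x * scale).clamp(-FP8_MAX, FP8_MAX).to(torch.float8_e4m3fn)
        return q.to(torch.float32) / scale
    groups = x.reshape(-1, group_size)
    amax = groups.abs().amax(dim=-1, keepdim=True).clamp(min=1e-12)
    scale = FP8_MAX / amax
    q = (groups * scale).clamp(-FP8_MAX, FP8_MAX).to(torch.float8_e4m3fn)
    return (q.to(torch.float32) / scale).reshape(x.shape)


def expand_then_quantize(state, k=2.0):
    """Dynamic-range-expansion sketch: remap values with sign(x) * |x|**k so a
    narrow band of optimizer-state magnitudes spreads over more of the FP8
    exponent range, quantize, then invert the mapping with |.|**(1/k)."""
    expanded = state.sign() * state.abs().pow(k)
    deq = fake_quantize_fp8(expanded)
    return deq.sign() * deq.abs().pow(1.0 / k)


if __name__ == "__main__":
    torch.manual_seed(0)

    # Synthetic Adam-style second moment: positive, narrow band of magnitudes.
    second_moment = torch.rand(4096) * 9e-5 + 1e-5
    plain = fake_quantize_fp8(second_moment)
    dre = expand_then_quantize(second_moment, k=2.0)
    print("direct FP8 error:           ", (plain - second_moment).abs().mean().item())
    print("expand-then-FP8 error:      ", (dre - second_moment).abs().mean().item())

    # Mixed granularity for activations: per-tensor for linear layers,
    # per-group for non-linear layers.
    act = torch.randn(8, 1024)
    per_tensor = fake_quantize_fp8(act)
    per_group = fake_quantize_fp8(act, group_size=128)
    print("per-tensor activation error:", (per_tensor - act).abs().mean().item())
    print("per-group activation error: ", (per_group - act).abs().mean().item())
```

Running the script prints the mean absolute quantization error for each variant; on this synthetic data, range expansion and per-group scaling should each show lower error than their plain per-tensor counterparts.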

Quick Start & Requirements

Installation is available via pip (pip install fp8-coat) or from source by cloning the repository and running the provided environment_setup.sh script, which creates a Conda environment. COAT supports Llama 2/3 models and integrates with Hugging Face's transformers Trainer.
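As a sketch of what that Trainer integration typically looks like, the snippet below sets up a standard Hugging Face fine-tuning run for a Llama-family checkpoint. The model name, dataset, and hyperparameters are illustrative, and because the repository's exact entry points are not quoted here, the spot where COAT's FP8 quantization would be enabled is only marked with a placeholder comment.

```python
# Standard Hugging Face transformers Trainer skeleton for a Llama-family model.
# The README says COAT plugs into this Trainer workflow; the exact COAT call is
# repo-specific and therefore only marked as a placeholder below.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

model_name = "meta-llama/Llama-2-7b-hf"  # illustrative; any supported Llama 2/3 checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Tiny illustrative dataset; replace with your own corpus.
dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="train[:1%]")
dataset = dataset.map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
    batched=True, remove_columns=dataset.column_names)

args = TrainingArguments(
    output_dir="coat-finetune",
    per_device_train_batch_size=2,  # COAT's memory savings are what allow larger batches
    num_train_epochs=1,
    bf16=True,
)

# --- COAT integration point (placeholder) ---
# Per the README, COAT's FP8 optimizer-state and activation quantization wraps
# this setup; see the repository for the actual import and configuration.

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```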

Highlighted Details

  • Achieves a 1.54x reduction in end-to-end memory footprint and a 1.43x training speedup compared to BF16.
  • Enables doubling batch sizes, facilitating the training of larger models on fewer GPUs.
  • Demonstrates nearly lossless accuracy on tasks including LLM pretraining/fine-tuning and Vision Language Model training.
  • Specific benchmarks show a 26% speedup for Llama-2-7B fine-tuning on 8x H100 GPUs.

Maintenance & Community

The project includes a "To-Do List" indicating ongoing development, with plans for TorchTitan and FSDP2 support. No explicit community channels (e.g., Discord, Slack) or detailed contributor information is prominently featured in the README.

Licensing & Compatibility

The repository's README does not explicitly state a software license, so licensing should be verified before commercial use or integration into closed-source projects.

Limitations & Caveats

Several planned features, such as TorchTitan and FSDP2 support, are still under development. The absence of a clearly stated license is a significant caveat for adoption.

Health Check

  • Last commit: 3 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 5 stars in the last 30 days

Explore Similar Projects

Starred by Tobi Lutke (Cofounder of Shopify), Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), and 41 more.

unsloth by unslothai

Finetuning tool for LLMs, targeting speed and memory efficiency. Top 0.5% on SourcePulse, 49k stars. Created 2 years ago; updated 1 day ago.