accelerate by huggingface

PyTorch training helper for distributed execution

Created 4 years ago
9,138 stars

Top 5.6% on SourcePulse

View on GitHub
Project Summary

Hugging Face Accelerate simplifies distributed PyTorch training and inference across diverse hardware configurations, including multi-CPU, multi-GPU, and TPUs. It targets PyTorch users who want to leverage distributed computing and mixed precision without extensive boilerplate code modifications, enabling faster and more scalable model development.

How It Works

Accelerate acts as a thin wrapper around PyTorch's distributed capabilities, abstracting away device placement and distributed communication logic. By initializing an Accelerator object and calling accelerator.prepare() on models, optimizers, and data loaders, users can seamlessly transition their existing PyTorch training scripts to run on various distributed setups and mixed precision formats (FP16, BF16, FP8) with minimal code changes. This approach preserves the user's control over the training loop while handling the complexities of distributed execution.
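
In practice the change is typically a few lines around an otherwise ordinary PyTorch loop. A minimal sketch, using a toy linear model and synthetic data in place of the user's real objects:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

from accelerate import Accelerator

# Toy stand-ins for the user's own model, optimizer, and dataloader.
accelerator = Accelerator()  # e.g. Accelerator(mixed_precision="fp16") for AMP

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
dataset = TensorDataset(torch.randn(64, 10), torch.randn(64, 1))
dataloader = DataLoader(dataset, batch_size=8)

# prepare() wraps each object for the current device and distributed setup,
# so the loop below needs no explicit .to(device) calls.
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

model.train()
for inputs, targets in dataloader:
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(inputs), targets)
    accelerator.backward(loss)  # replaces loss.backward()
    optimizer.step()
```

The same script runs unchanged on a laptop CPU, a multi-GPU node, or a TPU pod slice; only the launch configuration differs.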

Quick Start & Requirements

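A typical flow, assuming the standard PyPI package and the documented CLI (train.py is a placeholder for the user's own script):

  • Install: pip install accelerate
  • Answer the interactive configuration questionnaire: accelerate config
  • Run a script on the configured setup: accelerate launch train.py
  • Requires Python and PyTorch; check the repository for minimum supported versions.
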
Highlighted Details

  • Supports single/multi-CPU, single/multi-GPU, and TPU configurations.
  • Integrates automatic mixed precision (FP16, BF16), with FP8 support via Transformer Engine or MS-AMP.
  • Experimental support for DeepSpeed, PyTorch Fully Sharded Data Parallel (FSDP), and Megatron-LM.
  • Provides an optional CLI (accelerate config, accelerate launch) for environment setup and script launching.
  • Offers notebook_launcher for distributed training from within notebooks (e.g., Colab, Kaggle); see the sketch below.
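
A minimal sketch of the notebook path; the training function body and the num_processes value are placeholders:

```python
from accelerate import notebook_launcher

def training_function():
    # Placeholder entry point: in practice this would build the model,
    # optimizer, and dataloaders and run the training loop shown above.
    print("hello from one process")

# Spawns the function across devices from inside a notebook cell;
# num_processes=2 is an illustrative value (e.g. a 2-GPU Kaggle kernel).
notebook_launcher(training_function, num_processes=2)
```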

Maintenance & Community

  • Developed by Hugging Face with contributions from numerous individuals.
  • Widely integrated into other popular libraries like transformers, fastai, and stable-diffusion-webui.
  • Community support channels are available via Hugging Face's platforms.

Licensing & Compatibility

  • Apache License 2.0.
  • Permissive license allows for commercial use and integration into closed-source projects.

Limitations & Caveats

  • DeepSpeed, FSDP, and Megatron-LM integrations are marked as experimental.
  • Requires users to write their own training loops; not a high-level framework.

Health Check

  • Last Commit: 1 day ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 36
  • Issues (30d): 35

Star History

106 stars in the last 30 days

Explore Similar Projects

Starred by Peter Norvig (author of "Artificial Intelligence: A Modern Approach"; Research Director at Google), Alexey Milovidov (co-founder of ClickHouse), and 29 more.

llm.c by karpathy

LLM training in pure C/CUDA, no PyTorch needed

28k stars · Top 0.2% on SourcePulse
Created 1 year ago · Updated 2 months ago