accelerate by huggingface

PyTorch training helper for distributed execution

created 4 years ago
8,984 stars

Top 5.8% on sourcepulse

View on GitHub
Project Summary

Hugging Face Accelerate simplifies distributed PyTorch training and inference across diverse hardware configurations, including multi-CPU, multi-GPU, and TPUs. It targets PyTorch users who want to leverage distributed computing and mixed precision without extensive boilerplate code modifications, enabling faster and more scalable model development.

How It Works

Accelerate acts as a thin wrapper around PyTorch's distributed capabilities, abstracting away device placement and distributed communication logic. By initializing an Accelerator object and calling accelerator.prepare() on models, optimizers, and data loaders, users can seamlessly transition their existing PyTorch training scripts to run on various distributed setups and mixed precision formats (FP16, BF16, FP8) with minimal code changes. This approach preserves the user's control over the training loop while handling the complexities of distributed execution.
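
A minimal sketch of that pattern (toy model and random data used as stand-ins; not copied from the project's documentation):

    import torch
    from torch.utils.data import DataLoader, TensorDataset
    from accelerate import Accelerator

    # Stand-in model and data for illustration.
    model = torch.nn.Linear(16, 2)
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
    dataset = TensorDataset(torch.randn(64, 16), torch.randint(0, 2, (64,)))
    dataloader = DataLoader(dataset, batch_size=8)

    # Accelerator() picks up the device/precision setup chosen at launch time;
    # e.g. Accelerator(mixed_precision="bf16") enables mixed precision explicitly.
    accelerator = Accelerator()

    # prepare() moves objects to the right device(s) and wraps them for distributed execution.
    model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

    model.train()
    for inputs, targets in dataloader:
        optimizer.zero_grad()
        loss = torch.nn.functional.cross_entropy(model(inputs), targets)
        accelerator.backward(loss)  # used instead of loss.backward() so scaling/precision is handled
        optimizer.step()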

Quick Start & Requirements
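
A typical quick start (train.py is a placeholder for your own script):

  • Install from PyPI: pip install accelerate (an existing PyTorch installation is required).
  • Optionally describe your hardware once: accelerate config
  • Launch a training script on that setup: accelerate launch train.py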

Highlighted Details

  • Supports single/multi-CPU, single/multi-GPU, and TPU configurations.
  • Integrates automatic mixed precision (FP16, BF16), with FP8 support via Transformer Engine or MS-AMP.
  • Experimental support for DeepSpeed, PyTorch Fully Sharded Data Parallel (FSDP), and Megatron-LM.
  • Provides an optional CLI: accelerate config for environment setup and accelerate launch for running scripts.
  • Offers notebook_launcher for distributed training within notebooks (e.g., Colab, Kaggle); a usage sketch follows this list.
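
As a rough illustration of the notebook path (the training function body is only a placeholder, and two processes are assumed to be available):

    from accelerate import Accelerator, notebook_launcher

    def training_function():
        # A real function would contain the usual Accelerator-based training loop.
        accelerator = Accelerator()
        accelerator.print(f"process {accelerator.process_index} of {accelerator.num_processes}")

    # Spawns the worker processes directly from a notebook cell instead of using `accelerate launch`.
    notebook_launcher(training_function, args=(), num_processes=2)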

Maintenance & Community

  • Developed by Hugging Face with contributions from numerous individuals.
  • Widely integrated into other popular libraries like transformers, fastai, and stable-diffusion-webui.
  • Community support channels are available via Hugging Face's platforms.

Licensing & Compatibility

  • Apache License 2.0.
  • Permissive license allows for commercial use and integration into closed-source projects.

Limitations & Caveats

  • DeepSpeed, FSDP, and Megatron-LM integrations are marked as experimental.
  • Requires users to write their own training loops; not a high-level framework.

Health Check

  • Last commit: 2 days ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 31
  • Issues (30d): 29

Star History

351 stars in the last 90 days

Explore Similar Projects

Starred by Jeff Hammerbacher (Cofounder of Cloudera) and Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake).

InternEvo by InternLM

1.0% · 402 stars
Lightweight training framework for model pre-training
created 1 year ago · updated 1 week ago
Starred by Lewis Tunstall (Researcher at Hugging Face), Patrick von Platen (Core Contributor to Hugging Face Transformers and Diffusers), and 5 more.

torchtune by pytorch

0.2% · 5k stars
PyTorch library for LLM post-training and experimentation
created 1 year ago · updated 1 day ago
Starred by George Hotz (Author of tinygrad; Founder of the tiny corp, comma.ai), Anton Bukov (Cofounder of 1inch Network), and 16 more.

tinygrad by tinygrad

0.1% · 30k stars
Minimalist deep learning framework for education and exploration
created 4 years ago · updated 17 hours ago
Starred by Aravind Srinivas (Cofounder of Perplexity), Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake), and 12 more.

DeepSpeed by deepspeedai

0.2% · 40k stars
Deep learning optimization library for distributed training and inference
created 5 years ago · updated 23 hours ago