horovod by horovod

Distributed training framework for TensorFlow, Keras, PyTorch, and MXNet

created 8 years ago
14,558 stars

Top 3.5% on sourcepulse

View on GitHub
Project Summary

Horovod is a distributed deep learning training framework designed to simplify and accelerate the scaling of training workloads across multiple GPUs and nodes. It targets researchers and engineers working with TensorFlow, Keras, PyTorch, and Apache MXNet, enabling them to leverage distributed computing with minimal code changes and achieve significant performance gains.

How It Works

Horovod utilizes a ring-based AllReduce algorithm, inspired by Message Passing Interface (MPI) concepts, to efficiently synchronize gradients across workers. This approach minimizes communication overhead by interleaving gradient computation with communication and supports tensor fusion to batch small AllReduce operations, further boosting performance. It requires minimal code modifications to existing single-GPU training scripts.
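
As a concrete illustration, here is a minimal sketch of the usual integration pattern for PyTorch. The model, data, and hyperparameters are placeholders; the hvd.* calls follow Horovod's documented API.

    import torch
    import horovod.torch as hvd

    hvd.init()                                # one Horovod worker per process
    torch.cuda.set_device(hvd.local_rank())   # pin each process to one GPU

    model = torch.nn.Linear(10, 1).cuda()     # placeholder model
    # Scale the learning rate by the worker count, per Horovod's guidance.
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01 * hvd.size())

    # Wrap the optimizer so gradients are averaged via ring-AllReduce.
    optimizer = hvd.DistributedOptimizer(
        optimizer, named_parameters=model.named_parameters())

    # Broadcast initial state from rank 0 so all workers start identically.
    hvd.broadcast_parameters(model.state_dict(), root_rank=0)
    hvd.broadcast_optimizer_state(optimizer, root_rank=0)

    for _ in range(100):                      # placeholder training loop
        optimizer.zero_grad()
        x = torch.randn(32, 10).cuda()
        loss = model(x).pow(2).mean()
        loss.backward()                       # AllReduce overlaps with backprop
        optimizer.step()

Launched with, for example, horovodrun -np 4 python train.py, this runs four workers on the local host.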

Quick Start & Requirements

  • Install via pip: pip install horovod
  • For GPU support with NCCL: HOROVOD_GPU_OPERATIONS=NCCL pip install horovod
  • Prerequisites: CMake, a C++17-compliant compiler (g++ 8 or newer for TensorFlow 2.10+), and potentially MPI/NCCL depending on the installation; a quick post-install sanity check is sketched after this list.
  • Official Documentation: https://horovod.readthedocs.io/en/latest/
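
To confirm an install works, a tiny rank-printing script (hypothetical filename check.py, assuming the PyTorch backend was built) can be launched with horovodrun -np 2 python check.py; each process should print its own rank.

    # check.py -- minimal install sanity check (PyTorch backend assumed)
    import horovod.torch as hvd

    hvd.init()
    print(f"rank {hvd.rank()} of {hvd.size()} "
          f"(local rank {hvd.local_rank()})")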

Highlighted Details

  • Achieves 90% scaling efficiency for Inception V3 and ResNet-101 on large clusters.
  • Supports TensorFlow, Keras, PyTorch, and MXNet; a Keras-style sketch follows this list.
  • Features like Tensor Fusion, Horovod Timeline, and automated performance tuning optimize distributed training.
  • Can run with or without MPI, using Gloo as an alternative backend.
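
To show how the same pattern carries across frameworks, here is an illustrative sketch using the horovod.tensorflow.keras binding; the model, data, and hyperparameters are placeholders.

    import tensorflow as tf
    import horovod.tensorflow.keras as hvd

    hvd.init()

    model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(10,))])
    # Wrap the Keras optimizer so gradient averaging happens via AllReduce.
    opt = hvd.DistributedOptimizer(tf.keras.optimizers.SGD(0.01 * hvd.size()))
    model.compile(loss="mse", optimizer=opt)

    callbacks = [
        # Sync initial weights from rank 0 so all workers start identically.
        hvd.callbacks.BroadcastGlobalVariablesCallback(0),
        # Average metrics across workers at the end of each epoch.
        hvd.callbacks.MetricAverageCallback(),
    ]

    x = tf.random.normal((256, 10))           # placeholder data
    y = tf.random.normal((256, 1))
    # Print progress only on rank 0 to avoid duplicated logs.
    model.fit(x, y, batch_size=32, epochs=2,
              callbacks=callbacks, verbose=1 if hvd.rank() == 0 else 0)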

Maintenance & Community

  • Hosted by the LF AI & Data Foundation.
  • Active community with Slack channels for discussion and announcements.

Licensing & Compatibility

  • Apache 2.0 License.
  • Permissive license suitable for commercial use and integration with closed-source applications.

Limitations & Caveats

  • Initial setup may require installing and configuring MPI and NCCL, which adds infrastructure complexity.
  • Although Horovod is designed for ease of use, a working knowledge of distributed training concepts and of Horovod's API helps in getting optimal performance.

Health Check

  • Last commit: 1 day ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 5
  • Issues (30d): 0

Star History

  • 117 stars in the last 90 days

Explore Similar Projects

Starred by Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake) and Zhiqiang Xie (Author of SGLang).

veScale by volcengine

Top 0.1% · 839 stars
PyTorch-native framework for LLM training
created 1 year ago · updated 3 weeks ago
Starred by Jeff Hammerbacher (Cofounder of Cloudera) and Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake).

InternEvo by InternLM

Top 1.0% · 402 stars
Lightweight training framework for model pre-training
created 1 year ago · updated 1 week ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Eugene Yan (AI Scientist at AWS), and 10 more.

accelerate by huggingface

Top 0.2% · 9k stars
PyTorch training helper for distributed execution
created 4 years ago · updated 1 day ago
Starred by George Hotz (Author of tinygrad; Founder of the tiny corp, comma.ai), Anton Bukov (Cofounder of 1inch Network), and 16 more.

tinygrad by tinygrad

Top 0.1% · 30k stars
Minimalist deep learning framework for education and exploration
created 4 years ago · updated 14 hours ago
Starred by Peter Norvig (Author of Artificial Intelligence: A Modern Approach; Research Director at Google), Aravind Srinivas (Cofounder of Perplexity), and 45 more.

tensorflow by tensorflow

Top 0.1% · 191k stars
Open-source ML framework
created 9 years ago · updated 9 hours ago