DeepSpeed by deepspeedai

Deep learning optimization library for distributed training and inference

created 5 years ago
39,547 stars

Top 0.7% on sourcepulse

View on GitHub
Project Summary

DeepSpeed is a comprehensive deep learning optimization library designed to simplify and enhance distributed training and inference for large models. It targets researchers and practitioners needing to scale model size and performance beyond single-GPU capabilities, offering significant speedups and cost reductions.

How It Works

DeepSpeed is built on four core pillars: Training, Inference, Compression, and Science. It employs advanced parallelism techniques (ZeRO, 3D-Parallelism, MoE) for efficient training, custom kernels and heterogeneous memory for low-latency inference, and quantization methods (ZeroQuant, XTC) for model size reduction. This modular approach allows for flexible composition of features to tackle extreme-scale deep learning challenges.
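
As a rough illustration of how these pillars compose, ZeRO, mixed precision, and offloading are enabled declaratively through a DeepSpeed config. The keys below are standard config options; the specific values are illustrative assumptions, not recommendations:

    # Illustrative DeepSpeed config sketch: ZeRO stage 2 with optimizer-state
    # offload to CPU and fp16 training (values are examples only).
    ds_config = {
        "train_batch_size": 32,
        "fp16": {"enabled": True},
        "zero_optimization": {
            "stage": 2,
            "offload_optimizer": {"device": "cpu"},
        },
        "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
    }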

Quick Start & Requirements

  • Install via pip: pip install deepspeed (a minimal usage sketch follows this list)
  • Requires PyTorch (>= 1.9 recommended).
  • C++/CUDA/HIP compiler (nvcc or hipcc) for JIT compilation of extensions.
  • Tested on NVIDIA Pascal, Volta, Ampere, Hopper; AMD MI100, MI200.
  • Windows support available with specific build steps.
  • Validate installation with ds_report.
  • Official documentation: deepspeed.ai
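
The sketch below shows the typical pattern once DeepSpeed is installed, following the documented deepspeed.initialize workflow; MyModel and my_dataset are hypothetical placeholders, so treat this as an outline rather than a definitive implementation:

    import deepspeed

    model = MyModel()  # hypothetical torch.nn.Module whose forward() returns a loss
    ds_config = {
        "train_batch_size": 16,
        "fp16": {"enabled": True},
        "zero_optimization": {"stage": 1},
        "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
    }

    # initialize() wraps the model in an engine that owns the optimizer,
    # distributed setup, and (optionally) the training dataloader.
    model_engine, optimizer, loader, _ = deepspeed.initialize(
        model=model,
        model_parameters=model.parameters(),
        training_data=my_dataset,  # hypothetical torch.utils.data.Dataset
        config=ds_config,
    )

    for batch in loader:
        loss = model_engine(batch)    # forward pass (assumed to return the loss)
        model_engine.backward(loss)   # engine-managed backward (handles loss scaling)
        model_engine.step()           # optimizer step plus ZeRO bookkeeping

Scripts like this are normally launched with the deepspeed launcher (for example, deepspeed train.py), which spawns one process per GPU and sets up the distributed environment.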

Highlighted Details

  • Enables training of models with trillions of parameters.
  • Achieves significant speedups and cost reductions for training and inference.
  • Supports advanced parallelism techniques like ZeRO, 3D-Parallelism, and Mixture-of-Experts.
  • Offers state-of-the-art compression techniques for efficient deployment.
  • Integrates with popular frameworks like Hugging Face Transformers, Accelerate, and PyTorch Lightning.
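
As a brief example of the Hugging Face integration, the Transformers Trainer accepts a DeepSpeed config file through TrainingArguments; the checkpoint name, output path, and train_dataset below are placeholder assumptions:

    from transformers import AutoModelForCausalLM, Trainer, TrainingArguments

    model = AutoModelForCausalLM.from_pretrained("gpt2")  # placeholder checkpoint
    args = TrainingArguments(
        output_dir="out",
        per_device_train_batch_size=4,
        deepspeed="ds_config.json",  # path to a DeepSpeed config (ZeRO stage, fp16, ...)
    )
    trainer = Trainer(model=model, args=args, train_dataset=train_dataset)  # dataset assumed to exist
    trainer.train()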

Maintenance & Community

  • Actively developed by Microsoft; the repository is hosted under the deepspeedai GitHub organization.
  • Numerous publications and contributions from the research community.
  • Active community support channels are not explicitly linked in the README.
  • Roadmap and contributing guidelines are available.

Licensing & Compatibility

  • MIT License.
  • Compatible with commercial and closed-source applications.

Limitations & Caveats

  • Some features, such as async I/O and GDS (GPUDirect Storage), are not supported on Windows.
  • JIT compilation of C++/CUDA extensions may require specific build tools and environments.

Health Check

  • Last commit: 21 hours ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 32
  • Issues (30d): 24

Star History

  • 1,556 stars in the last 90 days

Explore Similar Projects

Starred by Tri Dao (Chief Scientist at Together AI), Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake), and 1 more.

oslo by tunib-ai

0%
309 stars
Framework for large-scale transformer optimization
created 3 years ago
updated 2 years ago
Starred by Jeff Hammerbacher (Cofounder of Cloudera) and Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake).

InternEvo by InternLM

1.0%
402 stars
Lightweight training framework for model pre-training
created 1 year ago
updated 1 week ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Jeff Hammerbacher (Cofounder of Cloudera), and 10 more.

open-r1 by huggingface

0.2%
25k stars
SDK for reproducing DeepSeek-R1
created 6 months ago
updated 3 days ago
Starred by George Hotz (Author of tinygrad; Founder of the tiny corp, comma.ai), Anton Bukov (Cofounder of 1inch Network), and 16 more.

tinygrad by tinygrad

0.1%
30k stars
Minimalist deep learning framework for education and exploration
created 4 years ago
updated 15 hours ago