DeepSpeed by deepspeedai

Deep learning optimization library for distributed training and inference

created 5 years ago
39,547 stars

Top 0.7% on sourcepulse

View on GitHub
Project Summary

DeepSpeed is a comprehensive deep learning optimization library designed to simplify and enhance distributed training and inference for large models. It targets researchers and practitioners needing to scale model size and performance beyond single-GPU capabilities, offering significant speedups and cost reductions.

How It Works

DeepSpeed is built on four core pillars: Training, Inference, Compression, and Science. It employs advanced parallelism techniques (ZeRO, 3D-Parallelism, MoE) for efficient training, custom kernels and heterogeneous memory for low-latency inference, and quantization methods (ZeroQuant, XTC) for model size reduction. This modular approach allows for flexible composition of features to tackle extreme-scale deep learning challenges.
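
As a rough illustration of how these pillars compose, ZeRO, mixed precision, and offloading are enabled declaratively through a DeepSpeed config. The keys below are standard config options; the specific values are illustrative assumptions, not recommendations:

    # Illustrative DeepSpeed config sketch: ZeRO stage 2 with optimizer-state
    # offload to CPU and fp16 training (values are examples only).
    ds_config = {
        "train_batch_size": 32,
        "fp16": {"enabled": True},
        "zero_optimization": {
            "stage": 2,
            "offload_optimizer": {"device": "cpu"},
        },
        "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
    }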

Quick Start & Requirements

  • Install via pip: pip install deepspeed (a minimal usage sketch follows this list)
  • Requires PyTorch (>= 1.9 recommended).
  • C++/CUDA/HIP compiler (nvcc or hipcc) for JIT compilation of extensions.
  • Tested on NVIDIA Pascal, Volta, Ampere, Hopper; AMD MI100, MI200.
  • Windows support available with specific build steps.
  • Validate installation with ds_report.
  • Official documentation: deepspeed.ai
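
The sketch below shows the typical pattern once DeepSpeed is installed, following the documented deepspeed.initialize workflow; MyModel and my_dataset are hypothetical placeholders, so treat this as an outline rather than a definitive implementation:

    import deepspeed

    model = MyModel()  # hypothetical torch.nn.Module whose forward() returns a loss
    ds_config = {
        "train_batch_size": 16,
        "fp16": {"enabled": True},
        "zero_optimization": {"stage": 1},
        "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
    }

    # initialize() wraps the model in an engine that owns the optimizer,
    # distributed setup, and (optionally) the training dataloader.
    model_engine, optimizer, loader, _ = deepspeed.initialize(
        model=model,
        model_parameters=model.parameters(),
        training_data=my_dataset,  # hypothetical torch.utils.data.Dataset
        config=ds_config,
    )

    for batch in loader:
        loss = model_engine(batch)    # forward pass (assumed to return the loss)
        model_engine.backward(loss)   # engine-managed backward (handles loss scaling)
        model_engine.step()           # optimizer step plus ZeRO bookkeeping

Scripts like this are normally launched with the deepspeed launcher (for example, deepspeed train.py), which spawns one process per GPU and sets up the distributed environment.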

Highlighted Details

  • Enables training of models with trillions of parameters.
  • Achieves significant speedups and cost reductions for training and inference.
  • Supports advanced parallelism techniques like ZeRO, 3D-Parallelism, and Mixture-of-Experts.
  • Offers state-of-the-art compression techniques for efficient deployment.
  • Integrates with popular frameworks like Hugging Face Transformers, Accelerate, and PyTorch Lightning.
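
As a brief example of the Hugging Face integration, the Transformers Trainer accepts a DeepSpeed config file through TrainingArguments; the checkpoint name, output path, and train_dataset below are placeholder assumptions:

    from transformers import AutoModelForCausalLM, Trainer, TrainingArguments

    model = AutoModelForCausalLM.from_pretrained("gpt2")  # placeholder checkpoint
    args = TrainingArguments(
        output_dir="out",
        per_device_train_batch_size=4,
        deepspeed="ds_config.json",  # path to a DeepSpeed config (ZeRO stage, fp16, ...)
    )
    trainer = Trainer(model=model, args=args, train_dataset=train_dataset)  # dataset assumed to exist
    trainer.train()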

Maintenance & Community

  • Actively developed by Microsoft; the repository is hosted under the deepspeedai GitHub organization.
  • Numerous publications and contributions from the research community.
  • Active community support channels are not explicitly linked in the README.
  • Roadmap and contributing guidelines are available.

Licensing & Compatibility

  • MIT License.
  • Compatible with commercial and closed-source applications.

Limitations & Caveats

  • Some features, such as async I/O and GDS (GPUDirect Storage), are not supported on Windows.
  • JIT compilation of C++/CUDA extensions may require specific build tools and environments.

Health Check

  • Last commit: 21 hours ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 32
  • Issues (30d): 24

Star History

  • 1,556 stars in the last 90 days

Explore Similar Projects

Starred by Tri Dao (Chief Scientist at Together AI), Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake), and 1 more.

oslo by tunib-ai

0%
309 stars
Framework for large-scale transformer optimization
created 3 years ago
updated 2 years ago
Starred by Jeff Hammerbacher (Cofounder of Cloudera) and Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake).

InternEvo by InternLM

1.0%
402 stars
Lightweight training framework for model pre-training
created 1 year ago
updated 1 week ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Jeff Hammerbacher (Cofounder of Cloudera), and 10 more.

open-r1 by huggingface

0.2%
25k stars
SDK for reproducing DeepSeek-R1
created 6 months ago
updated 3 days ago
Starred by George Hotz (Author of tinygrad; Founder of the tiny corp, comma.ai), Anton Bukov (Cofounder of 1inch Network), and 16 more.

tinygrad by tinygrad

0.1%
30k stars
Minimalist deep learning framework for education and exploration
created 4 years ago
updated 15 hours ago