DeepSpeed by deepspeedai

Deep learning optimization library for distributed training and inference

Created 5 years ago
40,119 stars

Top 0.7% on SourcePulse

Project Summary

DeepSpeed is a comprehensive deep learning optimization library designed to simplify and enhance distributed training and inference for large models. It targets researchers and practitioners needing to scale model size and performance beyond single-GPU capabilities, offering significant speedups and cost reductions.

How It Works

DeepSpeed is built on four core pillars: Training, Inference, Compression, and Science. It employs advanced parallelism techniques (ZeRO, 3D-Parallelism, MoE) for efficient training, custom kernels and heterogeneous memory for low-latency inference, and quantization methods (ZeroQuant, XTC) for model size reduction. This modular approach allows for flexible composition of features to tackle extreme-scale deep learning challenges.
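The memory savings behind ZeRO can be sketched with the accounting from the ZeRO paper: with mixed-precision Adam, each parameter costs 2 bytes (fp16 weights) + 2 bytes (fp16 gradients) + 12 bytes (fp32 master weights, momentum, variance), and each ZeRO stage partitions one more of those states across GPUs. The helper below is our own illustration of that formula, not DeepSpeed API code:

```python
# Sketch of the per-GPU model-state memory model behind ZeRO.
# Hypothetical helper based on the ZeRO paper's mixed-precision Adam
# accounting; the function name and interface are ours, not DeepSpeed's.
def zero_memory_gb(num_params: float, num_gpus: int, stage: int) -> float:
    """Per-GPU model-state memory in GB for mixed-precision Adam.

    Bytes per parameter: 2 (fp16 weights) + 2 (fp16 gradients)
    + 12 optimizer-state bytes (fp32 weights, momentum, variance).
    """
    params_b, grads_b, opt_b = 2.0, 2.0, 12.0
    if stage >= 1:            # ZeRO-1: partition optimizer states
        opt_b /= num_gpus
    if stage >= 2:            # ZeRO-2: also partition gradients
        grads_b /= num_gpus
    if stage >= 3:            # ZeRO-3: also partition parameters
        params_b /= num_gpus
    return (params_b + grads_b + opt_b) * num_params / 1e9

# Example from the ZeRO paper: a 7.5B-parameter model on 64 GPUs.
print(round(zero_memory_gb(7.5e9, 64, 0), 1))  # 120.0 GB per GPU, no ZeRO
print(round(zero_memory_gb(7.5e9, 64, 3), 1))  # 1.9 GB per GPU with ZeRO-3
```

The same arithmetic explains why ZeRO-3 alone brings trillion-parameter training within reach of large clusters.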

Quick Start & Requirements

  • Install via pip: pip install deepspeed
  • Requires PyTorch (>= 1.9 recommended).
  • C++/CUDA/HIP compiler (nvcc or hipcc) for JIT compilation of extensions.
  • Tested on NVIDIA Pascal, Volta, Ampere, Hopper; AMD MI100, MI200.
  • Windows support available with specific build steps.
  • Validate installation with ds_report.
  • Official documentation: deepspeed.ai
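Most DeepSpeed features are enabled through a JSON configuration file passed to the training script (e.g. via the `config` argument of `deepspeed.initialize`). A minimal illustrative config enabling fp16 training and ZeRO stage 2 might look like this; the values are placeholders, not recommendations:

```json
{
  "train_batch_size": 32,
  "fp16": { "enabled": true },
  "zero_optimization": { "stage": 2 },
  "optimizer": {
    "type": "Adam",
    "params": { "lr": 0.0001 }
  }
}
```

See the configuration reference at deepspeed.ai for the full set of supported keys.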

Highlighted Details

  • Enables training of models with trillions of parameters.
  • Achieves significant speedups and cost reductions for training and inference.
  • Supports advanced parallelism techniques like ZeRO, 3D-Parallelism, and Mixture-of-Experts.
  • Offers state-of-the-art compression techniques for efficient deployment.
  • Integrates with popular frameworks like Hugging Face Transformers, Accelerate, and PyTorch Lightning.

Maintenance & Community

  • Actively developed by Microsoft.
  • Numerous publications and contributions from the research community.
  • Active community support channels are not explicitly linked in the README.
  • Roadmap and contributing guidelines are available.

Licensing & Compatibility

  • MIT License.
  • Compatible with commercial and closed-source applications.

Limitations & Caveats

  • Some features like async IO and GDS are not supported on Windows.
  • JIT compilation of C++/CUDA extensions may require specific build tools and environments.

Health Check

  • Last Commit: 1 day ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 45
  • Issues (30d): 29
  • Star History: 386 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Casper Hansen (Author of AutoAWQ), and 3 more.

deepsparse by neuralmagic

0% · 3k stars
CPU inference runtime for sparse deep learning models
Created 4 years ago · Updated 3 months ago
Starred by Junyang Lin (Core Maintainer at Alibaba Qwen), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 3 more.

neural-compressor by intel

0.2% · 2k stars
Python library for model compression (quantization, pruning, distillation, NAS)
Created 5 years ago · Updated 16 hours ago