nano-deepspeed: a ZeRO teaching implementation for understanding distributed training
Top 94.7% on SourcePulse
Summary
nano-deepspeed is a teaching-focused re-implementation of DeepSpeed ZeRO, designed for understanding data flow and communication mechanisms. It targets engineers and researchers interested in learning ZeRO principles rather than production deployment. The project offers readable code and explainable behavior for small-scale comparisons with official DeepSpeed.
How It Works
This project provides a simplified, educational re-implementation of DeepSpeed's ZeRO optimizer. It focuses on core ZeRO stages 0, 1, and 2, utilizing AdamW and FP16 dynamic loss scaling. The stage 2 implementation highlights communication patterns through packed all-reduce and local scatter-back operations, prioritizing code clarity over production-level optimizations.
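The stage 2 communication pattern described above can be sketched in plain Python, simulating ranks with lists instead of GPU tensors and real collectives. The function name and data layout here are hypothetical illustrations, not the project's actual API:

```python
# Illustrative simulation of ZeRO-2's packed all-reduce + local scatter-back.
# Plain Python lists stand in for GPU tensors and NCCL collectives; all names
# are hypothetical -- the real implementation operates on torch tensors.

def packed_all_reduce_scatter_back(per_rank_grads):
    """per_rank_grads: one entry per rank, each a list of gradient 'tensors'
    (flat lists of floats). Returns, per rank, only the averaged gradient
    shard that rank owns -- the essence of ZeRO stage 2 partitioning."""
    world_size = len(per_rank_grads)

    # 1. Pack: each rank flattens its gradients into one contiguous buffer,
    #    so a single all-reduce replaces many small per-tensor ones.
    packed = [[g for tensor in grads for g in tensor] for grads in per_rank_grads]

    # 2. All-reduce (average) the packed buffers across ranks.
    n = len(packed[0])
    reduced = [sum(buf[i] for buf in packed) / world_size for i in range(n)]

    # 3. Scatter-back: each rank keeps only its own contiguous shard of the
    #    averaged buffer; optimizer state for that shard lives on that rank.
    shard = n // world_size
    return [reduced[r * shard:(r + 1) * shard] for r in range(world_size)]

grads_rank0 = [[1.0, 2.0], [3.0, 4.0]]
grads_rank1 = [[3.0, 2.0], [1.0, 0.0]]
shards = packed_all_reduce_scatter_back([grads_rank0, grads_rank1])
# rank 0 owns the first half of the averaged gradients, rank 1 the second half
```

In real ZeRO-2 this is what lets each rank hold optimizer state for only 1/world_size of the parameters, which is where the memory savings come from.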
Quick Start & Requirements
Requires Python 3.9+, PyTorch (a CUDA build is recommended), and the transformers library for the examples; official DeepSpeed is needed only for comparative runs. Installation is pip install torch transformers, plus pip install deepspeed for comparisons. The repository provides quick-start commands for single- and multi-GPU setups, demonstrating usage with torchrun and example configuration files.
Highlighted Details
nano-deepspeed uses more memory than official DeepSpeed under tested configurations (e.g., 2-GPU, 8-GPU), while achieving comparable final losses, suggesting numerical stability.
Maintenance & Community
The roadmap outlines future improvements, including enhanced tooling for the ZeRO-2 path and minimal teaching implementations of ZeRO-3 and offload. No specific community channels (e.g., Discord, Slack) or notable contributors are listed.
Licensing & Compatibility
The license type is not explicitly stated in the provided README. Compatibility for commercial use or linking with closed-source projects is not detailed.
Limitations & Caveats
This project is strictly for learning and research, not production workloads. It lacks full feature parity with official DeepSpeed, notably omitting ZeRO-3, optimizer/parameter offload, and advanced ecosystem features like MoE or pipeline/tensor parallelism. Engineering robustness for fault tolerance and extreme-scale stability is also simpler compared to the official implementation.