YaFSDP by yandex

Sharded data parallelism framework for transformer-like neural networks

Created 1 year ago
975 stars

Top 37.9% on SourcePulse

View on GitHub
Project Summary

YaFSDP is a Sharded Data Parallelism framework designed for efficient training of transformer-like neural network architectures, particularly Large Language Models (LLMs). It targets researchers and engineers working with large-scale models who need to optimize training speed and memory usage, offering up to 20% faster pre-training and improved performance under high memory pressure compared to PyTorch's FSDP.

How It Works

YaFSDP is built to reduce communication and memory operation overhead. While specific internal mechanisms are not detailed in the README, its performance gains suggest optimizations in parameter sharding, gradient communication, and memory management strategies tailored for transformer architectures. This approach aims to maximize GPU utilization and minimize synchronization bottlenecks during distributed training.
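
This summary does not document YaFSDP's own API, so as a point of reference, below is a minimal sketch of the PyTorch FSDP baseline it is benchmarked against: wrapping a stand-in transformer block so that parameters, gradients, and optimizer state are sharded across ranks. The module, class, and variable names are illustrative placeholders, not part of YaFSDP.

```python
# Minimal sharded data-parallel sketch using PyTorch's built-in FSDP,
# the baseline YaFSDP is compared against. "Block" is a stand-in for a
# transformer layer; a real model would use attention + MLP sublayers.
import functools

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp.wrap import transformer_auto_wrap_policy


class Block(nn.Module):
    """Placeholder transformer block (norm + feed-forward with a residual)."""
    def __init__(self, dim: int):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.ff = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x):
        return x + self.ff(self.norm(x))


def main():
    # One process per GPU, typically launched via torchrun.
    dist.init_process_group("nccl")
    torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

    model = nn.Sequential(*[Block(1024) for _ in range(8)]).cuda()

    # Shard at the block level: each rank stores only a slice of every
    # Block's parameters, gradients, and optimizer state.
    wrap_policy = functools.partial(transformer_auto_wrap_policy,
                                    transformer_layer_cls={Block})
    model = FSDP(model, auto_wrap_policy=wrap_policy)

    optim = torch.optim.AdamW(model.parameters(), lr=1e-4)

    x = torch.randn(4, 128, 1024, device="cuda")
    loss = model(x).square().mean()   # dummy loss, for illustration only
    loss.backward()                    # gradients are reduce-scattered across ranks
    optim.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

Launched with torchrun (one process per GPU), this runs one sharded replica per device; YaFSDP targets the same training pattern while reducing the communication and memory-operation overhead described above.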

Quick Start & Requirements

  • Installation: Requires building a Docker image using docker/build.sh.
  • Prerequisites: NVIDIA PyTorch Docker image, patched 🤗 (Hugging Face) libraries (patches provided in the patches/ folder).
  • Resources: Benchmarks were conducted on clusters with A100 80 GB GPUs.
  • Examples: Training examples for causal language modeling pre-training (clm.md) and supervised fine-tuning (sft.md) are available.

Highlighted Details

  • Up to 20% faster pre-training for LLMs compared to PyTorch FSDP.
  • Demonstrated performance improvements across models from 7B to 70B parameters and 64 to 256 devices.
  • Achieves significant speedups (up to 26.60%) on larger models like Llama 3 70B.
  • Optimized for high memory pressure conditions.

Maintenance & Community

Developed and maintained by Yandex. Users can open GitHub issues for bugs or questions.

Licensing & Compatibility

The README does not explicitly state the license.

Limitations & Caveats

The project requires building a custom Docker image with patched libraries, indicating potential integration complexity and a dependency on specific library versions.

Health Check

  • Last Commit: 3 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 4 stars in the last 30 days

Explore Similar Projects

Starred by Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), Pawel Garbacki (Cofounder of Fireworks AI), and 11 more.

Liger-Kernel by linkedin

Top 0.6% on SourcePulse · 6k stars
Triton kernels for efficient LLM training
Created 1 year ago · Updated 1 day ago
Starred by Tobi Lutke (Cofounder of Shopify), Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), and 36 more.

unsloth by unslothai

Top 0.6% on SourcePulse · 46k stars
Finetuning tool for LLMs, targeting speed and memory efficiency
Created 1 year ago · Updated 14 hours ago