composer by mosaicml

DL framework for training at scale, optimized for large-scale clusters

created 3 years ago
5,390 stars

Top 9.5% on sourcepulse

Project Summary

Composer is an open-source PyTorch library designed to simplify and accelerate deep learning model training at scale. It targets researchers and engineers working with large models like LLMs, diffusion models, and transformers, abstracting complexities of distributed training, data loading, and memory optimization to enable faster experimentation and iteration.

How It Works

Composer centers on a highly optimized Trainer abstraction that streamlines PyTorch training loops. For efficient multi-node training it integrates PyTorch Fully Sharded Data Parallel (FSDP) and standard Distributed Data Parallel (DDP). A flexible callback system lets users inject custom logic at various training stages, while built-in speedup algorithms, inspired by recent research, can be composed into "recipes" that boost training throughput.
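To make the callback idea concrete, here is a minimal, library-free sketch of an event-driven training loop. It is not Composer's API: `ToyTrainer`, `ToyCallback`, `ToyState`, and the event names `fit_start`, `batch_end`, and `epoch_end` are all hypothetical names chosen for illustration. It only shows the general pattern of a loop firing named events that callbacks may choose to handle.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyState:
    """Hypothetical training state shared between the loop and its callbacks."""
    epoch: int = 0
    batch: int = 0
    log: List[str] = field(default_factory=list)


class ToyCallback:
    """Base class: subclasses define only the event hooks they care about."""

    def run_event(self, event: str, state: ToyState) -> None:
        # Look up a method named after the event; ignore events with no hook.
        handler = getattr(self, event, None)
        if handler is not None:
            handler(state)


class EpochLogger(ToyCallback):
    """Example callback that records a message at the end of each epoch."""

    def epoch_end(self, state: ToyState) -> None:
        state.log.append(f"epoch {state.epoch} done after {state.batch} batches")


class ToyTrainer:
    """Illustrative loop that fires events; no real model or optimizer is involved."""

    def __init__(self, callbacks: List[ToyCallback], batches_per_epoch: int, max_epochs: int):
        self.callbacks = callbacks
        self.batches_per_epoch = batches_per_epoch
        self.max_epochs = max_epochs
        self.state = ToyState()

    def _fire(self, event: str) -> None:
        for callback in self.callbacks:
            callback.run_event(event, self.state)

    def fit(self) -> ToyState:
        self._fire("fit_start")
        for epoch in range(self.max_epochs):
            for _ in range(self.batches_per_epoch):
                # A real loop would run forward, backward, and optimizer steps here.
                self.state.batch += 1
                self._fire("batch_end")
            self.state.epoch = epoch + 1
            self._fire("epoch_end")
        return self.state
```

In this sketch, custom logic is added by passing callback instances to the trainer, for example `ToyTrainer([EpochLogger()], batches_per_epoch=3, max_epochs=2).fit()`. The loop itself never needs to change when a new callback is introduced.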

Quick Start & Requirements

  • Installation: pip install mosaicml
  • Prerequisites: Python, PyTorch, CUDA-compatible GPUs (recommended).
  • Resources: Docker images are available for simplified environment setup.
  • Links: Website, Getting Started, Docs

Highlighted Details

  • Scalability: Supports training from 1 to 512 GPUs and datasets from 50MB to 10TB.
  • Elastic Checkpointing: Enables resuming training on different hardware configurations.
  • Data Streaming: Integrates with MosaicML StreamingDataset for on-the-fly data loading from cloud storage.
  • Workflow Automation: Features like auto-resumption and CUDA OOM prevention simplify training management.
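
One common way to avoid out-of-memory failures is to split each batch into microbatches and shrink the microbatch size when memory runs out. The sketch below illustrates that general technique only. Whether Composer implements its OOM prevention this way is an assumption, and `SimulatedOOM`, `train_batch_with_fallback`, and `step_fn` are hypothetical names, not library calls.

```python
from typing import Callable, Sequence


class SimulatedOOM(RuntimeError):
    """Stand-in for a CUDA out-of-memory error in this sketch."""


def train_batch_with_fallback(
    batch: Sequence,
    microbatch_size: int,
    step_fn: Callable[[Sequence], None],
) -> int:
    """Run step_fn over the batch in microbatches, halving the size on OOM.

    Returns the microbatch size that processed the whole batch successfully.
    """
    size = max(1, min(microbatch_size, len(batch)))
    while size >= 1:
        try:
            # A real implementation would also clear partially accumulated
            # gradients before each retry; that is omitted here.
            for start in range(0, len(batch), size):
                step_fn(batch[start:start + size])
            return size
        except SimulatedOOM:
            size //= 2  # Retry the whole batch with smaller microbatches.
    raise SimulatedOOM("batch does not fit even with a microbatch size of 1")
```

Halving keeps the number of retries logarithmic in the starting size. The trade-off is that more, smaller microbatches usually mean lower throughput per step.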

Maintenance & Community

  • Actively developed by MosaicML, with contributions from the broader ML community.
  • Community support available via Slack.
  • Resources include tutorials for BERT, LLMs, and migrating from PyTorch Lightning.

Licensing & Compatibility

  • Apache 2.0 License.
  • Permissive license allows for commercial use and integration with closed-source projects.

Limitations & Caveats

  • The library is not recommended for Graph Neural Networks (GNNs), Generative Adversarial Networks (GANs), or reinforcement learning (RL) due to design assumptions that may be suboptimal for these domains.

Health Check

  • Last commit: 5 days ago
  • Responsiveness: 1 day
  • Pull requests (30d): 23
  • Issues (30d): 4
  • Star history: 56 stars in the last 90 days
