ColossalAI by hpcaitech

AI system for large-scale parallel training

created 3 years ago
41,057 stars

Top 0.7% on sourcepulse

Project Summary

Colossal-AI is a unified deep learning system designed to make training and inference of large AI models more efficient, cost-effective, and accessible. It targets researchers and engineers working with massive models, offering a suite of parallelization strategies and memory optimization techniques to simplify distributed training and inference.

How It Works

Colossal-AI provides a comprehensive set of parallelization strategies, including Data Parallelism, Pipeline Parallelism, 1D/2D/2.5D/3D Tensor Parallelism, Sequence Parallelism, and Zero Redundancy Optimizer (ZeRO). It also features heterogeneous memory management (PatrickStar) and an auto-parallelism system. This multi-faceted approach allows users to scale their models across multiple GPUs and nodes with minimal code changes, abstracting away the complexities of distributed computing.
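
For context, the sketch below shows what this looks like in user code, based on the Booster API described in the Colossal-AI documentation; plugin names and the launch call are version-dependent, and the toy model and data here are placeholders rather than a real workload.

```python
# Sketch of Colossal-AI's Booster workflow (illustrative; exact signatures
# such as launch_from_torch() differ between Colossal-AI releases).
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import TorchDDPPlugin  # swap for GeminiPlugin / HybridParallelPlugin

# Launched via torchrun, e.g.: torchrun --nproc_per_node=2 train.py
colossalai.launch_from_torch()  # older releases expected a `config` dict argument

model = nn.Linear(1024, 1024).cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()
loader = DataLoader(TensorDataset(torch.randn(256, 1024), torch.randn(256, 1024)),
                    batch_size=32)

# One call wires the model, optimizer and dataloader into the chosen
# parallelization / memory-management strategy.
booster = Booster(plugin=TorchDDPPlugin())
model, optimizer, criterion, loader, _ = booster.boost(model, optimizer, criterion, loader)

for x, y in loader:
    x, y = x.cuda(), y.cuda()
    loss = criterion(model(x), y)
    booster.backward(loss, optimizer)  # replaces loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

Per the documentation, switching strategies (e.g., ZeRO and heterogeneous memory via GeminiPlugin, or tensor/pipeline parallelism via HybridParallelPlugin) mainly means constructing a different plugin; the training loop itself stays the same.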

Quick Start & Requirements

  • Install: pip install colossalai (Linux only). For PyTorch extensions: BUILD_EXT=1 pip install colossalai. Nightly builds: pip install colossalai-nightly. A quick sanity-check sketch follows this list.
  • Requirements: PyTorch >= 2.2, Python >= 3.7, CUDA >= 11.0, NVIDIA GPU Compute Capability >= 7.0. Linux OS.
  • From Source: git clone the repository, cd ColossalAI, pip install . (with BUILD_EXT=1 for CUDA kernels).
  • Docker: docker build -t colossalai ./docker and run with docker run -ti --gpus all --rm --ipc=host colossalai bash.
  • Documentation: https://colossalai.readthedocs.io/en/latest/
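
To confirm the install meets the requirements above, a minimal check (illustrative; assumes a CUDA-capable Linux environment):

```python
# Quick post-install sanity check (illustrative).
import torch
import colossalai

print("colossalai version:", colossalai.__version__)
print("CUDA available:", torch.cuda.is_available())  # CUDA >= 11.0 is required
if torch.cuda.is_available():
    # Compute capability must be >= (7, 0) per the requirements above.
    print("compute capability:", torch.cuda.get_device_capability(0))
```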

Highlighted Details

  • Offers solutions for training and inference of large models like LLaMA, GPT, and Stable Diffusion.
  • Achieves significant speedups and memory reductions, e.g., 5.6x less memory for Stable Diffusion training, 2x faster inference with Colossal-Inference.
  • Features Open-Sora for Sora-like video generation and ColossalChat for RLHF pipeline implementation.
  • Supports advanced parallelism techniques including 3D Tensor Parallelism and ZeRO; a configuration sketch follows this list.
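
As a rough illustration of how such a combination might be configured: the parameter names below follow the HybridParallelPlugin documented in the Colossal-AI docs, but exact options and defaults vary by release, and this is a sketch rather than a tuned setup.

```python
# Illustrative: combining tensor parallelism, pipeline parallelism and ZeRO
# via HybridParallelPlugin. Run under torchrun with at least
# tp_size * pp_size processes, e.g.: torchrun --nproc_per_node=4 demo.py
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import HybridParallelPlugin

colossalai.launch_from_torch()  # older releases expected a `config` dict argument

plugin = HybridParallelPlugin(
    tp_size=2,            # tensor-parallel group size
    pp_size=2,            # pipeline-parallel stages
    num_microbatches=4,   # required when pp_size > 1
    zero_stage=1,         # shard optimizer states (ZeRO-1) across data-parallel ranks
    precision="bf16",     # mixed-precision training
)
booster = Booster(plugin=plugin)
# model, optimizer, criterion and dataloader are then wrapped with
# booster.boost(...) exactly as in the sketch under "How It Works".
```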

Maintenance & Community

  • Active development with regular updates and releases.
  • Community channels include Forum and Slack.
  • Welcomes contributions from developers and partners.
  • Cite Us: BibTeX citation provided.

Licensing & Compatibility

  • The project appears to be under a permissive license, but specific details are not explicitly stated in the README. Compatibility for commercial use should be verified.

Limitations & Caveats

  • Installation is currently Linux-only.
  • Building CUDA extensions requires specific setup steps.
  • Users with older CUDA versions (e.g., 10.2) may need to manually download and integrate the cub library.
Health Check

  • Last commit: 3 days ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 12
  • Issues (30d): 6
  • Star History: 385 stars in the last 90 days
