ColossalAI by hpcaitech

AI system for large-scale parallel training

Created 3 years ago
41,161 stars

Top 0.7% on SourcePulse

GitHub: https://github.com/hpcaitech/ColossalAI
Project Summary

Colossal-AI is a unified deep learning system designed to make training and inference of large AI models more efficient, cost-effective, and accessible. It targets researchers and engineers working with massive models, offering a suite of parallelization strategies and memory optimization techniques to simplify distributed training and inference.

How It Works

Colossal-AI provides a comprehensive set of parallelization strategies, including Data Parallelism, Pipeline Parallelism, 1D/2D/2.5D/3D Tensor Parallelism, Sequence Parallelism, and Zero Redundancy Optimizer (ZeRO). It also features heterogeneous memory management (PatrickStar) and an auto-parallelism system. This multi-faceted approach allows users to scale their models across multiple GPUs and nodes with minimal code changes, abstracting away the complexities of distributed computing.
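
As a rough sketch of how these strategies are selected in practice, the example below uses the Booster API with a HybridParallelPlugin. This is a hedged illustration, not a verified recipe: class and argument names follow recent releases and should be checked against the documentation, the GPT-2 model from the transformers package is only a stand-in, and tensor or pipeline degrees above 1 generally require architectures with Shardformer support.

    # Hedged sketch (assumes a recent Colossal-AI release and the `transformers`
    # package): combine tensor parallelism and ZeRO through one plugin config.
    import torch
    import colossalai
    from colossalai.booster import Booster
    from colossalai.booster.plugin import HybridParallelPlugin
    from transformers import GPT2Config, GPT2LMHeadModel

    colossalai.launch_from_torch()  # initialize distributed state under torchrun

    # tp_size sets the tensor-parallel degree, pp_size the pipeline depth, and
    # zero_stage shards optimizer states over the remaining data-parallel ranks.
    plugin = HybridParallelPlugin(tp_size=2, pp_size=1, zero_stage=1)
    booster = Booster(plugin=plugin)

    model = GPT2LMHeadModel(GPT2Config())  # small placeholder config, illustration only
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    # boost() returns wrapped objects that handle sharding and communication;
    # the usual forward/backward/step loop is then written against them.
    model, optimizer, *_ = booster.boost(model, optimizer)

Such a script is started with a multi-process launcher, e.g. torchrun or the bundled colossalai run CLI, with one process per GPU.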

Quick Start & Requirements

  • Install: pip install colossalai (Linux only). To also build PyTorch extensions during install: BUILD_EXT=1 pip install colossalai. Nightly builds: pip install colossalai-nightly. A quick post-install check is sketched after this list.
  • Requirements: PyTorch >= 2.2, Python >= 3.7, CUDA >= 11.0, NVIDIA GPU Compute Capability >= 7.0. Linux OS.
  • From Source: git clone the repository, cd ColossalAI, pip install . (with BUILD_EXT=1 for CUDA kernels).
  • Docker: docker build -t colossalai ./docker and run with docker run -ti --gpus all --rm --ipc=host colossalai bash.
  • Documentation: https://colossalai.readthedocs.io/en/latest/
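
After installing, a quick sanity check along these lines confirms that the package and a CUDA-enabled PyTorch are importable (a minimal sketch; the version attribute is assumed from recent releases):

    # Post-install sanity check; runs with plain `python`, no GPU needed to import.
    import torch
    import colossalai

    print("colossalai:", colossalai.__version__)
    print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())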

Highlighted Details

  • Offers solutions for training and inference of large models like LLaMA, GPT, and Stable Diffusion.
  • Reports significant speedups and memory reductions, e.g., 5.6x lower memory consumption for Stable Diffusion training and 2x faster inference with Colossal-Inference.
  • Features Open-Sora for Sora-like video generation and ColossalChat for RLHF pipeline implementation.
  • Supports advanced parallelism techniques including 3D Tensor Parallelism and ZeRO.
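
As a hedged illustration of the ZeRO and memory-optimization features listed above, the sketch below uses the GeminiPlugin, which shards model and optimizer states and can offload them between GPU and CPU memory. Argument names follow recent releases and should be verified against the documentation; the toy model is a placeholder, and HybridAdam is the fused optimizer commonly paired with Gemini in the project's examples.

    # Sketch only: ZeRO-style sharding plus GPU/CPU memory offload via Gemini.
    import torch
    import colossalai
    from colossalai.booster import Booster
    from colossalai.booster.plugin import GeminiPlugin
    from colossalai.nn.optimizer import HybridAdam

    colossalai.launch_from_torch()

    # placement_policy="auto" lets Gemini move parameters and optimizer states
    # between GPU and CPU memory as utilization changes.
    plugin = GeminiPlugin(placement_policy="auto", precision="bf16")
    booster = Booster(plugin=plugin)

    model = torch.nn.Sequential(torch.nn.Linear(1024, 4096), torch.nn.GELU(),
                                torch.nn.Linear(4096, 1024))  # toy placeholder model
    optimizer = HybridAdam(model.parameters(), lr=1e-4)
    model, optimizer, *_ = booster.boost(model, optimizer)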

Maintenance & Community

  • Active development with regular updates and releases.
  • Community channels include Forum and Slack.
  • Welcomes contributions from developers and partners.
  • Cite Us: BibTeX citation provided.

Licensing & Compatibility

  • The project appears to use a permissive license, but the README does not state it explicitly; check the repository's LICENSE file and verify compatibility for commercial use.

Limitations & Caveats

  • Installation is currently Linux-only.
  • Building CUDA extensions requires specific setup steps.
  • Users with older CUDA versions (e.g., 10.2) may need to manually download and integrate the cub library.
Health Check

  • Last Commit: 13 hours ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 3
  • Issues (30d): 3
  • Star History: 100 stars in the last 30 days

Explore Similar Projects

Starred by Omar Sanseviero (DevRel at Google DeepMind) and Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake).

llm_training_handbook by huggingface

0% · 511 stars
Handbook for large language model training methodologies
Created 2 years ago · Updated 1 year ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Jiayi Pan (Author of SWE-Gym; MTS at xAI), and 20 more.

alpa by alpa-projects

0.0% · 3k stars
Auto-parallelization framework for large-scale neural network training and serving
Created 4 years ago · Updated 1 year ago
Starred by Jeff Hammerbacher (Cofounder of Cloudera), Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake), and 25 more.

gpt-neox by EleutherAI

0.2% · 7k stars
Framework for training large-scale autoregressive language models
Created 4 years ago · Updated 2 days ago