ColossalAI-Examples  by hpcaitech

Examples for training models with hybrid parallelism using ColossalAI

created 3 years ago
340 stars

Top 82.2% on sourcepulse

Project Summary

This repository provides examples for training models with ColossalAI, a framework for large-scale AI model training. It targets researchers and engineers working with large models, offering demonstrations of various parallelism techniques and complex model architectures.

How It Works

The examples showcase ColossalAI's hybrid parallelism strategies (Tensor Parallelism, Pipeline Parallelism, Sequence Parallelism, and ZeRO) alongside features such as mixed-precision training and gradient accumulation. Distributing computation and memory across multiple devices and nodes makes it practical to train models that are too large for a single device.
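To make the idea of tensor parallelism concrete, here is a minimal single-process sketch in plain Python. It is not ColossalAI code: the helper names (`matmul`, `split_columns`, `column_parallel_matmul`) are hypothetical, and the "devices" are simulated by looping over column shards of a weight matrix. It only illustrates that splitting a weight matrix by columns, multiplying each shard independently, and concatenating the outputs reproduces the unsharded result.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain dense matrix multiply: (m x k) @ (k x n) -> (m x n)."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def split_columns(w: Matrix, parts: int) -> List[Matrix]:
    """Split a weight matrix into `parts` contiguous, equally wide column shards."""
    n = len(w[0])
    if n % parts != 0:
        raise ValueError("column count must be divisible by the number of shards")
    width = n // parts
    return [[row[i * width:(i + 1) * width] for row in w] for i in range(parts)]


def column_parallel_matmul(x: Matrix, w: Matrix, parts: int) -> Matrix:
    """Simulated column-wise tensor parallelism.

    Each simulated device holds one column shard of `w` and computes its slice
    of the output; the slices are then concatenated along the column axis.
    """
    partial_outputs = [matmul(x, shard) for shard in split_columns(w, parts)]
    return [sum((p[r] for p in partial_outputs), []) for r in range(len(x))]


x = [[1.0, 2.0], [3.0, 4.0]]
w = [[1.0, 0.0, 2.0, 1.0], [0.0, 1.0, 1.0, 3.0]]

full = matmul(x, w)                               # unsharded reference
sharded = column_parallel_matmul(x, w, parts=2)   # two simulated devices
print(full == sharded)
```

In a real multi-device setup each shard would live on a different accelerator and the concatenation would be a collective communication step; the arithmetic shown here is the same.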

Quick Start & Requirements

  • Install ColossalAI and dependencies: pip install -r requirements.txt
  • Requires the ColossalAI framework and Titans.
  • Links: Colossal-AI, Documentation, Forum
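After installing the requirements, a quick check along the following lines can confirm that the two required packages are importable. This is a sketch under assumptions: the import names `colossalai` and `titans` are assumed to match the installed distributions, and `missing_packages` is a hypothetical helper, not part of either project.

```python
import importlib.util
from typing import Iterable, List


def missing_packages(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed import names for the ColossalAI framework and Titans.
required = ["colossalai", "titans"]
missing = missing_packages(required)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```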

Highlighted Details

  • Demonstrates parallelism techniques: Tensor Parallel, Pipeline Parallel, Sequence Parallel, and ZeRO.
  • Includes examples of complex models in computer vision (ResNet, SimCLR, Vision Transformer) and natural language processing (BERT, GPT-2, GPT-3).
  • Covers training features such as mixed-precision training, gradient accumulation, and gradient clipping.
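Two of the training features listed above, gradient accumulation and gradient clipping, can be illustrated without any framework. The sketch below is plain Python and does not use ColossalAI's API; the toy model `y = w * x`, the function names, and the data are all hypothetical. It shows that averaging gradients over equally sized micro-batches matches the full-batch gradient, and that clipping by global norm rescales a gradient vector whose norm exceeds a threshold.

```python
import math
from typing import List, Sequence, Tuple

Sample = Tuple[float, float]  # (x, y) pair


def mse_grad(w: float, batch: Sequence[Sample]) -> float:
    """Gradient of the mean squared error of the toy model y = w * x over one batch."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_grad(w: float, data: Sequence[Sample], micro_batch_size: int) -> float:
    """Average per-micro-batch gradients instead of processing the full batch at once."""
    if len(data) % micro_batch_size != 0:
        raise ValueError("data length must be divisible by micro_batch_size")
    total = 0.0
    steps = 0
    for start in range(0, len(data), micro_batch_size):
        total += mse_grad(w, data[start:start + micro_batch_size])
        steps += 1
    return total / steps


def clip_by_global_norm(grads: Sequence[float], max_norm: float) -> List[float]:
    """Scale the gradient vector down so its L2 norm does not exceed max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm == 0.0 or norm <= max_norm:
        return list(grads)
    scale = max_norm / norm
    return [g * scale for g in grads]


data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
full_batch = mse_grad(0.0, data)
accumulated = accumulated_grad(0.0, data, micro_batch_size=2)
print(math.isclose(full_batch, accumulated))

clipped = clip_by_global_norm([3.0, 4.0], max_norm=1.0)
print(clipped)
```

Accumulation trades extra forward and backward passes for lower peak memory, which is why it is commonly paired with the parallelism techniques listed above when a full batch does not fit on one device.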

Maintenance & Community

  • This repository is deprecated and superseded by ColossalAI/example.
  • Discussion forum available for community exchange. Issues can be raised in the repository.

Licensing & Compatibility

  • The README does not state a license.
  • Suitability for commercial use or closed-source linking is not specified.

Limitations & Caveats

This repository is deprecated, so it may no longer be maintained or supported. Users are directed to the newer repository for current examples.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: 1 week
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 3 stars in the last 90 days
