InternEvo by InternLM

Lightweight training framework for model pre-training

created 1 year ago
402 stars

Top 73.2% on sourcepulse

View on GitHub
Project Summary

InternEvo is a lightweight training framework for efficient large language model pre-training and fine-tuning, scaling from a single GPU to massive clusters. It aims to simplify the training process without extensive dependencies while delivering high throughput on large-scale hardware.

How It Works

InternEvo employs a modular design that integrates various parallelization strategies, including Data Parallelism, Tensor Parallelism (MTP), Pipeline Parallelism, Sequence Parallelism (FSP, ISP), and ZeRO optimization. This multi-faceted approach allows for efficient scaling across thousands of GPUs and optimized memory usage, contributing to its reported high acceleration efficiency. The framework also supports streaming datasets and integrates with libraries like Flash-Attention for further performance gains.
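
As a rough illustration of how these strategies combine, here is a minimal sketch of what the parallelism block of an InternEvo-style Python training config might look like. The field names below (zero1, tensor, pipeline, sequence_parallel) are illustrative assumptions, not the framework's documented schema; consult the repository's example configs for the authoritative keys.

    # Illustrative sketch only: field names are assumptions, not InternEvo's documented schema.
    # An InternEvo training config is an ordinary Python file; a parallelism section
    # combining the strategies above could look roughly like this.
    parallel = dict(
        zero1=dict(size=8),               # ZeRO stage-1: shard optimizer state across 8 ranks
        tensor=dict(size=2, mode="mtp"),  # Megatron-style tensor parallelism (MTP)
        pipeline=dict(size=2),            # pipeline parallelism across 2 stages
        sequence_parallel=True,           # split long sequences across ranks (FSP/ISP)
    )
    # Data parallelism then covers the remaining ranks, so roughly:
    # world_size = data_parallel_size * tensor_size * pipeline_size

Under these assumptions, the same config scales from a single GPU (all sizes set to 1) to thousand-GPU runs by changing the parallel sizes and the launcher's world size.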

Quick Start & Requirements

  • Installation: pip install InternEvo
  • Prerequisites: PyTorch (e.g., torch==2.1.0+cu118), torchvision, torchaudio, torch-scatter. Optional: flash-attn==2.2.1 for acceleration.
  • Data: Hugging Face datasets (streaming supported), custom tokenizers; a streaming sketch follows this list.
  • Training: Supports Slurm and torchrun distributed execution.
  • Documentation: Usage, Installation
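
To make the streaming data path mentioned above concrete, the sketch below uses the generic Hugging Face datasets and transformers APIs rather than any InternEvo-specific loader; the dataset path and tokenizer name are placeholders.

    # Generic Hugging Face streaming pipeline (not InternEvo's own data API).
    # The dataset path and tokenizer name below are placeholders.
    from datasets import load_dataset
    from transformers import AutoTokenizer

    # streaming=True iterates over the corpus lazily instead of downloading it all up front.
    dataset = load_dataset("your-org/your-pretraining-corpus", split="train", streaming=True)
    tokenizer = AutoTokenizer.from_pretrained("internlm/internlm2-7b", trust_remote_code=True)

    def tokenize(example):
        return tokenizer(example["text"], truncation=True, max_length=4096)

    tokenized = dataset.map(tokenize)  # applied lazily as samples stream in
    for i, sample in enumerate(tokenized):
        print(len(sample["input_ids"]))
        if i >= 1:  # peek at the first two samples only
            break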

Highlighted Details

  • Supports training on NPU-910B clusters.
  • Achieves nearly 90% acceleration efficiency on 1024 GPUs.
  • Integrates with various LLM architectures (InternLM, Llama2, Qwen2, etc.).
  • Offers multiple parallelism techniques: Tensor, Pipeline, Sequence, ZeRO.

Maintenance & Community

  • Actively developed by Shanghai AI Laboratory together with researchers from universities and companies.
  • Encourages community contributions.
  • Links to documentation and issue reporting are provided.

Licensing & Compatibility

  • Licensed under Apache 2.0.
  • Compatible with commercial use and closed-source linking.

Limitations & Caveats

The framework requires specific PyTorch and CUDA versions for optimal performance, and Flash-Attention installation depends on environment support. Configuring diverse hardware setups may require consulting the extensive documentation.
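
Because Flash-Attention support depends on the local CUDA toolkit and GPU generation, a quick environment sanity check before attempting its installation could look like the snippet below; it uses only standard PyTorch calls, and the Ampere-or-newer rule is a general flash-attn constraint rather than anything InternEvo-specific.

    import torch

    # Confirm the PyTorch build and the CUDA version it was compiled against.
    print("torch:", torch.__version__, "| CUDA:", torch.version.cuda)

    # flash-attn generally needs an Ampere-or-newer GPU (compute capability >= 8.0).
    if torch.cuda.is_available():
        major, minor = torch.cuda.get_device_capability()
        supported = "likely supported" if major >= 8 else "likely unsupported"
        print(f"compute capability {major}.{minor}: flash-attn {supported}")
    else:
        print("No CUDA device visible; check drivers before installing flash-attn.")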

Health Check

  • Last commit: 1 week ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 20 stars in the last 90 days

Explore Similar Projects

Starred by Jeff Hammerbacher (Cofounder of Cloudera), Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake), and 6 more.

gpt-neox by EleutherAI (0.1%, 7k stars)
Framework for training large-scale autoregressive language models
created 4 years ago, updated 1 week ago

Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Eugene Yan (AI Scientist at AWS), and 10 more.

accelerate by huggingface (0.2%, 9k stars)
PyTorch training helper for distributed execution
created 4 years ago, updated 2 days ago

Starred by George Hotz (Author of tinygrad; Founder of the tiny corp, comma.ai), Anton Bukov (Cofounder of 1inch Network), and 16 more.

tinygrad by tinygrad (0.1%, 30k stars)
Minimalist deep learning framework for education and exploration
created 4 years ago, updated 19 hours ago

Starred by Aravind Srinivas (Cofounder of Perplexity), Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake), and 12 more.

DeepSpeed by deepspeedai (0.2%, 40k stars)
Deep learning optimization library for distributed training and inference
created 5 years ago, updated 1 day ago