InternEvo by InternLM

Lightweight training framework for model pre-training

created 1 year ago
402 stars

Top 73.2% on sourcepulse

View on GitHub
Project Summary

InternEvo is a lightweight training framework for efficient large language model pre-training and fine-tuning, scaling from a single GPU to massive clusters. It aims to simplify the training process without extensive dependencies while delivering high throughput on large-scale hardware.

How It Works

InternEvo employs a modular design that integrates various parallelization strategies, including Data Parallelism, Tensor Parallelism (MTP), Pipeline Parallelism, Sequence Parallelism (FSP, ISP), and ZeRO optimization. This multi-faceted approach allows for efficient scaling across thousands of GPUs and optimized memory usage, contributing to its reported high acceleration efficiency. The framework also supports streaming datasets and integrates with libraries like Flash-Attention for further performance gains.
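
As a rough illustration of how these strategies combine, here is a minimal sketch of what the parallelism block of an InternEvo-style Python training config might look like. The field names below (zero1, tensor, pipeline, sequence_parallel) are illustrative assumptions, not the framework's documented schema; consult the repository's example configs for the authoritative keys.

    # Illustrative sketch only: field names are assumptions, not InternEvo's documented schema.
    # An InternEvo training config is an ordinary Python file; a parallelism section
    # combining the strategies above could look roughly like this.
    parallel = dict(
        zero1=dict(size=8),               # ZeRO stage-1: shard optimizer state across 8 ranks
        tensor=dict(size=2, mode="mtp"),  # Megatron-style tensor parallelism (MTP)
        pipeline=dict(size=2),            # pipeline parallelism across 2 stages
        sequence_parallel=True,           # split long sequences across ranks (FSP/ISP)
    )
    # Data parallelism then covers the remaining ranks, so roughly:
    # world_size = data_parallel_size * tensor_size * pipeline_size

Under these assumptions, the same config scales from a single GPU (all sizes set to 1) to thousand-GPU runs by changing the parallel sizes and the launcher's world size.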

Quick Start & Requirements

  • Installation: pip install InternEvo
  • Prerequisites: PyTorch (e.g., torch==2.1.0+cu118), torchvision, torchaudio, torch-scatter. Optional: flash-attn==2.2.1 for acceleration.
  • Data: Hugging Face datasets (streaming supported), custom tokenizers; a streaming sketch follows this list.
  • Training: Supports Slurm and torchrun distributed execution.
  • Documentation: Usage, Installation
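
To make the streaming data path mentioned above concrete, the sketch below uses the generic Hugging Face datasets and transformers APIs rather than any InternEvo-specific loader; the dataset path and tokenizer name are placeholders.

    # Generic Hugging Face streaming pipeline (not InternEvo's own data API).
    # The dataset path and tokenizer name below are placeholders.
    from datasets import load_dataset
    from transformers import AutoTokenizer

    # streaming=True iterates over the corpus lazily instead of downloading it all up front.
    dataset = load_dataset("your-org/your-pretraining-corpus", split="train", streaming=True)
    tokenizer = AutoTokenizer.from_pretrained("internlm/internlm2-7b", trust_remote_code=True)

    def tokenize(example):
        return tokenizer(example["text"], truncation=True, max_length=4096)

    tokenized = dataset.map(tokenize)  # applied lazily as samples stream in
    for i, sample in enumerate(tokenized):
        print(len(sample["input_ids"]))
        if i >= 1:  # peek at the first two samples only
            break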

Highlighted Details

  • Supports training on NPU-910B clusters.
  • Achieves nearly 90% acceleration efficiency on 1024 GPUs.
  • Integrates with various LLM architectures (InternLM, Llama2, Qwen2, etc.).
  • Offers multiple parallelism techniques: Tensor, Pipeline, Sequence, ZeRO.

Maintenance & Community

  • Actively developed by Shanghai AI Laboratory together with researchers from universities and companies.
  • Encourages community contributions.
  • Links to documentation and issue reporting are provided.

Licensing & Compatibility

  • Licensed under Apache 2.0.
  • Compatible with commercial use and closed-source linking.

Limitations & Caveats

The framework requires specific PyTorch and CUDA versions for optimal performance, and Flash-Attention installation depends on environment support. Configuring diverse hardware setups may require consulting the extensive documentation.
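
Because Flash-Attention support depends on the local CUDA toolkit and GPU generation, a quick environment sanity check before attempting its installation could look like the snippet below; it uses only standard PyTorch calls, and the Ampere-or-newer rule is a general flash-attn constraint rather than anything InternEvo-specific.

    import torch

    # Confirm the PyTorch build and the CUDA version it was compiled against.
    print("torch:", torch.__version__, "| CUDA:", torch.version.cuda)

    # flash-attn generally needs an Ampere-or-newer GPU (compute capability >= 8.0).
    if torch.cuda.is_available():
        major, minor = torch.cuda.get_device_capability()
        supported = "likely supported" if major >= 8 else "likely unsupported"
        print(f"compute capability {major}.{minor}: flash-attn {supported}")
    else:
        print("No CUDA device visible; check drivers before installing flash-attn.")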

Health Check

  • Last commit: 1 week ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 20 stars in the last 90 days

Explore Similar Projects

Starred by Jeff Hammerbacher (Cofounder of Cloudera), Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake), and 6 more.

gpt-neox by EleutherAI (0.1%, 7k stars)
Framework for training large-scale autoregressive language models
created 4 years ago, updated 1 week ago

Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Eugene Yan (AI Scientist at AWS), and 10 more.

accelerate by huggingface (0.2%, 9k stars)
PyTorch training helper for distributed execution
created 4 years ago, updated 2 days ago

Starred by George Hotz (Author of tinygrad; Founder of the tiny corp, comma.ai), Anton Bukov (Cofounder of 1inch Network), and 16 more.

tinygrad by tinygrad (0.1%, 30k stars)
Minimalist deep learning framework for education and exploration
created 4 years ago, updated 19 hours ago

Starred by Aravind Srinivas (Cofounder of Perplexity), Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake), and 12 more.

DeepSpeed by deepspeedai (0.2%, 40k stars)
Deep learning optimization library for distributed training and inference
created 5 years ago, updated 1 day ago