InternEvo by InternLM

Lightweight training framework for model pre-training

Created 1 year ago
407 stars

Top 71.6% on SourcePulse

View on GitHub
Project Summary

InternEvo is a lightweight training framework for efficient large language model pre-training and fine-tuning, supporting setups that range from a single GPU to massive clusters. It aims to simplify the training process without requiring extensive dependencies, enabling high performance and accelerated training on large-scale hardware.

How It Works

InternEvo employs a modular design that integrates multiple parallelization strategies: Data Parallelism, Tensor Parallelism (MTP), Pipeline Parallelism, Sequence Parallelism (FSP, ISP), and ZeRO optimization. Combining these strategies enables efficient scaling across thousands of GPUs while reducing memory usage, which underpins the framework's reported acceleration efficiency. The framework also supports streaming datasets and integrates with libraries such as Flash-Attention for further performance gains.
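
To make the combination concrete, below is a minimal sketch of how such a parallelism layout might be expressed in a Python-style configuration file. The key names (zero1, tensor, pipeline, sequence_parallel) and the sizes are illustrative assumptions modeled on common InternEvo-style configs, not the framework's verified schema; consult the official documentation for the authoritative format.

    # Illustrative parallelism section of a Python-style training config.
    # Key names and sizes are assumptions, not InternEvo's verified schema.
    parallel = dict(
        zero1=dict(size=8),                               # ZeRO-1: shard optimizer state across 8 ranks
        tensor=dict(size=2, mode="mtp"),                  # Megatron-style tensor parallelism (MTP)
        pipeline=dict(size=2, interleaved_overlap=True),  # pipeline parallelism with interleaved overlap
        sequence_parallel=True,                           # enable sequence parallelism (FSP/ISP variants)
    )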

Quick Start & Requirements

  • Installation: pip install InternEvo
  • Prerequisites: PyTorch (e.g., torch==2.1.0+cu118), torchvision, torchaudio, torch-scatter. Optional: flash-attn==2.2.1 for acceleration.
  • Data: Hugging Face datasets (streaming supported), custom tokenizers; see the data-loading sketch after this list.
  • Training: Supports distributed execution via Slurm and torchrun.
  • Documentation: Usage, Installation
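
As a data-loading illustration of the streaming support above, here is a minimal sketch using the Hugging Face datasets and transformers libraries. The dataset ("wikitext") and tokenizer ("internlm/internlm2-7b") are placeholder choices, and wiring the resulting stream into InternEvo's training loop is framework-specific and not shown here.

    # Minimal sketch: stream a Hugging Face dataset and tokenize it lazily.
    # Dataset and tokenizer names are placeholders, not InternEvo defaults.
    from datasets import load_dataset
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("internlm/internlm2-7b", trust_remote_code=True)
    stream = load_dataset("wikitext", "wikitext-103-raw-v1", split="train", streaming=True)

    def tokenize(example):
        return tokenizer(example["text"], truncation=True, max_length=2048)

    tokenized_stream = stream.map(tokenize)  # lazy: records are tokenized as they are read
    first = next(iter(tokenized_stream))     # pull one record to sanity-check the pipeline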

Highlighted Details

  • Supports training on NPU-910B clusters.
  • Achieves nearly 90% acceleration efficiency on 1024 GPUs.
  • Integrates with various LLM architectures (InternLM, Llama2, Qwen2, etc.).
  • Offers multiple parallelism techniques: Tensor, Pipeline, Sequence, ZeRO.

Maintenance & Community

  • Actively developed by Shanghai AI Laboratory together with researchers from universities and industry.
  • Encourages community contributions.
  • Links to documentation and issue reporting are provided.

Licensing & Compatibility

  • Licensed under Apache 2.0.
  • Compatible with commercial use and closed-source linking.

Limitations & Caveats

The framework requires specific PyTorch and CUDA versions for optimal performance, and Flash-Attention installation depends on environment support. Configuring it for diverse hardware setups may require consulting the extensive documentation.

Health Check

  • Last Commit: 4 weeks ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 2
  • Issues (30d): 0
  • Star History: 5 stars in the last 30 days

Explore Similar Projects

Starred by Wing Lian (Founder of Axolotl AI) and Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake).

fms-fsdp by foundation-model-stack

0.4% · 265 stars
Efficiently train foundation models with PyTorch
Created 1 year ago · Updated 1 month ago
Starred by Lukas Biewald (Cofounder of Weights & Biases), Patrick von Platen (Author of Hugging Face Diffusers; Research Engineer at Mistral), and 2 more.

DialoGPT by microsoft

0.1% · 2k stars
Response generation model via large-scale pretraining
Created 6 years ago · Updated 2 years ago
Starred by Tobi Lutke (Cofounder of Shopify), Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), and 26 more.

axolotl by axolotl-ai-cloud

0.5% · 10k stars
CLI tool for streamlined post-training of AI models
Created 2 years ago · Updated 13 hours ago