FlagScale by FlagOpen

Large model toolkit for end-to-end management and scaling

Created 2 years ago
353 stars

Top 79.0% on SourcePulse

View on GitHub
Project Summary

FlagScale is a comprehensive toolkit designed to streamline the entire lifecycle of large language models, from development to deployment. It targets researchers and engineers working with large models, offering a unified platform to maximize computational efficiency and enhance model performance across diverse hardware architectures.

How It Works

FlagScale integrates and extends popular open-source projects such as Megatron-LM and vLLM through a flexible, multi-backend mechanism. It supports heterogeneous parallelism, enabling training and inference across different chip architectures (e.g., NVIDIA, Iluvatar) within a single instance. This approach aims to simplify complex distributed setups and unlock performance gains by exploiting each vendor's specialized hardware.
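
As a rough sketch of the multi-backend idea, the commands below show how a single Hydra-driven entry point could dispatch to either the training or the serving backend simply by pointing at a different config; --config-path and --config-name are standard Hydra flags, while the directories and file names here are hypothetical placeholders, not documented FlagScale options.

    # Hypothetical invocations of the unified runner (a sketch, not the documented interface).
    # Train through the Megatron-LM-based backend using an experiment-level config.
    python run.py --config-path ./examples/my_model/conf --config-name train

    # Switch to serving through the vLLM-based backend by selecting a different task-level config.
    python run.py --config-path ./examples/my_model/conf --config-name serve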

Quick Start & Requirements

  • Installation: Clone the repository, then run ./install-requirements.sh --env train and ./install-requirements.sh --env inference to set up the conda environments (see the command sketch after this list). Custom extensions for vLLM and Megatron-Energon may require additional pip install commands.
  • Prerequisites: NGC's PyTorch container is recommended. Specific model training/serving may require datasets in Megatron-LM format.
  • Configuration: Uses Hydra for configuration management with experiment-level and task-level YAML files.
  • Running Tasks: A unified runner (python run.py) handles training, inference, and serving via configuration files.
  • CLI: pip install . installs a CLI for one-click deployment (e.g., flagscale serve deepseek_r1).
  • Documentation: Refer to the Quick Start section in the README for detailed instructions.
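
For orientation, here is a minimal end-to-end command sketch assembled from the steps above; the repository URL assumes the standard FlagOpen GitHub path, and the config names are placeholders for your own experiment-level and task-level YAML files.

    # Clone and set up the training and inference conda environments (steps from above).
    git clone https://github.com/FlagOpen/FlagScale.git   # assumed repository path
    cd FlagScale
    ./install-requirements.sh --env train
    ./install-requirements.sh --env inference

    # Launch a task through the unified runner (placeholders; see the earlier sketch for the flags).
    python run.py --config-path <config-dir> --config-name <config-name>

    # Install the CLI and serve a model with one command.
    pip install .
    flagscale serve deepseek_r1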

Highlighted Details

  • Supports heterogeneous pre-training and decoding across different chips within a single instance (FlagCX beta).
  • Achieved State-of-the-Art (SOTA) results on the Infinity-MM dataset with LLaVA-OneVision.
  • Accelerated generation and understanding tasks for Emu3 via an optimized classifier-free guidance (CFG) implementation.
  • Demonstrated heterogeneous hybrid training of Aquila2-70B-Expr across NVIDIA and Iluvatar chips.

Maintenance & Community

  • Developed with backing from the Beijing Academy of Artificial Intelligence (BAAI) as part of the FlagAI-Open initiative.
  • Recent updates (v0.8.0, v0.6.5, v0.6.0) show active development with new features and vendor adaptations.

Licensing & Compatibility

  • Licensed under the Apache License (Version 2.0).
  • Contains third-party components under other open-source licenses; refer to the LICENSE file for details.

Limitations & Caveats

  • Some features like heterogeneous prefill-decoding disaggregation and DeepSeek-v3 distributed pre-training are noted as beta.
  • Full integration requires patching and unpatching the backend code, which adds complexity to setup and maintenance.

Health Check

  • Last Commit: 20 hours ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 71
  • Issues (30d): 3

Star History

8 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Johannes Hagemann (Cofounder of Prime Intellect), and 4 more.

S-LoRA by S-LoRA

0.2%
2k
System for scalable LoRA adapter serving
Created 1 year ago
Updated 1 year ago
Starred by François Chollet (Author of Keras; Cofounder of Ndea, ARC Prize), Chaoyu Yang (Founder of Bento), and 13 more.

neon by NervanaSystems

0%
4k
Deep learning framework (discontinued)
Created 11 years ago
Updated 4 years ago
Starred by Clement Delangue (Cofounder of Hugging Face), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 20 more.

accelerate by huggingface

0.3%
9k
PyTorch training helper for distributed execution
Created 4 years ago
Updated 1 day ago