FlagScale by FlagOpen

Large model toolkit for end-to-end management and scaling

Created 2 years ago
398 stars

Top 72.5% on SourcePulse

View on GitHub
Project Summary

FlagScale is a comprehensive toolkit designed to streamline the entire lifecycle of large models, from development to deployment. It targets researchers and engineers working with large models, offering a unified platform to maximize computational efficiency and enhance model performance across diverse hardware architectures.

How It Works

FlagScale integrates and extends popular open-source projects such as Megatron-LM and vLLM through a flexible, multi-backend mechanism. It supports heterogeneous parallelism, enabling training and inference across different chip architectures (e.g., NVIDIA, Iluvatar) within a single instance. This approach aims to simplify complex distributed setups and unlock performance gains by leveraging specialized hardware.

Quick Start & Requirements

  • Installation: Clone the repository, then run ./install-requirements.sh --env train and ./install-requirements.sh --env inference to set up the conda environments (see the command sketch after this list). Custom extensions for vLLM and Megatron-Energon may require additional pip install commands.
  • Prerequisites: NGC's PyTorch container is recommended. Specific model training/serving may require datasets in Megatron-LM format.
  • Configuration: Uses Hydra for configuration management with experiment-level and task-level YAML files.
  • Running Tasks: A unified runner (python run.py) handles training, inference, and serving via configuration files.
  • CLI: pip install . installs a CLI for one-click deployment (e.g., flagscale serve deepseek_r1).
  • Documentation: Refer to the Quick Start section in the README for detailed instructions.
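
A minimal sketch of the workflow above, assuming the repository is hosted at github.com/FlagOpen/FlagScale and that run.py accepts the standard Hydra --config-path/--config-name flags; the example config path is illustrative rather than a shipped example:

  # Clone and set up the separate train/inference conda environments
  git clone https://github.com/FlagOpen/FlagScale.git
  cd FlagScale
  ./install-requirements.sh --env train
  ./install-requirements.sh --env inference

  # Launch a task (training, inference, or serving) through the unified Hydra-based runner
  python run.py --config-path ./examples/<model>/conf --config-name config

  # Optional: install the CLI for one-click serving
  pip install .
  flagscale serve deepseek_r1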

Highlighted Details

  • Supports heterogeneous pre-training and decoding across different chips within a single instance (FlagCX beta).
  • Achieved State-of-the-Art (SOTA) results on the Infinity-MM dataset with LLaVA-OneVision.
  • Accelerated generation and understanding tasks for Emu3 via an optimized CFG (classifier-free guidance) implementation.
  • Demonstrated heterogeneous hybrid training of Aquila2-70B-Expr across NVIDIA and Iluvatar chips.

Maintenance & Community

  • Developed with backing from the Beijing Academy of Artificial Intelligence (BAAI) as part of the FlagAI-Open initiative.
  • Recent updates (v0.8.0, v0.6.5, v0.6.0) show active development with new features and vendor adaptations.

Licensing & Compatibility

  • Licensed under the Apache License (Version 2.0).
  • Contains third-party components under other open-source licenses; refer to the LICENSE file for details.

Limitations & Caveats

  • Some features, such as heterogeneous prefill-decoding disaggregation and DeepSeek-V3 distributed pre-training, are noted as beta.
  • Patching and unpatching backend code is required for full integration, indicating potential complexity in setup and maintenance.

Health Check

  • Last Commit: 11 hours ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 52
  • Issues (30d): 1

Star History

43 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Johannes Hagemann (Cofounder of Prime Intellect), and 4 more.

S-LoRA by S-LoRA

0.1%
2k stars
System for scalable LoRA adapter serving
Created 2 years ago
Updated 1 year ago
Starred by François Chollet (Author of Keras; Cofounder of Ndea, ARC Prize), Chaoyu Yang (Founder of Bento), and 13 more.

neon by NervanaSystems

0%
4k stars
Deep learning framework (discontinued)
Created 11 years ago
Updated 4 years ago
Starred by Clement Delangue (Cofounder of Hugging Face), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 20 more.

accelerate by huggingface

0.2%
9k stars
PyTorch training helper for distributed execution
Created 5 years ago
Updated 1 week ago