FlagScale by FlagOpen

Large model toolkit for end-to-end management and scaling

created 1 year ago
331 stars

Top 83.8% on sourcepulse

View on GitHub
Project Summary

FlagScale is a comprehensive toolkit designed to streamline the entire lifecycle of large language models, from development to deployment. It targets researchers and engineers working with large models, offering a unified platform to maximize computational efficiency and enhance model performance across diverse hardware architectures.

How It Works

FlagScale integrates and extends popular open-source projects like Megatron-LM and vllm, providing a flexible, multi-backend mechanism. It supports heterogeneous parallelism, enabling training and inference across different chip architectures (e.g., NVIDIA, Iluvatar) within a single instance. This approach aims to simplify complex distributed setups and unlock performance gains by leveraging specialized hardware.

Quick Start & Requirements

  • Installation: Clone the repository, then run ./install-requirements.sh --env train and ./install-requirements.sh --env inference to set up the conda environments (a combined sketch of these steps follows this list). Custom extensions for vllm and Megatron-Energon may require additional pip install commands.
  • Prerequisites: NGC's PyTorch container is recommended. Specific model training/serving may require datasets in Megatron-LM format.
  • Configuration: Uses Hydra for configuration management with experiment-level and task-level YAML files.
  • Running Tasks: A unified runner (python run.py) handles training, inference, and serving via configuration files.
  • CLI: pip install . installs a CLI for one-click deployment (e.g., flagscale serve deepseek_r1).
  • Documentation: Refer to the Quick Start section in the README for detailed instructions.
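
A minimal sketch of the steps above, assuming the FlagOpen/FlagScale GitHub repository and a working CUDA environment; the config path and name passed to run.py are hypothetical placeholders for the Hydra experiment files shipped with the project:

    # Clone the repository and set up the conda environments
    git clone https://github.com/FlagOpen/FlagScale.git
    cd FlagScale
    ./install-requirements.sh --env train       # training environment
    ./install-requirements.sh --env inference   # inference/serving environment

    # Launch a task through the unified Hydra-based runner
    # (--config-path/--config-name are standard Hydra flags; the exact
    #  experiment config depends on the model and task being run)
    python run.py --config-path <your-experiment-conf-dir> --config-name <config-name>

    # Optional: install the CLI for one-click deployment
    pip install .
    flagscale serve deepseek_r1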

Highlighted Details

  • Supports heterogeneous pre-training and decoding across different chips within a single instance (FlagCX beta).
  • Achieved State-of-the-Art (SOTA) results on the Infinity-MM dataset with LLaVA-OneVision.
  • Accelerated generation and understanding tasks for Emu3 via optimized CFG implementation.
  • Demonstrated heterogeneous hybrid training of Aquila2-70B-Expr across NVIDIA and Iluvatar chips.

Maintenance & Community

  • Developed with backing from the Beijing Academy of Artificial Intelligence (BAAI) as part of the FlagAI-Open initiative.
  • Recent updates (v0.8.0, v0.6.5, v0.6.0) show active development with new features and vendor adaptations.

Licensing & Compatibility

  • Licensed under the Apache License (Version 2.0).
  • Contains third-party components under other open-source licenses; refer to the LICENSE file for details.

Limitations & Caveats

  • Some features like heterogeneous prefill-decoding disaggregation and DeepSeek-v3 distributed pre-training are noted as beta.
  • Patching and unpatching backend code is required for full integration, indicating potential complexity in setup and maintenance.

Health Check

  • Last commit: 1 day ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 65
  • Issues (30d): 12
  • Star History: 67 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Alex Cheema (Cofounder of EXO Labs), and 1 more.

recurrent-pretraining by seal-rg

0.1% · 806 stars
Pretraining code for depth-recurrent language model research
created 5 months ago · updated 2 weeks ago
Starred by Jeff Hammerbacher (Cofounder of Cloudera) and Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake).

InternEvo by InternLM

1.0% · 402 stars
Lightweight training framework for model pre-training
created 1 year ago · updated 1 week ago
Starred by Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake) and Travis Fischer (Founder of Agentic).

lingua by facebookresearch

0.1% · 5k stars
LLM research codebase for training and inference
created 9 months ago · updated 2 weeks ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Omar Sanseviero (DevRel at Google DeepMind), and 5 more.

TensorRT-LLM by NVIDIA

0.6% · 11k stars
LLM inference optimization SDK for NVIDIA GPUs
created 1 year ago · updated 18 hours ago
Starred by Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), Nat Friedman (Former CEO of GitHub), and 32 more.

llama.cpp by ggml-org

0.4% · 84k stars
C/C++ library for local LLM inference
created 2 years ago · updated 14 hours ago