FlagPerf by FlagOpen

AI chip benchmark platform for evaluating hardware/software stacks

created 2 years ago
343 stars

Top 81.8% on sourcepulse

Project Summary

FlagPerf is an open-source platform for benchmarking AI hardware, targeting AI researchers and hardware vendors. It establishes an industry-practice-oriented metric system to evaluate AI hardware's actual capabilities across various software stack combinations (model + framework + compiler), offering a more comprehensive assessment beyond just execution time.

How It Works

FlagPerf employs a multi-dimensional evaluation metric system that includes functional correctness, performance metrics, resource utilization, and ecosystem adaptability. It supports a wide range of AI hardware by integrating with diverse training frameworks (PyTorch, TensorFlow, PaddlePaddle, MindSpore) and inference engines (TensorRT, XTCL, IxRT). The platform is designed for flexibility, allowing for testing across single-card, single-node, and multi-node environments to simulate real-world application scenarios.
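The idea of scoring a run along several dimensions at once, rather than by wall-clock time alone, can be sketched as a simple record type. This is a hypothetical illustration, not FlagPerf's actual data model; the field names and the `passes` rule are assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class BenchmarkResult:
    """Hypothetical record combining FlagPerf-style evaluation dimensions."""
    correct: bool              # functional correctness (e.g. target accuracy reached)
    samples_per_second: float  # performance
    gpu_util_percent: float    # resource utilization
    framework: str             # software-stack combination under test ("pytorch", ...)
    hardware: str

    def passes(self, min_throughput: float) -> bool:
        # A run only counts if it is both functionally correct and fast enough;
        # raw speed on wrong results would score zero under such a scheme.
        return self.correct and self.samples_per_second >= min_throughput

result = BenchmarkResult(True, 1250.0, 87.5, "pytorch", "vendor-accelerator")
print(result.passes(min_throughput=1000.0))  # → True
```

A record like this makes "comprehensive assessment beyond just execution time" concrete: a single pass/fail or score is derived only after every dimension is populated.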

Quick Start & Requirements

  • Installation: Clone the repository and navigate to the relevant sub-directory (base, training, or inference).
  • Prerequisites: Python, Docker (optional but recommended), hardware drivers, network configuration, SSH trust between servers, and monitoring tools (sysstat, ipmitool). Specific model datasets and checkpoints are required for training and inference benchmarks.
  • Setup: Requires configuring host files (configs/host.yaml) and model-specific configurations. Detailed setup instructions and documentation links are available within the README.
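As a rough sketch of what the host-file step involves, a minimal `configs/host.yaml` might look like the fragment below. The key names here are hypothetical placeholders; the actual schema is documented in the FlagPerf repository, and SSH trust between the listed nodes must already be established.

```yaml
# Hypothetical sketch of configs/host.yaml -- consult the FlagPerf docs
# for the real field names and required values.
hosts:
  - 10.0.0.1        # nodes participating in the benchmark run
  - 10.0.0.2
ssh_port: 22        # passwordless SSH trust between servers is assumed
flagperf_path: /opt/FlagPerf   # repository location on each node
```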

Highlighted Details

  • Covers over 30 classic models across CV, NLP, Speech, and Multimodal domains, with 80+ training examples and various inference scenarios.
  • Actively integrates with major AI hardware vendors and framework developers for broad ecosystem support.
  • Ensures fair evaluation by restricting vendor optimizations to hardware-execution-related aspects like distributed communication and batch size.
  • All testing code is open-source and processes are reproducible.

Maintenance & Community

FlagPerf is a collaborative effort led by BAAI (Beijing Academy of Artificial Intelligence) together with numerous AI hardware and framework teams. Recent updates include support for operator evaluation, containerized execution, and specific model pre-training (LLaMA3, Megatron-Llama). Contact is available via email at flagperf@baai.ac.cn or through GitHub issues.

Licensing & Compatibility

The project is licensed under Apache 2.0, which permits commercial use and closed-source linking; however, users should consult the documentation for the licensing of individual model test cases, which may differ.

Limitations & Caveats

Currently, FlagPerf focuses on offline batch processing and does not support cluster-level or client-side performance evaluation. The README notes that for transformer decoder models, parameter FLOPs calculation requires specific input length parameters.

Health Check

  • Last commit: 6 days ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 1
  • Issues (30d): 0
  • Star History: 17 stars in the last 90 days
