AI chip benchmark platform for evaluating hardware/software stacks
FlagPerf is an open-source platform for benchmarking AI hardware, targeting AI researchers and hardware vendors. It establishes an industry-practice-oriented metric system to evaluate AI hardware's actual capabilities across various software stack combinations (model + framework + compiler), offering a more comprehensive assessment beyond just execution time.
How It Works
FlagPerf employs a multi-dimensional evaluation metric system that includes functional correctness, performance metrics, resource utilization, and ecosystem adaptability. It supports a wide range of AI hardware by integrating with diverse training frameworks (PyTorch, TensorFlow, PaddlePaddle, MindSpore) and inference engines (TensorRT, XTCL, IxRT). The platform is designed for flexibility, allowing for testing across single-card, single-node, and multi-node environments to simulate real-world application scenarios.
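To make the idea of testing many stack combinations concrete, here is a minimal sketch of a test matrix built from the frameworks and deployment scales named above. The `BenchmarkCase` class and `build_matrix` function are hypothetical names chosen for illustration and are not FlagPerf's actual configuration format or API. The sketch also assumes every model runs on every framework, which will not hold for a real test suite.

```python
from dataclasses import dataclass
from itertools import product

# Framework and scale names come from the text above; the data structure
# itself is an illustrative assumption, not FlagPerf's real format.
TRAINING_FRAMEWORKS = ["PyTorch", "TensorFlow", "PaddlePaddle", "MindSpore"]
SCALES = ["single-card", "single-node", "multi-node"]


@dataclass(frozen=True)
class BenchmarkCase:
    """One hypothetical benchmark case: a model on a framework at a scale."""
    model: str
    framework: str
    scale: str


def build_matrix(models):
    """Return every (model, framework, scale) combination as a list of cases."""
    return [
        BenchmarkCase(model, framework, scale)
        for model, framework, scale in product(models, TRAINING_FRAMEWORKS, SCALES)
    ]


cases = build_matrix(["LLaMA3"])
print(len(cases))  # 1 model x 4 frameworks x 3 scales = 12
```

In practice the matrix would be filtered down to the combinations a given chip and software stack actually support.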
Quick Start & Requirements
The relevant component is one of base, training, or inference. Configuration covers a host file (configs/host.yaml) and model-specific configurations. Detailed setup instructions and documentation links are available within the README.
Highlighted Details
Maintenance & Community
FlagPerf is a collaborative effort involving the Beijing Academy of Artificial Intelligence (BAAI) and numerous AI hardware and framework teams. Recent updates include support for operator evaluation, containerized execution, and pre-training of specific models (LLaMA3, Megatron-Llama). The team can be reached by email at flagperf@baai.ac.cn or through GitHub issues.
Licensing & Compatibility
The project is licensed under the Apache 2.0 license, which generally permits commercial use and linking from closed-source software. Users should still consult the documentation for the licensing of specific model test cases.
Limitations & Caveats
Currently, FlagPerf focuses on offline batch processing and does not support cluster-level or client-side performance evaluation. The README also notes that, for transformer decoder models, calculating parameter FLOPs requires specifying the input length.
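The dependence on input length can be illustrated with a common rule of thumb: a forward pass costs roughly 2 FLOPs (one multiply and one add) per parameter per token. The sketch below uses that approximation, which is an assumption here and not necessarily the formula FlagPerf applies. The function name `parameter_flops` and the 8-billion parameter count are hypothetical, and attention-score FLOPs are ignored.

```python
def parameter_flops(num_params: int, input_length: int, batch_size: int = 1) -> int:
    """Approximate forward-pass FLOPs contributed by parameter matmuls.

    Assumes about 2 FLOPs per parameter per token and ignores the
    attention-score terms, which grow with the square of the input length.
    """
    if num_params <= 0 or input_length <= 0 or batch_size <= 0:
        raise ValueError("all arguments must be positive")
    return 2 * num_params * input_length * batch_size


# Hypothetical 8-billion-parameter decoder at two input lengths.
short_run = parameter_flops(8_000_000_000, input_length=512)
long_run = parameter_flops(8_000_000_000, input_length=4096)
print(long_run // short_run)  # 8: the estimate scales linearly with input length
```

Because the estimate scales with the number of tokens processed, a FLOPs figure reported without its input length cannot be compared across runs.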