Evaluation toolkit for large AI foundation models
FlagEval is an open-source toolkit and platform for evaluating large AI foundation models and their pre-training and fine-tuning algorithms across NLP, CV, Audio, and Multimodal domains. It provides scientific, impartial benchmarks and tools that help researchers thoroughly assess model effectiveness, with a focus on making evaluation more objective and efficient.
How It Works
FlagEval offers a modular approach with distinct sub-projects like mCLIPEval for vision-language models, ImageEval-prompt for fine-grained text-to-image model evaluation, and C-SEM for assessing semantic understanding in large models. This structure allows for specialized evaluation across diverse AI modalities and tasks, utilizing curated datasets and detailed annotation methodologies.
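To make the evaluation pattern concrete, the sketch below shows the core of CLIP-style zero-shot classification scoring, the kind of task mCLIPEval benchmarks. This is an illustrative outline, not FlagEval's actual API: the random tensors stand in for a real vision-language model's embeddings, and `zero_shot_accuracy` is a hypothetical helper.

```python
# Minimal sketch of CLIP-style zero-shot classification scoring,
# the kind of task mCLIPEval benchmarks. Random tensors stand in
# for a real vision-language model; this is NOT FlagEval's API.
import torch
import torch.nn.functional as F

def zero_shot_accuracy(image_emb: torch.Tensor,
                       text_emb: torch.Tensor,
                       labels: torch.Tensor) -> float:
    """image_emb: (N, D) image embeddings; text_emb: (C, D) one
    embedding per class prompt; labels: (N,) ground-truth class ids."""
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.T   # cosine similarities, shape (N, C)
    preds = logits.argmax(dim=-1)     # nearest class prompt wins
    return (preds == labels).float().mean().item()

# Placeholder embeddings: 100 images, 10 classes, 512-dim space.
images = torch.randn(100, 512)
texts = torch.randn(10, 512)
labels = torch.randint(0, 10, (100,))
print(f"zero-shot accuracy: {zero_shot_accuracy(images, texts, labels):.3f}")
```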
Quick Start & Requirements
```
git clone https://github.com/FlagOpen/FlagEval.git
cd FlagEval/mCLIPEval/
pip install -r requirements.txt
```

Requires PyTorch >= 1.8.0, Python >= 3.8, and CUDA with NCCL for GPU evaluation. Each sub-project documents its own setup; see imageEval/README.md and csem/README.md for the other components.
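After installation, a quick sanity check along the lines below can confirm the stated requirements are met. This is a generic check, not part of FlagEval itself:

```python
# Verify the environment against the stated requirements
# (Python >= 3.8, PyTorch >= 1.8.0, CUDA/NCCL for GPU runs).
import sys
import torch

print(f"python:  {sys.version.split()[0]}")   # needs >= 3.8
print(f"torch:   {torch.__version__}")        # needs >= 1.8.0
print(f"cuda ok: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"nccl:    {torch.cuda.nccl.version()}")  # NCCL backs multi-GPU evaluation
```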
Licensing & Compatibility
The core is released under the Apache 2.0 license, but several components ship under different licenses; review each sub-project's license before adopting it for a specific use case.
Limitations & Caveats
The project is modular, and requirements and setup instructions vary per sub-project, so there is no single unified installation path.