FlagEval by flageval-baai

Evaluation toolkit for large AI foundation models

created 2 years ago
338 stars

Top 82.6% on sourcepulse

Project Summary

FlagEval is an open-source toolkit and platform for evaluating large AI foundation models, pre-training, and fine-tuning algorithms across NLP, CV, Audio, and Multimodal domains. It provides scientific, impartial benchmarks and tools for researchers to thoroughly assess model effectiveness, with a focus on enhancing objectivity and efficiency in evaluation processes.

How It Works

FlagEval offers a modular approach with distinct sub-projects like mCLIPEval for vision-language models, ImageEval-prompt for fine-grained text-to-image model evaluation, and C-SEM for assessing semantic understanding in large models. This structure allows for specialized evaluation across diverse AI modalities and tasks, utilizing curated datasets and detailed annotation methodologies.

Quick Start & Requirements

  • mCLIPEval: run git clone https://github.com/FlagOpen/FlagEval.git, then cd FlagEval/mCLIPEval/ and pip install -r requirements.txt. Requires Python >= 3.8 and PyTorch >= 1.8.0, plus CUDA and NCCL for GPU evaluation.
  • ImageEval-prompt: Refer to imageEval/README.md.
  • C-SEM: Refer to csem/README.md.
  • Official Website: flageval.baai.ac.cn

Highlighted Details

  • mCLIPEval supports multilingual (12 languages) and monolingual datasets for zero-shot classification, retrieval, and composition tasks.
  • ImageEval-prompt includes 1,624 English and 339 Chinese prompts, annotated across entity, style, and detail dimensions.
  • C-SEM evaluates semantic understanding at lexical and sentence levels with four sub-evaluation items (LLSRC, SLSRC, SLPWC, SLRFC).
  • The FlagEval platform plans to integrate new sub-project versions and to strengthen its Chinese-language evaluation capabilities.
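To make the zero-shot classification task concrete: benchmarks like mCLIPEval score a vision-language model by embedding each candidate class prompt and each image, then ranking classes by cosine similarity. The sketch below illustrates only that scoring step with toy embeddings; it is not mCLIPEval's actual API, and the function names (cosine, zero_shot_predict) are hypothetical.

```python
import math

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def zero_shot_predict(image_emb, class_text_embs):
    # Pick the class whose text embedding is most similar to the image's.
    scores = [cosine(image_emb, t) for t in class_text_embs]
    return max(range(len(scores)), key=scores.__getitem__)

# Toy 2-D embeddings: the "image" aligns with the second class prompt.
classes = ["cat", "dog"]
image = [1.0, 0.0]
prompts = [[0.0, 1.0], [1.0, 0.1]]
print(classes[zero_shot_predict(image, prompts)])  # → dog
```

In a real evaluation the embeddings come from the model's image and text encoders, and accuracy is the fraction of images whose top-ranked class matches the label.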

Maintenance & Community

  • Contact: flageval@baai.ac.cn for issues, bugs, or contributions.
  • Encourages new task, dataset, or tool submissions.
  • Hiring for foundation model evaluation roles.

Licensing & Compatibility

  • Majority licensed under Apache 2.0.
  • mCLIPEval (based on CLIP_benchmark) is MIT-licensed.
  • ImageNet1k datasets use Hugging Face Datasets and ImageNet licenses.
  • Generally compatible with commercial use, but specific dataset licenses should be reviewed.

Limitations & Caveats

The project is modular, with specific requirements and setup instructions varying per sub-project. While the core is Apache 2.0, other components have different licenses, requiring careful review for specific use cases.

Health Check

  • Last commit: 3 months ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 1
  • Issues (30d): 0
  • Star History: 4 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering, Designing Machine Learning Systems), Omar Sanseviero (DevRel at Google DeepMind), and 4 more.

  • open_flamingo by mlfoundations: open-source framework for training large multimodal models. Top 0.1%, 4k stars; created 2 years ago, updated 11 months ago.