FlagEval by flageval-baai

Evaluation toolkit for large AI foundation models

Created 2 years ago
337 stars

Top 81.6% on SourcePulse

View on GitHub
Project Summary

FlagEval is an open-source toolkit and platform for evaluating large AI foundation models, along with pre-training and fine-tuning algorithms, across NLP, CV, audio, and multimodal domains. It provides scientific, impartial benchmarks and tools that let researchers thoroughly assess model effectiveness, with a focus on objectivity and efficiency in the evaluation process.

How It Works

FlagEval offers a modular approach with distinct sub-projects like mCLIPEval for vision-language models, ImageEval-prompt for fine-grained text-to-image model evaluation, and C-SEM for assessing semantic understanding in large models. This structure allows for specialized evaluation across diverse AI modalities and tasks, utilizing curated datasets and detailed annotation methodologies.
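
For orientation, this is the repository layout implied by the paths cited in this summary. Only the three sub-project directories are confirmed by those paths; the annotations paraphrase the descriptions above:

```
FlagEval/
├── mCLIPEval/   # vision-language model evaluation (pip install -r requirements.txt)
├── imageEval/   # ImageEval-prompt: fine-grained text-to-image evaluation (see README.md)
└── csem/        # C-SEM: semantic-understanding evaluation (see README.md)
```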

Quick Start & Requirements

  • mCLIPEval: git clone https://github.com/FlagOpen/FlagEval.git, cd FlagEval/mCLIPEval/, pip install -r requirements.txt. Requires PyTorch >= 1.8.0, Python >= 3.8, CUDA, and NCCL for GPU evaluation (a quick environment check is sketched after this list).
  • ImageEval-prompt: Refer to imageEval/README.md.
  • C-SEM: Refer to csem/README.md.
  • Official Website: flageval.baai.ac.cn
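
Before running mCLIPEval, it can help to confirm the stated requirements are met. A minimal check, assuming only the Python/PyTorch/CUDA/NCCL constraints listed above:

```python
import sys
import torch

# mCLIPEval's stated requirements: Python >= 3.8, PyTorch >= 1.8.0,
# plus CUDA and NCCL for GPU evaluation.
assert sys.version_info >= (3, 8), "Python 3.8+ required"

major, minor = (int(x) for x in torch.__version__.split(".")[:2])
assert (major, minor) >= (1, 8), f"PyTorch 1.8.0+ required, found {torch.__version__}"

print("CUDA available:", torch.cuda.is_available())
print("NCCL available:", torch.distributed.is_nccl_available())
```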

Highlighted Details

  • mCLIPEval supports multilingual (12 languages) and monolingual datasets for zero-shot classification, retrieval, and composition tasks (a minimal sketch of the zero-shot metric follows this list).
  • ImageEval-prompt includes 1,624 English and 339 Chinese prompts, annotated across entity, style, and detail dimensions.
  • C-SEM evaluates semantic understanding at lexical and sentence levels with four sub-evaluation items (LLSRC, SLSRC, SLPWC, SLRFC).
  • FlagEval plans to integrate new versions of these sub-projects and to expand Chinese-language evaluation capabilities on its platform.
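
For context, zero-shot classification of the kind mCLIPEval benchmarks reduces to nearest-neighbor matching between image embeddings and the text embeddings of class prompts. A minimal sketch of that metric in plain PyTorch; the random embeddings and labels here are hypothetical placeholders, not mCLIPEval's actual API:

```python
import torch
import torch.nn.functional as F

# Hypothetical stand-ins for a CLIP-style model's outputs; in mCLIPEval these
# would come from the model under evaluation and the dataset's class prompts.
image_embeds = F.normalize(torch.randn(64, 512), dim=-1)  # one row per image
text_embeds = F.normalize(torch.randn(10, 512), dim=-1)   # one row per class prompt
labels = torch.randint(0, 10, (64,))                      # ground-truth class ids

# Zero-shot prediction: assign each image the class whose prompt embedding
# has the highest cosine similarity (dot product of unit-normalized vectors).
logits = image_embeds @ text_embeds.T
preds = logits.argmax(dim=-1)

top1 = (preds == labels).float().mean().item()
print(f"zero-shot top-1 accuracy: {top1:.3f}")
```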

Maintenance & Community

  • Contact: flageval@baai.ac.cn for issues, bugs, or contributions.
  • Encourages new task, dataset, or tool submissions.
  • Hiring for foundation model evaluation roles.

Licensing & Compatibility

  • Majority licensed under Apache 2.0.
  • mCLIPEval (based on CLIP_benchmark) uses the MIT license.
  • The ImageNet-1k dataset is subject to the Hugging Face Datasets and ImageNet licenses.
  • Generally compatible with commercial use, but specific dataset licenses should be reviewed.

Limitations & Caveats

The project is modular, with specific requirements and setup instructions varying per sub-project. While the core is Apache 2.0, other components have different licenses, requiring careful review for specific use cases.

Health Check

  • Last Commit: 4 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 0 stars in the last 30 days

Explore Similar Projects

Starred by Aravind Srinivas (Cofounder of Perplexity), Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), and 12 more.

gpt-3 by openai

Research paper on large language model few-shot learning

  • 16k stars · Created 5 years ago · Updated 5 years ago