FlagEval by flageval-baai

Evaluation toolkit for large AI foundation models

Created 2 years ago
337 stars

Top 81.6% on SourcePulse

View on GitHub
Project Summary

FlagEval is an open-source toolkit and platform for evaluating large AI foundation models, along with pre-training and fine-tuning algorithms, across NLP, CV, audio, and multimodal domains. It provides scientific, impartial benchmarks and tools that let researchers thoroughly assess model effectiveness, with a focus on objectivity and efficiency in the evaluation process.

How It Works

FlagEval offers a modular approach with distinct sub-projects like mCLIPEval for vision-language models, ImageEval-prompt for fine-grained text-to-image model evaluation, and C-SEM for assessing semantic understanding in large models. This structure allows for specialized evaluation across diverse AI modalities and tasks, utilizing curated datasets and detailed annotation methodologies.
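
For orientation, this is the repository layout implied by the paths cited in this summary. Only the three sub-project directories are confirmed by those paths; the annotations paraphrase the descriptions above:

```
FlagEval/
├── mCLIPEval/   # vision-language model evaluation (pip install -r requirements.txt)
├── imageEval/   # ImageEval-prompt: fine-grained text-to-image evaluation (see README.md)
└── csem/        # C-SEM: semantic-understanding evaluation (see README.md)
```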

Quick Start & Requirements

  • mCLIPEval: git clone https://github.com/FlagOpen/FlagEval.git, cd FlagEval/mCLIPEval/, pip install -r requirements.txt. Requires PyTorch >= 1.8.0, Python >= 3.8, CUDA, and NCCL for GPU evaluation (a quick environment check is sketched after this list).
  • ImageEval-prompt: Refer to imageEval/README.md.
  • C-SEM: Refer to csem/README.md.
  • Official Website: flageval.baai.ac.cn
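
Before running mCLIPEval, it can help to confirm the stated requirements are met. A minimal check, assuming only the Python/PyTorch/CUDA/NCCL constraints listed above:

```python
import sys
import torch

# mCLIPEval's stated requirements: Python >= 3.8, PyTorch >= 1.8.0,
# plus CUDA and NCCL for GPU evaluation.
assert sys.version_info >= (3, 8), "Python 3.8+ required"

major, minor = (int(x) for x in torch.__version__.split(".")[:2])
assert (major, minor) >= (1, 8), f"PyTorch 1.8.0+ required, found {torch.__version__}"

print("CUDA available:", torch.cuda.is_available())
print("NCCL available:", torch.distributed.is_nccl_available())
```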

Highlighted Details

  • mCLIPEval supports multilingual (12 languages) and monolingual datasets for zero-shot classification, retrieval, and composition tasks (a minimal sketch of the zero-shot metric follows this list).
  • ImageEval-prompt includes 1,624 English and 339 Chinese prompts, annotated across entity, style, and detail dimensions.
  • C-SEM evaluates semantic understanding at lexical and sentence levels with four sub-evaluation items (LLSRC, SLSRC, SLPWC, SLRFC).
  • FlagEval plans to integrate new versions of these sub-projects and to expand Chinese-language evaluation capabilities on its platform.
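
For context, zero-shot classification of the kind mCLIPEval benchmarks reduces to nearest-neighbor matching between image embeddings and the text embeddings of class prompts. A minimal sketch of that metric in plain PyTorch; the random embeddings and labels here are hypothetical placeholders, not mCLIPEval's actual API:

```python
import torch
import torch.nn.functional as F

# Hypothetical stand-ins for a CLIP-style model's outputs; in mCLIPEval these
# would come from the model under evaluation and the dataset's class prompts.
image_embeds = F.normalize(torch.randn(64, 512), dim=-1)  # one row per image
text_embeds = F.normalize(torch.randn(10, 512), dim=-1)   # one row per class prompt
labels = torch.randint(0, 10, (64,))                      # ground-truth class ids

# Zero-shot prediction: assign each image the class whose prompt embedding
# has the highest cosine similarity (dot product of unit-normalized vectors).
logits = image_embeds @ text_embeds.T
preds = logits.argmax(dim=-1)

top1 = (preds == labels).float().mean().item()
print(f"zero-shot top-1 accuracy: {top1:.3f}")
```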

Maintenance & Community

  • Contact: flageval@baai.ac.cn for issues, bugs, or contributions.
  • Encourages new task, dataset, or tool submissions.
  • Hiring for foundation model evaluation roles.

Licensing & Compatibility

  • Majority licensed under Apache 2.0.
  • mCLIPEval (based on CLIP_benchmark) uses the MIT license.
  • The ImageNet-1k dataset is subject to the Hugging Face Datasets and ImageNet licenses.
  • Generally compatible with commercial use, but specific dataset licenses should be reviewed.

Limitations & Caveats

The project is modular, with specific requirements and setup instructions varying per sub-project. While the core is Apache 2.0, other components have different licenses, requiring careful review for specific use cases.

Health Check

  • Last Commit: 4 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 0 stars in the last 30 days

Explore Similar Projects

Starred by Aravind Srinivas (Cofounder of Perplexity), Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), and 12 more.

gpt-3 by openai

Research paper on large language model few-shot learning

  • 16k stars · Created 5 years ago · Updated 5 years ago