Chinese eval suite for foundation models (NeurIPS 2023)
C-Eval is a comprehensive Chinese evaluation suite for assessing the capabilities of foundation models across 52 diverse disciplines. It comprises 13,948 multiple-choice questions categorized by difficulty level and subject area, helping developers track progress and identify model strengths and weaknesses in Chinese language understanding and reasoning.
How It Works
The suite provides multiple-choice questions across four broad categories: STEM, Social Science, Humanities, and Others. It supports both zero-shot and few-shot evaluation methodologies. For few-shot evaluation, a "dev" split with explanations is available to guide models. The "val" split is intended for hyperparameter tuning, while the "test" split's labels are withheld, requiring submission to the C-Eval platform for automatic evaluation.
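The few-shot setup above can be sketched in plain Python: solved dev-split examples are concatenated in front of the unanswered question. The field names (`question`, `A`–`D`, `answer`) are assumptions based on the dataset's multiple-choice structure; adjust them if the actual schema differs.

```python
def format_example(ex, include_answer=True):
    """Render one multiple-choice record as a prompt segment.

    Field names (question, A-D, answer) are assumed; check the
    dataset schema before reusing this.
    """
    prompt = (
        f"{ex['question']}\n"
        f"A. {ex['A']}\nB. {ex['B']}\nC. {ex['C']}\nD. {ex['D']}\n"
    )
    if include_answer:
        # dev-split records include the gold answer (and an explanation)
        prompt += f"Answer: {ex['answer']}\n\n"
    else:
        # the model is expected to complete the answer letter
        prompt += "Answer:"
    return prompt


def build_few_shot_prompt(dev_examples, test_example, k=5):
    """Concatenate k solved dev examples, then the unanswered question."""
    shots = "".join(format_example(ex) for ex in dev_examples[:k])
    return shots + format_example(test_example, include_answer=False)
```

For zero-shot evaluation, the same formatting applies with `k=0` (no demonstration examples).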
Quick Start & Requirements
- Download `ceval-exam.zip` from Hugging Face, or load the data via `datasets.load_dataset("ceval/ceval-exam")`.
- Evaluation is supported through the `lm-evaluation-harness` framework (e.g., `python main.py --model hf-causal --model_args pretrained=EleutherAI/gpt-j-6B --tasks Ceval-valid-computer_network --device cuda:0`).
- Requires `pandas` and `datasets`.
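Because the "val" split ships with labels, it can be scored locally during hyperparameter tuning without any submission. A minimal sketch, assuming predictions and gold answers are single letters grouped by subject (the tuple layout here is illustrative, not a C-Eval API):

```python
from collections import defaultdict


def per_subject_accuracy(records):
    """Compute accuracy per subject.

    records: iterable of (subject, predicted_letter, gold_letter) tuples.
    Returns {subject: fraction_correct}.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for subject, pred, gold in records:
        totals[subject] += 1
        hits[subject] += int(pred == gold)
    return {s: hits[s] / totals[s] for s in totals}
```

An overall score can then be taken as the mean over subjects, though the official leaderboard defines its own aggregation.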
Highlighted Details
- Tasks are integrated into the `lm-evaluation-harness` for standardized evaluation.
Maintenance & Community
The project is associated with HKUST NLP. Further community interaction details are not explicitly provided in the README.
Licensing & Compatibility
The dataset is distributed under a license that prohibits commercial use.
Limitations & Caveats
Test-set labels are not released, necessitating submission to the C-Eval platform for automatic evaluation.
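Submitting test-set predictions means serializing them into a file the platform can score. The schema sketched here (a JSON object mapping each subject to question-id/answer-letter pairs) is an assumption for illustration; consult the official submission instructions for the exact required format.

```python
import json


def build_submission(preds_by_subject):
    """Serialize predictions for upload.

    preds_by_subject: {subject: {question_id: answer_letter}}.
    NOTE: this nesting is an assumed schema, not the platform's
    documented format -- verify against the official instructions.
    """
    return json.dumps(preds_by_subject, ensure_ascii=False, indent=2)
```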