Chinese eval benchmark for language models' knowledge/reasoning
CMMLU is a comprehensive benchmark designed to evaluate the knowledge and reasoning capabilities of language models specifically within the Chinese language context. It covers 67 diverse subjects, ranging from fundamental sciences to advanced professional fields, including China-specific topics and common sense knowledge. The benchmark is intended for researchers and developers working on or evaluating Chinese language models.
How It Works
CMMLU presents a series of multiple-choice questions, each with four options and a single correct answer. The dataset is structured into development and testing subsets for each of the 67 topics. The evaluation methodology supports both zero-shot and few-shot (specifically five-shot) learning scenarios, allowing for a nuanced assessment of model performance under different prompting conditions.
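To make the prompting setup concrete, below is a minimal sketch of five-shot prompt construction in the style described above. It assumes the dataset is available on the Hugging Face Hub as "haonan-li/cmmlu" with per-subject configs, "dev"/"test" splits, and columns Question/A/B/C/D/Answer; verify these names against the actual release before relying on them.

```python
# Hedged sketch of CMMLU-style five-shot prompting.
# Dataset path, config names, and column names are assumptions.
from datasets import load_dataset

CHOICES = ["A", "B", "C", "D"]

def format_example(row, include_answer=True):
    # One question followed by its four options, then an answer cue.
    prompt = row["Question"] + "\n"
    for letter in CHOICES:
        prompt += f"{letter}. {row[letter]}\n"
    prompt += "答案："  # "Answer:" in Chinese
    if include_answer:
        prompt += row["Answer"] + "\n\n"
    return prompt

def build_five_shot_prompt(subject="agronomy", test_index=0):
    dev = load_dataset("haonan-li/cmmlu", subject, split="dev")
    test = load_dataset("haonan-li/cmmlu", subject, split="test")
    # Five answered dev examples serve as in-context demonstrations.
    prompt = "".join(format_example(dev[i]) for i in range(5))
    # The test question is appended without its answer.
    prompt += format_example(test[test_index], include_answer=False)
    return prompt

print(build_five_shot_prompt())
```

In the zero-shot setting, the same format is used with the five demonstrations omitted.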
Quick Start & Requirements
CMMLU is supported by established evaluation frameworks, including lm-evaluation-harness and OpenCompass. The repository also provides its own evaluation scripts under src/mp_utils.
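As a starting point, the snippet below shows how a run through lm-evaluation-harness might look using its Python API (available in v0.4+). The model checkpoint is illustrative, and the exact task name for the aggregated CMMLU group should be checked against your installed harness version.

```python
# Hedged sketch: evaluating a Hugging Face model on CMMLU via
# lm-evaluation-harness. Task and model names are assumptions.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                             # Hugging Face backend
    model_args="pretrained=Qwen/Qwen2-7B",  # any causal LM checkpoint
    tasks=["cmmlu"],                        # aggregated CMMLU task group
    num_fewshot=5,                          # the benchmark's five-shot setting
)
print(results["results"])
```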
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats