Evaluation framework for assessing LLMs using Chinese GAOKAO (college entrance exam) questions
GAOKAO-Bench provides a standardized framework for evaluating large language models using China's Gaokao (National College Entrance Examination) questions. It aims to comprehensively assess models' language understanding and logical reasoning capabilities, offering a robust benchmark for the LLM community.
How It Works
The framework leverages a curated dataset of 2811 Gaokao questions (2010-2022), comprising 1781 objective and 1030 subjective questions. Objective questions are evaluated using rule-based answer extraction, while subjective questions are assessed through human grading or an LLM-as-a-Judge approach (specifically using GPT-4-turbo). This dual approach allows for a nuanced evaluation of both factual recall and complex reasoning.
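For illustration, the objective-question path can be sketched in a few lines of Python; the regex and the "last match wins" heuristic below are assumptions made for this sketch, not the repository's actual extraction rules.

    import re

    # Sketch of rule-based answer extraction for multiple-choice questions.
    # The pattern and the take-the-last-match heuristic are assumptions,
    # not the repository's actual implementation.
    CHOICE_PATTERN = re.compile(r"\b([A-D])\b")

    def extract_choice(model_output: str) -> str | None:
        """Return the last standalone choice letter (A-D), assuming the
        model restates its final answer at the end of its output."""
        matches = CHOICE_PATTERN.findall(model_output)
        return matches[-1] if matches else None

    def score_objective(predictions: list[str], answers: list[str]) -> float:
        """Accuracy: fraction of extracted choices matching the answer key."""
        correct = sum(extract_choice(p) == a for p, a in zip(predictions, answers))
        return correct / len(answers)

A full implementation would also need to handle multi-answer choice questions and per-question score weights.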
Quick Start & Requirements
    pip install -r requirements.txt
    python objective_bench.py --openai_api_key="your openai api key"
    python subjective_bench.py --openai_api_key="your openai api key"
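Each script essentially iterates over the dataset and queries the model through the OpenAI API. A minimal sketch of that loop follows; the data file name and JSON field names here are assumptions, not the repository's exact schema.

    import json
    from openai import OpenAI

    client = OpenAI(api_key="your openai api key")

    # Hypothetical file name and field names, for illustration only.
    with open("data/objective_questions.json", encoding="utf-8") as f:
        questions = json.load(f)

    for item in questions[:3]:  # try a few questions before a full run
        response = client.chat.completions.create(
            model="gpt-4-turbo",
            messages=[{"role": "user", "content": item["question"]}],
        )
        print(response.choices[0].message.content)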
Maintenance & Community
The project is associated with OpenLMLab, with subjective question scoring contributed by teachers at Shanghai CaoYang No.2 High School. Further updates are published via GAOKAO-Bench-Updates.
Licensing & Compatibility
The repository does not explicitly state a license in the README. Users should verify licensing for commercial use or integration into closed-source projects.
Limitations & Caveats
Subjective question grading relies on human evaluators or an LLM-as-a-Judge, either of which can introduce scoring variability across graders or runs. The dataset is drawn entirely from China's Gaokao, which limits its direct applicability to other educational systems and languages.
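To make that variability concrete, an LLM-as-a-Judge call for a subjective question looks roughly like the following; the rubric wording is an assumption, and the repository's actual grading prompt may differ.

    from openai import OpenAI

    client = OpenAI()

    # Hypothetical grading rubric; the repository's actual prompt may differ.
    JUDGE_PROMPT = (
        "You are grading a Gaokao subjective answer.\n"
        "Question: {question}\n"
        "Reference answer: {reference}\n"
        "Model answer: {answer}\n"
        "Award an integer score from 0 to {max_score} and justify it briefly."
    )

    def judge(question, reference, answer, max_score):
        response = client.chat.completions.create(
            model="gpt-4-turbo",
            temperature=0,  # reduces, but does not eliminate, run-to-run variance
            messages=[{
                "role": "user",
                "content": JUDGE_PROMPT.format(
                    question=question, reference=reference,
                    answer=answer, max_score=max_score,
                ),
            }],
        )
        return response.choices[0].message.content

Even at temperature 0, judge scores can shift between runs or model versions, so spot-checking against human grades remains worthwhile.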