alipay: Financial LLM evaluation benchmark
Fin-Eva Version 1.0 is a comprehensive Chinese financial domain evaluation dataset jointly developed by Ant Group and Shanghai University of Finance and Economics. It addresses the need for standardized, professional benchmarks to assess the capabilities of large language models (LLMs) in diverse financial scenarios like wealth management, insurance, and investment research, as well as core financial, economic, and accounting knowledge. The dataset benefits researchers and practitioners by providing a robust tool for evaluating and improving financial AI performance.
How It Works
The dataset comprises over 13,000 single-choice questions, meticulously constructed from Ant Group's business data and public sources, alongside authoritative exam materials from Shanghai University of Finance and Economics. Ant's contribution focuses on five core LLM capabilities: financial cognition, knowledge, logic, content generation, and security compliance, spanning 33 sub-dimensions. SUFE's data covers four academic domains: finance, economics, accounting, and professional certificates. Data is split into white-box development sets (with answers) and black-box test sets (without answers) to ensure evaluation fairness. Prompt templates are provided to optimize model interaction and output standardization.
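As a rough illustration of how a single-choice item from a white-box development set might be turned into a prompt and scored, here is a minimal Python sketch. The record fields (question, options, answer), the prompt wording, and the helper names build_prompt, extract_choice, and accuracy are all assumptions made for this example; they are not the actual Fin-Eva schema or the project's own prompt templates. Local scoring of this kind only applies to the development sets, since the black-box test sets ship without answers.

```python
import re

# Hypothetical dev-set records. The field names are assumptions for this
# sketch and do not reflect the actual Fin-Eva data format.
DEV_SAMPLES = [
    {
        "question": "Which statement lists a company's assets, liabilities, and equity?",
        "options": {
            "A": "Income statement",
            "B": "Balance sheet",
            "C": "Cash flow statement",
            "D": "Statement of retained earnings",
        },
        "answer": "B",
    },
    {
        "question": "Which ratio measures short-term liquidity?",
        "options": {
            "A": "Current ratio",
            "B": "Debt-to-equity ratio",
            "C": "Return on equity",
            "D": "Price-to-earnings ratio",
        },
        "answer": "A",
    },
]


def build_prompt(sample):
    """Render one single-choice question as a plain-text prompt."""
    lines = [
        "Answer the following single-choice question with one option letter.",
        f"Question: {sample['question']}",
    ]
    for letter, text in sorted(sample["options"].items()):
        lines.append(f"{letter}. {text}")
    lines.append("Answer:")
    return "\n".join(lines)


def extract_choice(model_output, valid_letters="ABCD"):
    """Return the first standalone uppercase option letter, or None."""
    match = re.search(rf"\b([{valid_letters}])\b", model_output)
    return match.group(1) if match else None


def accuracy(samples, model_outputs):
    """Fraction of samples whose extracted letter equals the reference answer."""
    correct = sum(
        extract_choice(output) == sample["answer"]
        for sample, output in zip(samples, model_outputs)
    )
    return correct / len(samples)


if __name__ == "__main__":
    print(build_prompt(DEV_SAMPLES[0]))
    # Stand-in model outputs; a real run would call a model here.
    print(accuracy(DEV_SAMPLES, ["B", "The answer is D."]))
```

Constraining the model to answer with a single letter keeps answer extraction simple, which is one reason standardized prompt templates matter for single-choice benchmarks.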
Quick Start & Requirements
Change to the run_scripts directory and execute shell scripts such as bash run.sh for general evaluation or bash run_chatglm2.sh for specific models.
OPENAI_API_KEY must be configured in src/utils/gpt_utils.py.
The example.py script demonstrates data loading.
Contact: Fin-eval@antgroup.com, zhang.liwen@shufe.edu.cn
Highlighted Details
Data is split into dev (white-box) and test (black-box) sets for comprehensive evaluation.
Maintenance & Community
Fin-Eva is an open-source initiative by Ant Group and Shanghai University of Finance and Economics. Community contributions are actively encouraged to enhance the dataset's professionalism, distinctiveness, and breadth. Interested parties can contact Fin-eval@antgroup.com and zhang.liwen@shufe.edu.cn; contributing requires a signed FIN-EVA dataset contributor license agreement.
Licensing & Compatibility
The Fin-Eva dataset is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0). This license permits commercial use, sharing, and adaptation, provided appropriate credit is given to the original authors.
Limitations & Caveats
The dataset focuses primarily on the Chinese language. Version 1.0 is an initial release, and ongoing iteration is planned. The README does not report benchmark results for leading models or describe known limitations beyond the language focus.