financial_evaluation_dataset by alipay

Financial LLM evaluation benchmark

Created 2 years ago
250 stars

Top 100.0% on SourcePulse

View on GitHub
Project Summary

Fin-Eva Version 1.0 is a comprehensive Chinese financial domain evaluation dataset jointly developed by Ant Group and Shanghai University of Finance and Economics. It addresses the need for standardized, professional benchmarks to assess the capabilities of large language models (LLMs) in diverse financial scenarios like wealth management, insurance, and investment research, as well as core financial, economic, and accounting knowledge. The dataset benefits researchers and practitioners by providing a robust tool for evaluating and improving financial AI performance.

How It Works

The dataset comprises over 13,000 single-choice questions constructed from Ant Group's business data and public sources, together with authoritative exam materials from Shanghai University of Finance and Economics (SUFE). Ant Group's contribution targets five core LLM capabilities: financial cognition, knowledge, logic, content generation, and security compliance, spanning 33 sub-dimensions. SUFE's data covers four academic domains: finance, economics, accounting, and professional certificates. The data is split into white-box development sets (with answers) and black-box test sets (without answers) to keep evaluation fair, and prompt templates are provided to standardize model interaction and output formatting.
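
As a rough illustration of this layout, the sketch below builds a record in the shape a white-box dev entry might take and shows how a black-box test entry would differ. The field names and the sample question are assumptions for illustration, not the repository's exact schema.

```python
# Hypothetical sketch of the dev/test record shapes; the actual field
# names and file format in Fin-Eva may differ from the ones assumed here.
dev_record = {
    "question": "下列哪项属于货币政策工具？",  # single-choice question stem
    "options": {"A": "公开市场操作", "B": "财政补贴", "C": "关税调整", "D": "产业规划"},
    "answer": "A",            # white-box dev sets ship the gold answer
    "dimension": "金融知识",   # one of the capability / domain sub-dimensions
}

# Black-box test sets keep the same shape but omit the answer field,
# so submissions are scored by the benchmark maintainers.
test_record = {k: v for k, v in dev_record.items() if k != "answer"}
```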

Quick Start & Requirements

  • Primary install/run command: from the run_scripts directory, run the provided shell scripts, e.g. bash run.sh for general evaluation or bash run_chatglm2.sh for a specific model.
  • Prerequisites: for ChatGPT evaluation, an OPENAI_API_KEY must be configured in src/utils/gpt_utils.py. The example.py script demonstrates data loading (a hedged sketch in the same spirit follows this list).
  • Links:
    • Code Repository: the GitHub repository linked above.
    • Community Contribution: Fin-eval@antgroup.com, zhang.liwen@shufe.edu.cn
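
The following is a minimal loading-and-prompting sketch in the spirit of example.py. The file path, field names, and prompt wording are placeholders, not the repository's actual layout or its shipped templates.

```python
import json

# Placeholder path and schema: adjust to the actual dev-set file in the
# repository (see example.py for the reference loader).
DEV_FILE = "data/dev/financial_knowledge.json"

with open(DEV_FILE, encoding="utf-8") as f:
    records = json.load(f)

def build_prompt(rec: dict) -> str:
    """Turn one single-choice record into a prompt string.

    The template wording here is illustrative; the repository provides its
    own prompt templates tailored to each task and model.
    """
    options = "\n".join(f"{k}. {v}" for k, v in rec["options"].items())
    return f"请回答下面的单选题，只输出选项字母。\n{rec['question']}\n{options}\n答案："

print(build_prompt(records[0]))
```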

Highlighted Details

  • Total 13k+ evaluation questions covering 5 Ant capabilities and 4 SUFE domains.
  • Ant's data includes 8,445 questions across 33 sub-dimensions, while SUFE contributes 4,661 questions.
  • Includes specific prompt templates tailored for different financial tasks and models.
  • Data is structured into dev (white-box) and test (black-box) sets for comprehensive evaluation.

Maintenance & Community

Fin-Eva is an open-source initiative by Ant Group and Shanghai University of Finance and Economics. Community contributions are actively encouraged to enhance the dataset's professionalism, distinctiveness, and breadth. Interested parties can contact Fin-eval@antgroup.com or zhang.liwen@shufe.edu.cn; contributors are required to sign the FIN-EVA dataset contributor license agreement.

Licensing & Compatibility

The Fin-Eva dataset is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0). This license permits commercial use, sharing, and adaptation, provided appropriate credit is given to the original authors.

Limitations & Caveats

The dataset is primarily focused on the Chinese language. Version 1.0 is an initial release, and ongoing iteration is planned. The README does not report performance benchmarks against leading models, nor does it document limitations beyond the language focus.

Health Check

  • Last Commit: 2 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 2 stars in the last 30 days
