SUFE-AIFLM-Lab / FinEval: Financial LLM evaluation benchmark
Top 99.3% on SourcePulse
FinEval is a pioneering Chinese benchmark dataset designed to comprehensively evaluate the professional capabilities and security of Large Language Models (LLMs) in the specialized, risk-sensitive financial industry. It addresses the uncertainty surrounding LLM performance on complex financial tasks by providing over 26,000 questions across diverse formats. FinEval targets researchers and developers who need to assess and improve LLMs for financial applications, offering a robust foundation for advancing AI in finance.
How It Works
FinEval employs a rigorous methodology, combining quantitative fundamental methods with extensive research, summarization, and manual screening. The benchmark encompasses six key financial domains: Academic Knowledge, Industry Knowledge, Security Knowledge, Financial Agent, Financial Multimodal Capabilities, and Financial Rigor Testing. It features a variety of question formats, including multiple-choice, short-answer, reasoning, and retrieval-based tasks. Evaluation utilizes zero-shot, few-shot, and Chain-of-Thought prompting strategies, with results presented via text and multimodal performance leaderboards.
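The three prompting strategies mentioned above can be illustrated with a short sketch. This is not FinEval's actual evaluation code; the question text and the `build_prompt` helper are hypothetical, showing only how zero-shot, few-shot, and Chain-of-Thought prompts for a multiple-choice question typically differ.

```python
# Illustrative sketch of zero-shot, few-shot, and Chain-of-Thought (CoT)
# prompting for a multiple-choice question. Names and question text are
# hypothetical, not taken from the FinEval codebase.

def build_prompt(question, choices, examples=None, cot=False):
    """Assemble an evaluation prompt.

    examples: optional list of (question, choices, answer) tuples (few-shot).
    cot: if True, ask the model to reason step by step before answering.
    """
    lines = []
    # Few-shot: prepend worked examples with their answers.
    for ex_q, ex_choices, ex_answer in examples or []:
        lines.append(ex_q)
        lines.extend(f"{label}. {text}" for label, text in ex_choices)
        lines.append(f"Answer: {ex_answer}")
        lines.append("")
    lines.append(question)
    lines.extend(f"{label}. {text}" for label, text in choices)
    if cot:
        lines.append("Let's think step by step, then give the final answer.")
    else:
        lines.append("Answer:")
    return "\n".join(lines)

q = "Which ratio measures a firm's short-term liquidity?"
c = [("A", "Current ratio"), ("B", "Debt-to-equity"),
     ("C", "P/E ratio"), ("D", "Dividend yield")]

zero_shot = build_prompt(q, c)                    # no examples, direct answer

shot = ("What does ROE stand for?",
        [("A", "Return on equity"), ("B", "Rate of exchange")], "A")
few_shot_cot = build_prompt(q, c, examples=[shot], cot=True)
```

In practice the benchmark scores the model's chosen option letter against the gold answer; leaderboard results then aggregate accuracy per domain and prompting strategy.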
Quick Start & Requirements
Install dependencies: pip install -r requirements.txt
Evaluation data lives in the code/data directory.
Documentation: https://fineval.readthedocs.io/zh_CN/latest/
Paper: https://arxiv.org/abs/2308.09975
Hugging Face dataset: https://huggingface.co/datasets/SUFE-AIFLM-Lab/FinEval
Highlighted Details
Maintenance & Community
The provided README does not detail specific maintenance contributors, community channels (e.g., Discord, Slack), or active development roadmaps.
Licensing & Compatibility
Limitations & Caveats