patronus-ai: Benchmark for financial question answering with LLMs
Top 98.5% on SourcePulse
A new benchmark suite, FinanceBench, addresses the critical need for evaluating Large Language Models (LLMs) in open-book financial question answering. It targets researchers and engineers developing AI solutions for the financial sector, offering a standardized method to assess LLM capabilities and identify current limitations, thereby guiding future development and adoption decisions.
How It Works
FinanceBench comprises 10,231 ecologically valid financial questions, complete with human-annotated answers and evidence strings, designed to establish a minimum performance standard for LLMs. The repository provides an open-source sample of 150 annotated examples, alongside two JSONL files detailing questions and document metadata. These can be loaded and joined using Python's pandas library, facilitating the evaluation of LLM performance on real-world financial queries.
Quick Start & Requirements
# Load the 150-example open-source sample and the per-document metadata,
# then join them on the shared document identifier.
import pandas as pd

df_questions = pd.read_json("data/financebench_open_source.jsonl", lines=True)
df_meta = pd.read_json("data/financebench_document_information.jsonl", lines=True)
df_full = pd.merge(df_questions, df_meta, on="doc_name")
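As a sketch of how the joined frame might feed an evaluation loop, the snippet below scores a predictor with exact-match accuracy. The column names (question, answer, doc_name) and the scoring rule are illustrative assumptions for this sketch, not the benchmark's official schema or metric.

```python
import pandas as pd

# Illustrative stand-in for the joined FinanceBench frame; the column
# names here are assumptions made for this example.
df_full = pd.DataFrame({
    "question": ["What was ACME's FY2022 revenue?"],
    "answer": ["$1.2B"],
    "doc_name": ["ACME_2022_10K"],
})

def evaluate(df: pd.DataFrame, predict) -> float:
    """Score a predictor with exact-match accuracy (a deliberately
    simple metric, chosen only to keep the sketch self-contained)."""
    hits = sum(
        predict(row.question, row.doc_name).strip() == row.answer
        for row in df.itertuples()
    )
    return hits / len(df)

# A trivial predictor that always returns the gold answer, for demonstration.
accuracy = evaluate(df_full, lambda q, d: "$1.2B")
print(accuracy)  # 1.0
```

A real predictor would retrieve the relevant filing from /pdfs/ and query an LLM; the harness shape stays the same.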
Supporting PDF documents are stored in /pdfs/, and model evaluation results are in /results/.
Highlighted Details
Maintenance & Community
Questions can be directed to contact@patronus.ai.
Licensing & Compatibility
Limitations & Caveats
Last updated 1 year ago; the repository is marked inactive.