Financial dataset generator for LLM Q&A
This Python library creates question-and-answer financial datasets from various text sources, including 10-K filings, PDFs, and general text documents, using Large Language Models (LLMs). It is designed for researchers and developers working with LLMs in the financial domain and aims to simplify the generation of realistic, context-rich financial Q&A pairs.
How It Works
The library uses LLMs, specifically naming gpt-4-turbo, to process financial documents and extract relevant information. Users can provide raw text, a PDF URL, or a company ticker and year for 10-K filings. The DatasetGenerator class then orchestrates the LLM calls that generate question-answer pairs, each including the supporting context from the source material. This automates the laborious manual creation of datasets for financial NLP tasks.
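The generation loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's implementation: it assumes a generic llm callable that takes a prompt string and returns a JSON string, and QAPair, PROMPT, generate_qa and fake_llm are hypothetical names rather than part of the financial-datasets API. Only the max_questions name is borrowed from the library; the capping behavior shown here is an assumption. The stub stands in for a real gpt-4-turbo call so the example runs offline.

```python
import json
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class QAPair:
    """Hypothetical record for one generated example (not the library's schema)."""
    question: str
    answer: str
    context: str


# Hypothetical prompt template; the library's real prompts are not shown here.
PROMPT = (
    "From the financial text below, write up to {n} question-answer pairs. "
    "Return a JSON list of objects with the keys question, answer and context, "
    "where context is copied verbatim from the text.\n\nTEXT:\n{text}"
)


def generate_qa(texts: List[str], llm: Callable[[str], str], max_questions: int) -> List[QAPair]:
    """Request Q&A pairs chunk by chunk until max_questions pairs are collected."""
    pairs: List[QAPair] = []
    for text in texts:
        remaining = max_questions - len(pairs)
        if remaining <= 0:
            break  # budget exhausted; skip the remaining chunks
        raw = llm(PROMPT.format(n=remaining, text=text))
        try:
            items = json.loads(raw)
        except json.JSONDecodeError:
            continue  # ignore replies that are not valid JSON
        if not isinstance(items, list):
            continue
        for item in items[:remaining]:
            # Keep only well-formed objects that carry all three fields.
            if isinstance(item, dict) and all(k in item for k in ("question", "answer", "context")):
                pairs.append(QAPair(item["question"], item["answer"], item["context"]))
    return pairs


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call (e.g. gpt-4-turbo); always returns two pairs."""
    return json.dumps([
        {"question": "What was revenue?", "answer": "$10M", "context": "Revenue was $10M."},
        {"question": "What was net income?", "answer": "$2M", "context": "Net income was $2M."},
    ])


docs = ["Revenue was $10M. Net income was $2M.", "A second chunk of the filing."]
result = generate_qa(docs, fake_llm, max_questions=3)
print(len(result))  # 3: two pairs from the first chunk, one from the second
```

Asking each chunk only for the remaining budget, and truncating whatever the model returns, keeps the total at or below max_questions even when early chunks produce many pairs.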
Quick Start & Requirements
pip install financial-datasets
Highlighted Details
The max_questions parameter limits how many question-answer pairs are generated.
Limitations & Caveats
The library relies on external LLM APIs, specifically OpenAI's gpt-4-turbo, which requires an API key and incurs usage costs. The quality and accuracy of the generated datasets depend on the LLM's performance and on the clarity of the input financial documents.
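Because output quality depends on the model, one inexpensive safeguard is to discard pairs whose supporting context does not actually appear in the source document. The sketch below is a hypothetical post-processing step, not a feature of the library; QAPair, _normalize and filter_grounded are illustrative names, and the whitespace and case normalization is an assumption chosen so that line breaks in extracted filings do not cause false rejections.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QAPair:
    """Hypothetical record type, not the library's output schema."""
    question: str
    answer: str
    context: str


def _normalize(s: str) -> str:
    # Collapse all whitespace runs to single spaces and lowercase the text.
    return " ".join(s.split()).lower()


def filter_grounded(pairs: List[QAPair], source: str) -> List[QAPair]:
    """Keep only pairs whose non-empty context occurs in the source after normalization."""
    haystack = _normalize(source)
    return [p for p in pairs if p.context.strip() and _normalize(p.context) in haystack]


source = "Revenue was $10M.\nNet income was $2M."
pairs = [
    QAPair("What was revenue?", "$10M", "Revenue was $10M."),
    QAPair("What was EBITDA?", "$5M", "EBITDA was $5M."),  # context not in source
]
kept = filter_grounded(pairs, source)
print([p.question for p in kept])  # ['What was revenue?']
```

A substring check only catches fabricated context; it cannot confirm that the answer is correct, so manual review of a sample remains advisable.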