GPT-InvestAR by UditGupta10

Tool for stock investment strategy via LLM analysis of annual reports

created 1 year ago
259 stars

Top 98.4% on sourcepulse

Project Summary

This repository provides tools for enhancing stock investment strategies by analyzing company annual reports using Large Language Models (LLMs). It targets quantitative analysts, researchers, and investors seeking to leverage AI for financial data processing and predictive modeling, aiming to improve portfolio performance against benchmarks like the S&P 500.

How It Works

The project follows a pipeline: it downloads 10-K filings from the SEC, converts them to PDF for token efficiency, generates embeddings with ChromaDB as the vector store, and queries those embeddings with an LLM (such as GPT-3.5) to extract scores that serve as features. A Linear Regression model in a Jupyter Notebook then uses these features to predict stock returns and construct investment portfolios.
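The final modeling step described above can be sketched as follows. This is a minimal illustration, not the repository's code: it assumes each stock has a small vector of LLM-derived scores, fits an ordinary least-squares regression in plain Python (the project itself lists Scikit-Learn as a dependency), and ranks candidates by predicted return. All tickers, scores, and returns are hypothetical, and the helper names `fit_linear_regression`, `predict`, and `top_k` are invented for this sketch.

```python
from typing import Dict, List, Sequence


def fit_linear_regression(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Ordinary least squares via the normal equations.

    Returns [intercept, w1, ..., wk].
    """
    rows = [[1.0, *map(float, r)] for r in X]  # prepend intercept column
    n = len(rows[0])
    # Augmented matrix [A^T A | A^T y]
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(n)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("features are collinear; cannot fit")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def predict(coef: Sequence[float], features: Sequence[float]) -> float:
    """Predicted return for one stock's feature vector."""
    return coef[0] + sum(w * x for w, x in zip(coef[1:], features))


def top_k(coef: Sequence[float], candidates: Dict[str, Sequence[float]], k: int) -> List[str]:
    """Tickers with the k highest predicted returns, best first."""
    ranked = sorted(candidates, key=lambda t: predict(coef, candidates[t]), reverse=True)
    return ranked[:k]


# Hypothetical training data: two LLM scores per filing, with next-period returns.
train_scores = [[1, 0], [0, 1], [1, 1], [2, 1], [0, 0]]
train_returns = [0.03, 0.00, 0.02, 0.04, 0.01]
coef = fit_linear_regression(train_scores, train_returns)

# Hypothetical candidates scored from their latest annual reports.
candidates = {"AAA": [2, 0], "BBB": [0, 2], "CCC": [1, 1]}
picks = top_k(coef, candidates, k=2)
print(coef, picks)
```

In practice the scores would come from the LLM queries over the embedded 10-K text, and the fit would be evaluated on data from periods the model has not seen.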

Quick Start & Requirements

  • Install: Installing LlamaIndex and OpenBB in separate virtual environments is recommended. No specific installation commands are provided; dependencies include LlamaIndex, OpenBB, scikit-learn, and PDFKit.
  • Prerequisites: Access to SEC filings, an LLM API key (e.g., for GPT-3.5), and potentially significant computational resources for embedding generation and modeling.
  • Resources: No specific setup time or resource footprint is detailed.
  • Links: arXiv Link, SSRN link

Highlighted Details

  • Automates the extraction of financial insights from 10-K filings.
  • Leverages LLM-generated embeddings and query scores as predictive features.
  • Implements a modeling pipeline for return estimation and portfolio construction.
  • Compares portfolio performance against the S&P 500 index.
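The benchmark comparison in the last bullet can be sketched as below. This is an assumed, simplified setup rather than the project's evaluation code: the portfolio is equal-weighted and rebalanced every period, returns are compounded, and the excess over the benchmark is the difference in cumulative return. The tickers and all return figures are hypothetical.

```python
from math import prod
from typing import Dict, List, Sequence


def cumulative_return(returns: Sequence[float]) -> float:
    """Compound a series of per-period returns into one total return."""
    return prod(1.0 + r for r in returns) - 1.0


def equal_weight_returns(per_stock: Dict[str, Sequence[float]]) -> List[float]:
    """Per-period return of an equal-weight portfolio rebalanced each period."""
    series = list(per_stock.values())
    return [sum(period) / len(period) for period in zip(*series)]


# Hypothetical per-period returns for two selected stocks and a benchmark index.
portfolio_periods = equal_weight_returns({"AAA": [0.10, 0.00], "BBB": [0.00, 0.10]})
benchmark_periods = [0.02, 0.03]

portfolio_total = cumulative_return(portfolio_periods)
benchmark_total = cumulative_return(benchmark_periods)
excess = portfolio_total - benchmark_total
print(f"portfolio={portfolio_total:.4f} benchmark={benchmark_total:.4f} excess={excess:.4f}")
```

A fuller comparison would also account for transaction costs and risk, which this sketch ignores.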

Maintenance & Community

The project is associated with a published paper, indicating academic backing. No specific community channels (Discord, Slack) or active maintenance signals are provided in the README.

Licensing & Compatibility

The repository's code is not explicitly licensed. The associated paper is hosted on arXiv, but its license is not stated here, and arXiv hosting alone does not imply a Creative Commons license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project relies on external LLM APIs, which may incur costs and have usage limits. The effectiveness of the predictive model depends on the quality of the LLM embeddings and the chosen features, and no performance figures are reported here. Setup requires managing several complex dependencies in separate environments.

Health Check

  • Last commit: 1 day ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 1

Star History

  • 5 stars in the last 90 days
