AI-Bank-Statement-Document-Automation-By-LLM-And-Personal-Finanical-Analysis-Prediction by johnsonhk88

Automated financial document analysis

Created 1 year ago
325 stars

Top 83.6% on SourcePulse

Project Summary

This project automates the processing of bank statement PDFs: it extracts financial data and supports natural language querying for personal financial analysis and planning. It targets individuals and businesses that handle monthly bank statements, aiming to save time and provide structured financial insights.

How It Works

The project uses a multi-stage AI pipeline. It begins with unstructured document preprocessing: OCR, computer vision (specifically a custom-trained YOLO model for document layout detection), and vision transformers parse complex PDF layouts, including tables and charts. The extracted components are then passed to different AI models for context analysis. Embeddings are generated and stored in a vector database for efficient retrieval. Finally, Retrieval-Augmented Generation (RAG) with LangChain and local LLMs (such as Llama 3 or Gemma 2) answers natural language queries from the retrieved data. LLM evaluation with TruLens or W&B is planned.
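The retrieve-then-prompt step described above can be illustrated with a minimal, dependency-free sketch. This is not the project's code: the bag-of-words `embed` function stands in for a real embedding model, `VectorStore` stands in for a real vector database, and the sample statement rows are invented for illustration. The final call to a local LLM is omitted; the sketch stops at the assembled prompt.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase token counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class VectorStore:
    """Hypothetical in-memory store; a real pipeline would use a vector database."""

    def __init__(self) -> None:
        self._items = []  # list of (text, embedding) pairs

    def add(self, text: str) -> None:
        self._items.append((text, embed(text)))

    def search(self, query: str, k: int = 2) -> list:
        """Return the k stored texts most similar to the query."""
        q = embed(query)
        ranked = sorted(self._items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


def build_prompt(question: str, context: list) -> str:
    """Assemble the prompt that would be sent to the LLM."""
    joined = "\n".join(f"- {row}" for row in context)
    return f"Answer using only the statement rows below.\n{joined}\nQuestion: {question}"


# Invented sample rows, standing in for data extracted from a statement PDF.
store = VectorStore()
for row in [
    "2024-03-01 grocery store payment 54.20",
    "2024-03-03 salary deposit 3200.00",
    "2024-03-05 electricity bill payment 80.10",
]:
    store.add(row)

question = "How much was the salary deposit?"
hits = store.search(question, k=1)
prompt = build_prompt(question, hits)
print(prompt)
```

In a real setup, the prompt would be passed to the local model and the toy pieces replaced by an embedding model and a vector database; the control flow (embed, store, retrieve top-k, build prompt) would stay roughly the same.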

Quick Start & Requirements

  • Installation: Install dependencies via requirements.txt. A venv is recommended.
  • Prerequisites: Google API key (added to .env), pytesseract (install via install-pytesseract-for-linux.sh on Ubuntu).
  • Running:
    • Development: jupyter notebook in src/dev/.
    • GUI: streamlit run apps.py.
  • Links: yolo-base-doc-layout-detection
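
Since the app expects a Google API key in .env, a small pre-flight check can catch a missing key before launching. The sketch below is an assumption-laden illustration: the variable name `GOOGLE_API_KEY` is hypothetical (the summary does not say which name the project reads), and the parser handles only simple `KEY=value` lines.

```python
def parse_env(text: str) -> dict:
    """Parse simple KEY=value lines; skips blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(env: dict, required: list) -> list:
    """Return the required names that are absent or empty."""
    return [name for name in required if not env.get(name)]


# Sample .env content; GOOGLE_API_KEY is a hypothetical variable name.
sample = 'GOOGLE_API_KEY="abc123"\n# comment line\nEMPTY=\n'
env = parse_env(sample)
print(missing_keys(env, ["GOOGLE_API_KEY"]))
```

To check a real file, read it with `open(".env").read()` and pass the text to `parse_env`.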

Highlighted Details

  • Leverages a custom-trained YOLO model for document layout detection as a foundational step.
  • Employs advanced RAG techniques for improved retrieval accuracy in summarization tasks.
  • Prioritizes local, offline LLM inference for enhanced privacy and control.
  • Plans LLM evaluation with TruLens or W&B for performance monitoring.

Maintenance & Community

The project appears to be a personal endeavor by johnsonhk88. No specific community channels or active contributor information are detailed in the README.

Licensing & Compatibility

The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The README states "no experience to build rule to group data" and "no experience to identify common denominators and create headers," which suggests limitations in the data structuring and summarization phases. The GUI is a first version built with Streamlit; a full-stack backend API is planned for later versions.

Health Check

  • Last commit: 2 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 171 stars in the last 30 days
