Automated financial document analysis
Top 83.6% on SourcePulse
This project addresses the challenge of automating the processing of bank statement PDFs, extracting financial data, and enabling natural language querying for personal financial analysis and planning. It targets individuals or businesses dealing with monthly bank statements, aiming to save time and provide structured financial insights.
How It Works
The project employs a multi-stage AI approach. It begins with unstructured document preprocessing, which uses OCR, computer vision (specifically a custom-trained YOLO model for document layout detection), and vision transformers to parse complex PDF layouts, including tables and charts. The extracted components are then passed to separate AI models for context analysis. Embeddings are generated and stored in a vector database for efficient retrieval. Finally, Retrieval Augmented Generation (RAG) with LangChain and local LLMs (such as Llama 3 or Gemma 2) answers natural language queries based on the retrieved data. LLM evaluation with TruLens or W&B is planned.
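The retrieval step of the pipeline can be illustrated with a minimal, dependency-free sketch. It is not the project's code: the bag-of-words `embed` function stands in for a real embedding model, `InMemoryVectorStore` stands in for a vector database, and `build_prompt` and the sample statement lines are hypothetical. It only shows the general RAG pattern of embedding chunks, ranking them against a query, and assembling a prompt for an LLM.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: lowercase term counts (assumed stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InMemoryVectorStore:
    """Hypothetical stand-in for a vector database."""

    def __init__(self):
        self._items = []  # list of (text, vector) pairs

    def add(self, text):
        self._items.append((text, embed(text)))

    def search(self, query, k=2):
        """Return the k stored texts most similar to the query."""
        q = embed(query)
        ranked = sorted(self._items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


def build_prompt(question, context_chunks):
    """Assemble the text that would be sent to a local LLM."""
    context = "\n".join(f"- {chunk}" for chunk in context_chunks)
    return (
        "Answer using only the statement excerpts below.\n"
        f"{context}\n"
        f"Question: {question}"
    )


# Made-up statement lines, as they might look after PDF extraction.
store = InMemoryVectorStore()
for chunk in [
    "2024-03-02 GROCERY MART debit 54.20",
    "2024-03-05 SALARY ACME LTD credit 3200.00",
    "2024-03-09 CITY ELECTRIC utility debit 81.15",
]:
    store.add(chunk)

question = "How much was the salary credit?"
top_chunks = store.search(question, k=1)
prompt = build_prompt(question, top_chunks)
print(prompt)
```

In a real setup the prompt would be passed to the LLM, and the toy pieces would be replaced by an embedding model and a vector database; the ranking-then-prompting flow stays the same.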
Quick Start & Requirements
Install the dependencies listed in requirements.txt. A venv is recommended.
Prerequisites: a .env file and pytesseract (install via install-pytesseract-for-linux.sh on Ubuntu).
Development: run jupyter notebook in src/dev/.
Application: run streamlit run apps.py.
Highlighted Details
Maintenance & Community
The project appears to be a personal endeavor by johnsonhk88. No specific community channels or active contributor information are detailed in the README.
Licensing & Compatibility
The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
The project mentions "no experience to build rule to group data" and "no experience to identify common denominators and create headers," indicating potential limitations in the data structuring and summarization phases. The GUI is currently a first version using Streamlit, with a full-stack backend API planned for later versions.