LLM data extraction on CPU
Top 95.5% on SourcePulse
This project provides a method for extracting data from invoices using the Mistral Large Language Model (LLM) on a local CPU. It is designed for users who need to process invoice documents without relying on cloud-based services or powerful GPUs, offering a self-contained solution for automated data extraction.
How It Works
The system processes text-based PDF invoices by converting their text into vector embeddings and storing them in a FAISS index. At query time, the passages most relevant to a natural-language prompt are retrieved from the index and passed to the Mistral LLM, which extracts the requested field, such as an invoice number. This retrieval-based approach allows structured data to be pulled efficiently from unstructured invoice documents.
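A minimal sketch of what the ingest step might look like, assuming a LangChain-style pipeline with sentence-transformer embeddings on CPU; the folder names, chunk sizes, and embedding model below are illustrative assumptions, not taken from the repository:

```python
# ingest_sketch.py -- hypothetical illustration of the ingest step
from pathlib import Path
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

# Load every text-based PDF invoice from the data folder.
docs = []
for pdf in Path("data").glob("*.pdf"):
    docs.extend(PyPDFLoader(str(pdf)).load())

# Split pages into overlapping chunks so each embedding covers a small span of text.
chunks = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50).split_documents(docs)

# Embed the chunks on CPU and persist the FAISS index to disk for the query step.
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
FAISS.from_documents(chunks, embeddings).save_local("vectorstore")
```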
Quick Start & Requirements
Install dependencies: pip install -r requirements.txt
Download the Mistral model as described in models/model_download.txt.
Place the invoice PDFs in the data folder.
Build the FAISS index: python ingest.py
Query the invoices: python main.py "retrieve invoice number value"
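As a rough illustration of the query step, the sketch below assumes the FAISS index produced by ingestion plus a quantized GGUF Mistral model run on CPU through LangChain's CTransformers wrapper; the model filename, index path, and prompt wording are hypothetical and may differ from the actual main.py:

```python
# query_sketch.py -- hypothetical illustration of the query step
import sys
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_community.llms import CTransformers

query = sys.argv[1]  # e.g. "retrieve invoice number value"

# Reload the index built by the ingest step and fetch the most relevant chunks.
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
db = FAISS.load_local("vectorstore", embeddings, allow_dangerous_deserialization=True)
context = "\n".join(d.page_content for d in db.similarity_search(query, k=3))

# Run a quantized Mistral model on CPU and ask it to extract the requested field.
llm = CTransformers(
    model="models/mistral-7b-instruct-v0.1.Q4_K_M.gguf",  # assumed filename
    model_type="mistral",
    config={"max_new_tokens": 128, "temperature": 0.0},
)
prompt = f"Use the invoice text below to answer.\n\n{context}\n\nQuestion: {query}\nAnswer:"
print(llm.invoke(prompt))
```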
Highlighted Details
Maintenance & Community
No specific information on contributors, sponsorships, or community channels is provided in the README.
Licensing & Compatibility
The license is not specified in the README. Compatibility for commercial or closed-source use is not detailed.
Limitations & Caveats
Performance will be significantly impacted by CPU capabilities. The project focuses solely on text-based PDFs, and image-based invoices would require an additional OCR step. The README does not specify the exact Mistral model version or its licensing.
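For scanned, image-based invoices, a hypothetical OCR preprocessing pass could convert each page to text before ingestion, for example with pdf2image and pytesseract (neither is part of this project):

```python
# ocr_sketch.py -- hypothetical OCR preprocessing for scanned invoices (not part of this repo)
from pathlib import Path
from pdf2image import convert_from_path  # requires the poppler utilities
import pytesseract                       # requires the tesseract binary

for pdf in Path("data").glob("*.pdf"):
    # Render each page to an image, OCR it, and write the text next to the PDF
    # so a text-based ingest step can pick it up.
    text = "\n".join(pytesseract.image_to_string(page) for page in convert_from_path(str(pdf)))
    pdf.with_suffix(".txt").write_text(text, encoding="utf-8")
```

The ingest step would then need to load the generated text files rather than parsing the PDFs directly.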
Last updated about 1 year ago; the project appears inactive.