llm-mistral-invoice-cpu  by katanaml

LLM data extraction on CPU

Created 1 year ago
269 stars

Top 95.5% on SourcePulse

Project Summary

This project provides a method for extracting data from invoices using the Mistral Large Language Model (LLM) on a local CPU. It is designed for users who need to process invoice documents without relying on cloud-based services or powerful GPUs, offering a self-contained solution for automated data extraction.

How It Works

The system processes text-based PDF invoices by first converting their text into vector embeddings and storing them in a FAISS index. A natural language prompt, such as a request for the invoice number, is then matched against the index, and the Mistral LLM extracts the requested value from the retrieved text. This approach allows structured data to be extracted from unstructured invoice documents without sending the whole document to the model.
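The retrieve-then-ask flow described above can be illustrated with a minimal, dependency-free sketch. It is not the project's code. The bag-of-words `embed` function stands in for a real embedding model, the brute-force `retrieve` function stands in for a FAISS similarity search, and the sample invoice text, `build_index`, and `build_prompt` are hypothetical names invented for illustration. The final LLM call is omitted; the sketch stops at the prompt that would be passed to the model.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: bag-of-words term counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(chunks):
    """Store each text chunk next to its vector (stand-in for a FAISS index)."""
    return [(chunk, embed(chunk)) for chunk in chunks]


def retrieve(index, query, k=1):
    """Return the k chunks most similar to the query (brute-force search)."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(context, question):
    """Assemble the text that would be handed to the LLM (hypothetical template)."""
    return f"Context: {context}\nQuestion: {question}\nAnswer:"


# Hypothetical chunks, as they might look after extracting text from a PDF invoice.
chunks = [
    "Invoice number: INV-0042 issued to Example Ltd",
    "Payment terms: net 30 days from delivery",
    "Total amount due: 1,250.00 EUR",
]

index = build_index(chunks)
question = "retrieve invoice number value"
top = retrieve(index, question, k=1)
prompt = build_prompt(top[0], question)
print(prompt)
```

Only the best-matching chunk reaches the prompt, which keeps the model's input short. That matters on a CPU, where generation time grows with prompt length.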

Quick Start & Requirements

  • Install requirements: pip install -r requirements.txt
  • Download Mistral model (link provided in models/model_download.txt).
  • Place text PDF files in the data folder.
  • Ingest data: python ingest.py
  • Process data: python main.py "retrieve invoice number value"
  • Requires Python and a compatible Mistral model.
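The scripted steps above can also be driven from one Python helper, sketched here as a convenience rather than anything the project ships. `pipeline_commands` and `run_pipeline` are hypothetical names, and calling pip through `sys.executable -m pip` is an assumption that keeps the install in the active interpreter. The model download and the placement of PDFs in the data folder remain manual. The helper defaults to a dry run that only returns the commands.

```python
import subprocess
import sys


def pipeline_commands(query):
    """Build the three scripted Quick Start steps as argument lists."""
    py = sys.executable  # assumption: reuse the interpreter running this script
    return [
        [py, "-m", "pip", "install", "-r", "requirements.txt"],  # install requirements
        [py, "ingest.py"],                                        # ingest PDFs from the data folder
        [py, "main.py", query],                                   # run the extraction query
    ]


def run_pipeline(query, dry_run=True):
    """Return the commands; execute them in order only when dry_run is False."""
    commands = pipeline_commands(query)
    if not dry_run:
        for cmd in commands:
            # check=True stops at the first failing step
            subprocess.run(cmd, check=True)
    return commands


commands = run_pipeline("retrieve invoice number value")
for cmd in commands:
    print(" ".join(cmd[1:]))
```

Run it from the repository root with `dry_run=False` once the model file and PDFs are in place.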

Highlighted Details

  • Enables LLM-based invoice data extraction on CPU.
  • Utilizes FAISS for efficient vector embedding storage and retrieval.
  • Supports processing of text-based PDF invoices.

Maintenance & Community

No specific information on contributors, sponsorships, or community channels is provided in the README.

Licensing & Compatibility

The license is not specified in the README. Compatibility for commercial or closed-source use is not detailed.

Limitations & Caveats

Performance will be significantly impacted by CPU capabilities. The project focuses solely on text-based PDFs, and image-based invoices would require an additional OCR step. The README does not specify the exact Mistral model version or its licensing.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 1 star in the last 30 days
