receipt-ocr by bhimrazy

Receipt OCR for structured data extraction and raw text retrieval

Created 2 years ago

368 stars

Top 76.4% on SourcePulse

Project Summary

This repository provides an OCR engine for receipt images with two distinct modules: one extracts structured data using Large Language Models (LLMs), and the other extracts raw text via Tesseract OCR. It targets developers who need to automate receipt data capture and want to choose between parsed fields and plain text output.

How It Works

The project features a dual-module architecture. The receipt_ocr module uses LLMs (OpenAI, Gemini, and Groq are supported) to parse receipt images and extract structured data such as merchant name, date, total amount, and line items. The tesseract_ocr module provides raw text extraction using the Tesseract engine. Users can therefore choose between high-level parsing into fields and low-level text retrieval. Both modules are accessible via CLI, programmatic API, and Dockerized services.
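To make the structured output concrete, here is a minimal sketch of how the fields named above could be modeled and parsed from an LLM's JSON reply. The class names, field names, and sample values are all assumptions for illustration; the source does not specify the exact schema that receipt_ocr returns.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LineItem:
    # Hypothetical fields: the source only says that "line items" are extracted.
    name: str
    price: float


@dataclass
class Receipt:
    # Hypothetical field names mirroring the fields listed in the summary.
    merchant_name: Optional[str] = None
    date: Optional[str] = None
    total_amount: Optional[float] = None
    line_items: List[LineItem] = field(default_factory=list)


def parse_receipt(raw_json: str) -> Receipt:
    """Convert a JSON string (such as an LLM might return) into a Receipt."""
    data = json.loads(raw_json)
    items = [
        LineItem(name=str(item["name"]), price=float(item["price"]))
        for item in data.get("line_items", [])
    ]
    total = data.get("total_amount")
    return Receipt(
        merchant_name=data.get("merchant_name"),
        date=data.get("date"),
        # Models sometimes return numbers as strings, so coerce defensively.
        total_amount=float(total) if total is not None else None,
        line_items=items,
    )


# Made-up sample payload, not real output from the project.
sample = (
    '{"merchant_name": "Corner Cafe", "date": "2024-01-15", '
    '"total_amount": "7.50", "line_items": ['
    '{"name": "Coffee", "price": 3.5}, {"name": "Bagel", "price": 4.0}]}'
)
receipt = parse_receipt(sample)
```

Keeping every field optional reflects the caveat that extraction accuracy varies: a missing merchant or date then shows up as None instead of raising an error.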

Quick Start & Requirements

Install via pip: pip install receipt-ocr. Set your LLM API key (e.g., export OPENAI_API_KEY="your_openai_api_key_here"), then process a receipt from the CLI: receipt-ocr images/receipt.jpg. Prerequisites are Python 3.x, Docker and Docker Compose (for the services), and Tesseract OCR (for local CLI usage). The documentation links to each LLM provider's API key page.
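The prerequisites above can be checked before a run with a short script. This is a sketch under assumptions: the helper names missing_prerequisites and build_command are hypothetical, the executable names tesseract and docker are the usual binary names for those tools, and only the OpenAI key is checked because the summary does not name the environment variables for Gemini or Groq.

```python
import os
import shutil
from typing import Callable, List, Mapping, Optional


def missing_prerequisites(
    env: Mapping[str, str] = os.environ,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return a human-readable list of prerequisites that appear to be missing."""
    missing = []
    if not env.get("OPENAI_API_KEY"):
        missing.append("OPENAI_API_KEY environment variable")
    # receipt-ocr is the CLI from the quick start; tesseract and docker are
    # assumed to be the executable names of the other prerequisites.
    for tool in ("receipt-ocr", "tesseract", "docker"):
        if which(tool) is None:
            missing.append(f"{tool} executable on PATH")
    return missing


def build_command(image_path: str) -> List[str]:
    """Build the argument list for the CLI call shown in the quick start."""
    return ["receipt-ocr", image_path]
```

A caller could print missing_prerequisites() and, if the list is empty, pass build_command("images/receipt.jpg") to subprocess.run.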

Highlighted Details

Supports multiple LLM providers (OpenAI, Gemini, Groq) with configurable models and base URLs for flexible integration.
Offers both a command-line interface (CLI) and a programmatic Python API for streamlined receipt processing.
Includes production-ready FastAPI web services accessible via Docker Compose for both LLM and Tesseract modules.
The LLM module supports three response format types (json_object, json_schema, and text) for compatibility with different providers and models.
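The three response format names match the response_format types of OpenAI-style chat completion APIs. The sketch below builds the corresponding request fragment for each one. The helper build_response_format, the schema name "receipt", and the example schema are hypothetical; how receipt_ocr itself passes this option is not stated in the summary.

```python
from typing import Any, Dict, Optional


def build_response_format(
    kind: str, schema: Optional[Dict[str, Any]] = None
) -> Dict[str, Any]:
    """Build an OpenAI-style response_format value for the given type."""
    if kind in ("text", "json_object"):
        # Plain text, or free-form JSON without an enforced schema.
        return {"type": kind}
    if kind == "json_schema":
        if schema is None:
            raise ValueError("json_schema requires a schema")
        # Output is constrained to the supplied JSON Schema.
        return {
            "type": "json_schema",
            "json_schema": {"name": "receipt", "schema": schema},
        }
    raise ValueError(f"unsupported response format: {kind}")


# Hypothetical schema covering two of the fields mentioned in the summary.
example_schema = {
    "type": "object",
    "properties": {
        "merchant_name": {"type": "string"},
        "total_amount": {"type": "number"},
    },
    "required": ["merchant_name", "total_amount"],
}
schema_format = build_response_format("json_schema", example_schema)
```

In practice, json_schema gives the most predictable structure where the provider supports it, while json_object and text act as fallbacks for models that do not.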

Maintenance & Community

The project directs support questions to GitHub Discussions and bug reports to the issue tracker. The README does not name maintainers or sponsors, and it does not include a public roadmap.

Licensing & Compatibility

The project is licensed under the MIT license, which is permissive and generally suitable for commercial use and integration into closed-source applications.

Limitations & Caveats

The Tesseract OCR module's performance is sensitive to image quality, requiring well-lit receipts with clear edges. Structured data extraction accuracy depends on the chosen LLM provider and the clarity of the receipt content.

Health Check

Last Commit

3 weeks ago

Responsiveness

Inactive

Star History

12 stars in the last 30 days