receipt-ocr by bhimrazy

Receipt OCR for structured data extraction and raw text retrieval

Created 2 years ago
348 stars

Top 79.9% on SourcePulse


Summary

This repository provides an OCR engine for receipt images with two modules: one that extracts structured data using Large Language Models (LLMs) and one that extracts raw text with Tesseract OCR. It is aimed at developers and users who want to automate receipt data capture, whether they need parsed fields or plain text.

How It Works

The project uses a dual-module architecture. The receipt_ocr module uses LLMs (OpenAI, Gemini, or Groq) to parse receipt images and extract structured data such as merchant name, date, total amount, and line items. The tesseract_ocr module extracts raw text with the Tesseract engine. Users can therefore choose between high-level structured parsing and low-level text retrieval; both modules are available through a CLI, a programmatic Python API, and Dockerized services.
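To make "structured data" concrete, here is a minimal sketch of the kind of record the LLM module targets, assuming the model replies with a JSON object. The field names (merchant_name, date, total_amount, line_items) and the helpers LineItem, Receipt, and parse_receipt_json are hypothetical and are not taken from receipt-ocr's code.

```python
import json
from dataclasses import dataclass, field
from decimal import Decimal
from typing import List, Optional


@dataclass
class LineItem:
    # Hypothetical field names; receipt-ocr's actual schema may differ.
    name: str
    quantity: Decimal
    price: Decimal


@dataclass
class Receipt:
    merchant_name: Optional[str]
    date: Optional[str]  # kept as the raw string the model returned
    total_amount: Optional[Decimal]
    line_items: List[LineItem] = field(default_factory=list)


def _to_decimal(value) -> Optional[Decimal]:
    """Convert a JSON number or numeric string to Decimal, keeping None as None."""
    if value is None:
        return None
    # Go through str() so floats such as 4.1 don't carry binary rounding error.
    return Decimal(str(value))


def parse_receipt_json(raw: str) -> Receipt:
    """Turn an LLM's JSON reply into a typed Receipt (hypothetical helper)."""
    data = json.loads(raw)
    items = [
        LineItem(
            name=item["name"],
            quantity=_to_decimal(item.get("quantity", 1)),  # default to 1 if omitted
            price=_to_decimal(item["price"]),
        )
        for item in data.get("line_items", [])
    ]
    return Receipt(
        merchant_name=data.get("merchant_name"),
        date=data.get("date"),
        total_amount=_to_decimal(data.get("total_amount")),
        line_items=items,
    )
```

Using Decimal rather than float keeps money arithmetic exact, so a downstream check such as "line items sum to the total" does not fail on rounding noise.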

Quick Start & Requirements

Install via pip: pip install receipt-ocr. Set the API key for your LLM provider, e.g. export OPENAI_API_KEY="your_openai_api_key_here". Then process a receipt from the command line: receipt-ocr images/receipt.jpg. Prerequisites are Python 3.x, Docker and Docker Compose (for the web services), and Tesseract OCR (for local CLI usage). The documentation links to each LLM provider's API key page.
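The quick-start command handles one image at a time. A minimal batch sketch can wrap it with Python's subprocess module, assuming the receipt-ocr CLI prints its result to stdout and exits non-zero on failure; the helpers require_api_key, run_cli, and process_images and the IMAGE_SUFFIXES set are hypothetical, not part of the package.

```python
import os
import subprocess
from pathlib import Path
from typing import Callable, Dict, Iterable, Mapping

# File extensions treated as receipt images (an assumption; adjust as needed).
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def require_api_key(var: str = "OPENAI_API_KEY", env: Mapping[str, str] = os.environ) -> str:
    """Fail early with a clear message if the provider's API key is not set."""
    value = env.get(var)
    if not value:
        raise RuntimeError(f"{var} is not set; export it before running receipt-ocr")
    return value


def run_cli(image: Path) -> str:
    """Run `receipt-ocr <image>` and return its stdout (assumed output channel)."""
    result = subprocess.run(
        ["receipt-ocr", str(image)],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout


def process_images(paths: Iterable[Path], runner: Callable[[Path], str] = run_cli) -> Dict[str, str]:
    """Run `runner` on each image in `paths`, keyed by file name; other files are skipped."""
    outputs: Dict[str, str] = {}
    for image in sorted(paths):
        if image.suffix.lower() in IMAGE_SUFFIXES:
            outputs[image.name] = runner(image)
    return outputs
```

With the key exported, calling require_api_key() followed by process_images(Path("images").iterdir()) would run the CLI once per image in the images folder. The runner parameter exists so the loop can be exercised without the CLI installed.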

Highlighted Details

  • Supports multiple LLM providers (OpenAI, Gemini, Groq) with configurable models and base URLs for flexible integration.
  • Offers both a command-line interface (CLI) and a programmatic Python API for streamlined receipt processing.
  • Includes FastAPI web services for both the LLM and Tesseract modules, runnable with Docker Compose.
  • The LLM module supports the json_object, json_schema, and text response format types, improving compatibility across models and providers.
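The three response format types share their names with the values of the response_format parameter in OpenAI's Chat Completions API. The sketch below shows the payload each type takes in that API, assuming receipt-ocr forwards the chosen type to an OpenAI-compatible endpoint; it does not show how receipt-ocr itself is configured, and RECEIPT_SCHEMA and build_response_format are hypothetical.

```python
from typing import Any, Dict, Optional

# Hypothetical receipt schema for illustration; not receipt-ocr's own schema.
RECEIPT_SCHEMA: Dict[str, Any] = {
    "type": "object",
    "properties": {
        "merchant_name": {"type": "string"},
        "date": {"type": "string"},
        "total_amount": {"type": "number"},
        "line_items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "quantity": {"type": "number"},
                    "price": {"type": "number"},
                },
                "required": ["name", "quantity", "price"],
                "additionalProperties": False,
            },
        },
    },
    "required": ["merchant_name", "date", "total_amount", "line_items"],
    "additionalProperties": False,
}


def build_response_format(kind: str, schema: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Build an OpenAI-style `response_format` value for text, json_object, or json_schema."""
    if kind in ("text", "json_object"):
        return {"type": kind}
    if kind == "json_schema":
        return {
            "type": "json_schema",
            "json_schema": {
                "name": "receipt",
                # Strict mode expects every property listed in "required" and
                # additionalProperties set to false, as RECEIPT_SCHEMA does.
                "strict": True,
                "schema": schema if schema is not None else RECEIPT_SCHEMA,
            },
        }
    raise ValueError(f"unsupported response format type: {kind!r}")
```

The trade-off is roughly: json_schema constrains output to a known shape where the model supports it, json_object only guarantees valid JSON (and OpenAI requires the prompt to mention JSON in that mode), and text works with any model but leaves parsing to the caller.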

Maintenance & Community

The project encourages community engagement through GitHub Discussions and issue reports for support and bug tracking. The README does not list maintainers, sponsorships, or a public roadmap.

Licensing & Compatibility

The project is licensed under the MIT license, which is permissive and generally suitable for commercial use and integration into closed-source applications.

Limitations & Caveats

The Tesseract OCR module's performance is sensitive to image quality, requiring well-lit receipts with clear edges. Structured data extraction accuracy depends on the chosen LLM provider and the clarity of the receipt content.
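One common way to reduce this sensitivity is to binarize the image before OCR. The sketch below uses Otsu's thresholding, a standard technique swapped in here for illustration; nothing in the README says receipt-ocr performs it. It works on a flat list of 8-bit grayscale values, and the helper names otsu_threshold and binarize are hypothetical.

```python
from typing import List, Sequence


def otsu_threshold(pixels: Sequence[int]) -> int:
    """Return the Otsu threshold for 8-bit grayscale values (each 0-255)."""
    hist = [0] * 256
    for value in pixels:
        hist[value] += 1
    total = len(pixels)
    sum_total = sum(i * count for i, count in enumerate(hist))

    weight_bg = 0  # number of pixels at or below the candidate threshold
    sum_bg = 0     # intensity sum of those pixels
    best_t, best_var = 0, -1.0
    for t in range(256):
        weight_bg += hist[t]
        if weight_bg == 0:
            continue
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_total - sum_bg) / weight_fg
        # Between-class variance; Otsu picks the threshold that maximises it.
        between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t


def binarize(pixels: Sequence[int]) -> List[int]:
    """Map each pixel to black (0) or white (255) using Otsu's threshold."""
    t = otsu_threshold(pixels)
    return [255 if p > t else 0 for p in pixels]
```

With Pillow installed, a pixel list of this form can be obtained from Image.convert("L") followed by getdata(). A global threshold like this helps with low contrast but not with strong shadows across the receipt, where adaptive (local) thresholding is usually the better choice.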

Health Check

Last Commit: 5 days ago
Responsiveness: Inactive
Pull Requests (30d): 1
Issues (30d): 0
Star History: 117 stars in the last 30 days
