benchmark by getomni-ai

OCR benchmark for multimodal models

Created 10 months ago
587 stars

Top 55.2% on SourcePulse

Project Summary

This project provides an open-source benchmark for evaluating Optical Character Recognition (OCR) and data extraction capabilities of Large Multimodal Models (LMMs) and traditional OCR providers. It targets researchers and developers aiming to compare model performance on document processing tasks, particularly JSON extraction accuracy, with transparent methodologies and datasets.

How It Works

The benchmark follows a Document ⇒ OCR ⇒ Extraction pipeline. It measures how well a model performs OCR on a document and then extracts structured data (specifically JSON) from the OCR'd text. Accuracy is computed with a modified JSON diff as 1 - (differing fields / total fields). The benchmark also reports Levenshtein distance as a text-similarity measure, though this metric is noted to be sensitive to minor layout variations.
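The accuracy formula above can be illustrated with a minimal sketch. It is not the project's implementation: the helper names `flattenLeaves` and `jsonAccuracy` are hypothetical, and the sketch assumes that a "field" is a leaf value in the ground-truth JSON and that extra fields in the prediction are ignored. The benchmark's modified JSON diff may count fields differently.

```typescript
// Hypothetical helpers for illustration; the benchmark's own diff logic may differ.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Flatten a JSON value into a map of path -> leaf value, e.g. "items[0].qty" -> 2.
function flattenLeaves(
  value: Json,
  path = "",
  out = new Map<string, Json>()
): Map<string, Json> {
  if (Array.isArray(value)) {
    value.forEach((item, i) => flattenLeaves(item, `${path}[${i}]`, out));
  } else if (value !== null && typeof value === "object") {
    for (const [key, child] of Object.entries(value)) {
      flattenLeaves(child, path ? `${path}.${key}` : key, out);
    }
  } else {
    out.set(path, value); // primitive or null leaf
  }
  return out;
}

// accuracy = 1 - (differing fields / total fields), counted over ground-truth leaves.
// Assumption: a field missing from the prediction counts as differing.
function jsonAccuracy(expected: Json, predicted: Json): number {
  const truth = flattenLeaves(expected);
  const guess = flattenLeaves(predicted);
  if (truth.size === 0) return 1; // assumption: nothing to extract means nothing wrong
  let differing = 0;
  for (const [path, value] of truth) {
    if (guess.get(path) !== value) differing++;
  }
  return 1 - differing / truth.size;
}

// Four ground-truth fields, one of which (items[0].qty) is wrong in the prediction.
const groundTruth: Json = { invoice: "A-1", total: 100, items: [{ sku: "x", qty: 2 }] };
const extracted: Json = { invoice: "A-1", total: 100, items: [{ sku: "x", qty: 3 }] };
console.log(jsonAccuracy(groundTruth, extracted)); // 0.75
```

Comparing flattened leaf paths keeps the score independent of key order, which is one reason a field-level diff is less brittle than comparing serialized text.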

Quick Start & Requirements

  • Install dependencies: npm install
  • Prepare data: Add local files to data/ or set DATABASE_URL in .env.
  • Configure models: Copy models.example.yaml to models.yaml and set API keys in .env.
  • Run benchmark: npm run benchmark
  • Prerequisites: Node.js, API keys for tested models (OpenAI, Anthropic, Gemini, Mistral, OmniAI, ZeroX), and potentially Google Cloud credentials for Document AI.
  • Results: Saved to results/<timestamp>/results.json.
  • Documentation: Benchmark Dashboard

Highlighted Details

  • Evaluates both closed-source (GPT-4o, Gemini, Claude 3.5 Sonnet) and open-source LLMs (Gemma 3, Qwen 2.5, Llama 3.2).
  • Includes traditional cloud OCR providers (AWS Textract, Azure Document Intelligence, Google Document AI, Unstructured).
  • Focuses on JSON extraction accuracy as the primary metric.
  • Supports direct image extraction, in which a model extracts data from the document image itself.

Maintenance & Community

The project is maintained by OmniAI. Further community engagement details are not explicitly provided in the README.

Licensing & Compatibility

Licensed under the MIT License, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

The Levenshtein distance metric is sensitive to minor text layout changes, potentially penalizing accurate extractions that don't match ground truth formatting precisely. JSON extraction is not supported for all listed open-source LLMs or cloud OCR providers.
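The layout sensitivity described above can be seen in a small sketch. The function names are hypothetical, and normalizing the distance by the longer string's length is an assumption made for illustration, not necessarily how the benchmark reports text similarity.

```typescript
// Classic dynamic-programming edit distance, keeping only two rows at a time.
function levenshtein(a: string, b: string): number {
  // prev[j] = distance between the first (i - 1) chars of a and the first j chars of b
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution (or match)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Assumed normalization: 1 means identical, 0 means nothing in common.
function textSimilarity(a: string, b: string): number {
  const longest = Math.max(a.length, b.length);
  return longest === 0 ? 1 : 1 - levenshtein(a, b) / longest;
}

// Same extracted values, different layout: a newline in one, a space in the other.
const groundTruthText = "Name: Ada\nTotal: 100";
const ocrText = "Name: Ada Total: 100";
console.log(levenshtein(groundTruthText, ocrText));    // 1 (newline vs. space)
console.log(textSimilarity(groundTruthText, ocrText)); // below 1 despite identical values
```

Every whitespace or line-break difference costs an edit, so an extraction with correct values but a different layout still scores below a perfect match.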

Health Check

  • Last commit: 1 week ago
  • Responsiveness: 1 week
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 15 stars in the last 30 days
