benchmark by getomni-ai

OCR benchmark for multimodal models

created 7 months ago
537 stars

Top 59.9% on sourcepulse

Project Summary

This project provides an open-source benchmark for evaluating Optical Character Recognition (OCR) and data extraction capabilities of Large Multimodal Models (LMMs) and traditional OCR providers. It targets researchers and developers aiming to compare model performance on document processing tasks, particularly JSON extraction accuracy, with transparent methodologies and datasets.

How It Works

The benchmark follows a Document ⇒ OCR ⇒ Extraction pipeline: each model first runs OCR on a document, and structured data (JSON) is then extracted from the OCR'd text. JSON accuracy is computed with a modified JSON diff as 1 - (differing fields / total fields). Text similarity is also measured with Levenshtein distance, although this metric is noted to be sensitive to minor layout variations.
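As a rough illustration of the accuracy formula above, the TypeScript sketch below flattens two JSON documents into leaf fields and counts the ground-truth fields that are missing or different in the prediction. It is a simplified stand-in, not the project's modified JSON diff; the names `flatten` and `jsonAccuracy` and the sample invoice data are hypothetical.

```typescript
// Simplified field-level accuracy: 1 - (differing fields / total fields).
// Illustrative only: this is NOT the project's "modified JSON diff".

type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Flatten nested JSON into a flat record of "path -> leaf value".
function flatten(value: Json, prefix: string = "", out: Record<string, Json> = {}): Record<string, Json> {
  if (Array.isArray(value)) {
    value.forEach((item, i) => flatten(item, `${prefix}[${i}]`, out));
  } else if (value !== null && typeof value === "object") {
    for (const key of Object.keys(value)) {
      flatten(value[key], prefix ? `${prefix}.${key}` : key, out);
    }
  } else {
    out[prefix] = value; // leaf: string, number, boolean, or null
  }
  return out;
}

// Assumption: "total fields" is the number of leaf fields in the ground truth,
// and a field "differs" when it is missing or has a different value.
function jsonAccuracy(groundTruth: Json, predicted: Json): number {
  const truth = flatten(groundTruth);
  const pred = flatten(predicted);
  const paths = Object.keys(truth);
  if (paths.length === 0) return 1; // nothing to compare

  let differing = 0;
  for (const path of paths) {
    const present = Object.prototype.hasOwnProperty.call(pred, path);
    if (!present || pred[path] !== truth[path]) differing++;
  }
  return 1 - differing / paths.length;
}

// Hypothetical sample: one of four leaf fields is wrong.
const expectedInvoice: Json = {
  invoiceNumber: "INV-001",
  total: 100,
  customer: { name: "Ada", city: "London" },
};
const extractedInvoice: Json = {
  invoiceNumber: "INV-001",
  total: 10,
  customer: { name: "Ada", city: "London" },
};
console.log(jsonAccuracy(expectedInvoice, extractedInvoice)); // 0.75
```

This sketch ignores extra fields that appear only in the prediction; a fuller diff would likely count those as differences too.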

Quick Start & Requirements

  • Install dependencies: npm install
  • Prepare data: Add local files to data/ or set DATABASE_URL in .env.
  • Configure models: Copy models.example.yaml to models.yaml and set API keys in .env.
  • Run benchmark: npm run benchmark
  • Prerequisites: Node.js, API keys for tested models (OpenAI, Anthropic, Gemini, Mistral, OmniAI, ZeroX), and potentially Google Cloud credentials for Document AI.
  • Results: Saved to results/<timestamp>/results.json.
  • Documentation: Benchmark Dashboard

Highlighted Details

  • Evaluates both closed-source models (GPT-4o, Gemini, Claude 3.5 Sonnet) and open-source models (Gemma 3, Qwen 2.5, Llama 3.2).
  • Includes traditional cloud OCR providers (AWS Textract, Azure Document Intelligence, Google Document AI, Unstructured).
  • Focuses on JSON extraction accuracy as the primary metric.
  • Supports direct image extraction capabilities of models.

Maintenance & Community

The project is maintained by OmniAI. Further community engagement details are not explicitly provided in the README.

Licensing & Compatibility

Licensed under the MIT License, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

The Levenshtein distance metric is sensitive to minor text layout changes, so it can penalize accurate extractions whose formatting does not match the ground truth exactly. JSON extraction is supported for only some of the listed open-source LLMs and cloud OCR providers.
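The layout sensitivity can be seen in a minimal sketch of the standard Levenshtein edit distance, written here in TypeScript. The function name `levenshtein` and the sample strings are hypothetical, and the project's own implementation may differ.

```typescript
// Standard dynamic-programming Levenshtein distance (insert, delete, substitute).
// Illustrative sketch; not taken from the benchmark's source code.
function levenshtein(a: string, b: string): number {
  // prev[j] = distance between the first (i - 1) chars of a and the first j chars of b
  let prev: number[] = [];
  for (let j = 0; j <= b.length; j++) prev.push(j);

  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i]; // deleting i chars of a yields the empty prefix of b
    for (let j = 1; j <= b.length; j++) {
      const cost = a.charAt(i - 1) === b.charAt(j - 1) ? 0 : 1;
      curr.push(Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution (or match)
      ));
    }
    prev = curr;
  }
  return prev[b.length];
}

// Same words, different layout: a newline in one string, a space in the other.
const groundTruthText = "Name: Ada\nCity: London";
const extractedText = "Name: Ada City: London";
console.log(levenshtein(groundTruthText, extractedText)); // 1
```

Both strings carry the same field values, yet the distance is non-zero, which is why a field-level JSON comparison is less affected by layout than a raw text comparison.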

Health Check

  • Last commit: 2 months ago
  • Responsiveness: 1 week
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 74 stars in the last 90 days
