run-llama
AI agent document parsing benchmark
ParseBench is a benchmark designed to evaluate the effectiveness of document parsing tools in converting PDFs into structured data usable by AI agents. It addresses the gap where traditional parsing evaluations focus on visual fidelity rather than the semantic and structural integrity required for autonomous decision-making. This benchmark is crucial for engineers and researchers building AI agents that rely on accurate document comprehension, offering a standardized method to compare and select optimal parsing solutions.
How It Works
The benchmark evaluates parsing tools across five critical capability dimensions: Tables, Charts, Content Faithfulness, Semantic Formatting, and Visual Grounding. Each dimension targets specific failure modes that commonly disrupt AI agent workflows, such as misinterpreting table headers, extracting incorrect chart data points, or suffering from omissions and hallucinations. By using deterministic, rule-based evaluation metrics, ParseBench provides objective scores that reflect the practical utility of parsed documents for downstream AI tasks, moving beyond subjective assessments.
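To make "deterministic, rule-based" concrete, here is a minimal sketch of what such checks could look like. It is not ParseBench's actual implementation: the rule names, the RuleResult type, and the pass-fraction score are all hypothetical, chosen only to show how fixed rules give repeatable scores without a model acting as judge.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuleResult:
    """Outcome of one hypothetical rule (not a ParseBench type)."""
    name: str
    passed: bool


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so formatting noise is ignored."""
    return " ".join(text.lower().split())


def check_table_headers(parsed: list[str], expected: list[str]) -> RuleResult:
    """Pass only if the parsed header row equals the expected row, in order."""
    ok = [normalize(h) for h in parsed] == [normalize(h) for h in expected]
    return RuleResult("table_headers", ok)


def check_required_text(parsed_text: str, required: list[str]) -> RuleResult:
    """Pass only if every required snippet is present (catches omissions)."""
    haystack = normalize(parsed_text)
    ok = all(normalize(s) in haystack for s in required)
    return RuleResult("required_text", ok)


def check_forbidden_text(parsed_text: str, forbidden: list[str]) -> RuleResult:
    """Pass only if no forbidden snippet is present (catches some hallucinations)."""
    haystack = normalize(parsed_text)
    ok = not any(normalize(s) in haystack for s in forbidden)
    return RuleResult("forbidden_text", ok)


def score(results: list[RuleResult]) -> float:
    """Fraction of rules passed; identical inputs always give the same score."""
    if not results:
        return 0.0
    return sum(r.passed for r in results) / len(results)


parsed_text = "Total revenue was 4.2M in Q1."
results = [
    check_table_headers(["Region ", "Q1  Revenue"], ["Region", "Q1 Revenue"]),
    check_required_text(parsed_text, ["total revenue", "4.2m"]),
    check_forbidden_text(parsed_text, ["net loss"]),
]
print(score(results))
```

Because each rule is a pure function of the parsed output and a fixed expectation, rerunning the evaluation on the same output cannot change the result, which is the property that separates this style of scoring from LLM-as-a-judge approaches.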
Quick Start & Requirements
Install dependencies with uv sync --extra runners. Run evaluations using uv run parse-bench run <pipeline_name>; for a quick test, use uv run parse-bench run llamaparse_agentic --test.
Create a .env file at the project root containing API keys for the specific parsing tool being evaluated (e.g., LLAMA_CLOUD_API_KEY, OPENAI_API_KEY).
Pipeline documentation: docs/pipelines.md
Highlighted Details
Maintenance & Community
No specific details regarding maintainers, community channels (like Discord or Slack), or ongoing development signals were present in the provided README.
Licensing & Compatibility
The license type and any compatibility notes for commercial use or closed-source linking were not specified in the provided README.
Limitations & Caveats
The benchmark focuses on specific functional aspects critical for AI agents; other document parsing nuances may not be covered. Evaluation is deterministic and rule-based, not relying on LLM-as-a-judge. Running evaluations requires obtaining and configuring API keys for the tools under test.
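Since a run depends on API keys being configured, a small preflight check can catch a missing key before an evaluation starts. The sketch below is an assumption-laden illustration, not part of ParseBench: parse_env and missing_keys are hypothetical helpers, the parser handles only simple KEY=VALUE lines, and which keys a given pipeline actually needs must be taken from the project's own documentation.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines; skips blanks, comments, and malformed lines."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip optional surrounding quotes from the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(env_text: str, required: list[str]) -> list[str]:
    """Return the required keys that are absent or set to an empty value."""
    values = parse_env(env_text)
    return [key for key in required if not values.get(key)]


# Example .env content with placeholder values (not real credentials).
sample_env = 'LLAMA_CLOUD_API_KEY="llx-example"\n# a comment\nOPENAI_API_KEY=\n'

# Assumed key list for illustration; the real requirement depends on the pipeline.
print(missing_keys(sample_env, ["LLAMA_CLOUD_API_KEY", "OPENAI_API_KEY"]))
```

In practice the text would come from reading the .env file at the project root, and an empty result would mean every listed key has a value.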