jerryjliu
Local document parsing and AI Q&A with visual source citations
Top 77.8% on SourcePulse
Summary
This repository provides interactive demonstrations for LiteParse, a fast, local, and model-free document parsing engine developed by LlamaIndex. It targets engineers and researchers evaluating document processing solutions, offering benefits like direct parser comparisons, precise visual sourcing of extracted text, and AI-assisted querying with verifiable citations.
How It Works
LiteParse employs a model-free approach for rapid, local document analysis across various formats including PDF, DOCX, PPTX, XLSX, and images. Key innovations include side-by-side parser comparisons against established libraries like PyPDF and PyMuPDF, and a "Visual Citations" feature that enables exact keyword searches with bounding box overlays directly on source PDF pages. Additionally, a Claude Code Skill integrates LiteParse for AI-powered Q&A, generating reports with cited source pages.
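A side-by-side parser comparison of the kind described above can be sketched as a small harness that runs each parser on the same file and records timing and output size. This is a minimal sketch, not the repository's implementation: `compare_parsers` and the row fields are hypothetical names, and each parser is passed in as a plain callable so that no particular library API (LiteParse, PyPDF, or PyMuPDF) is assumed.

```python
import time
from typing import Callable, Dict, List

# A parser is modelled as any callable that takes a file path and returns text.
# (Assumption: real parsers would be wrapped to fit this shape.)
Parser = Callable[[str], str]


def compare_parsers(path: str, parsers: Dict[str, Parser]) -> List[dict]:
    """Run every parser on the same file and collect comparable metrics."""
    rows = []
    for name, parse in parsers.items():
        start = time.perf_counter()
        text = parse(path)
        elapsed = time.perf_counter() - start
        rows.append(
            {
                "parser": name,
                "seconds": elapsed,           # wall-clock parse time
                "chars": len(text),           # raw output size
                "words": len(text.split()),   # whitespace-delimited tokens
            }
        )
    return rows


def format_table(rows: List[dict]) -> str:
    """Render the rows as a fixed-width text table for quick inspection."""
    lines = [f"{'parser':<12}{'seconds':>10}{'chars':>8}{'words':>8}"]
    for r in rows:
        lines.append(
            f"{r['parser']:<12}{r['seconds']:>10.4f}{r['chars']:>8}{r['words']:>8}"
        )
    return "\n".join(lines)
```

To use it, wrap each library's extraction call in a function that takes a path and returns a string, then pass the wrappers in a dictionary keyed by parser name. Comparing character and word counts is only a rough proxy for quality; inspecting the extracted text next to the source page remains necessary.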
Quick Start & Requirements
Pre-generated demo outputs are included (comparison/output/comparison.html, visual_citations/output/visual-citations.html). For custom data processing or regeneration, run pip install -r requirements.txt, then run the Python scripts in the respective directories (comparison/, visual_citations/). The Claude Code Skill installs via npx skills add run-llama/liteparse_samples --skill research_docs. Dependencies are listed in requirements.txt (e.g., liteparse, pypdf, pymupdf).
Highlighted Details
Maintenance & Community
No specific details regarding maintainers, community channels (e.g., Discord, Slack), sponsorships, or roadmap are present. The project is associated with LlamaIndex.
Licensing & Compatibility
The license type is not explicitly stated.
Limitations & Caveats
The "Visual Citations" search is a simple substring match, not supporting fuzzy matching or RAG. Other limitations, unsupported platforms, or known issues are not detailed.
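The practical effect of substring-only matching can be illustrated with a short sketch. It assumes parsed text arrives as spans carrying a page number and a bounding box; `TextSpan` and `find_citations` are hypothetical names used for illustration, not the repository's API, and case-sensitive matching is an assumption.

```python
from dataclasses import dataclass
from typing import List, Tuple

# (x0, y0, x1, y1) in page coordinates; the layout is an assumption.
BBox = Tuple[float, float, float, float]


@dataclass(frozen=True)
class TextSpan:
    page: int
    text: str
    bbox: BBox


def find_citations(spans: List[TextSpan], keyword: str) -> List[TextSpan]:
    """Return spans whose text contains the keyword as an exact substring."""
    if not keyword:
        return []
    # Plain containment: no stemming, no fuzzy matching, no retrieval model.
    return [s for s in spans if keyword in s.text]


spans = [
    TextSpan(1, "Model pretraining details", (72.0, 100.0, 300.0, 112.0)),
    TextSpan(2, "Pre-training data sources", (72.0, 140.0, 310.0, 152.0)),
]

# Only the page 1 span matches; the hyphenated variant on page 2 is missed.
hits = find_citations(spans, "pretraining")
for hit in hits:
    print(hit.page, hit.bbox)
```

Each returned bounding box is what an overlay would draw on the source page. Spelling variants, hyphenation differences, and paraphrases are not found, so several phrasings of a query may need to be tried.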