run-llama: Fast, local document parsing and screenshotting for AI
Top 14.0% on SourcePulse
LiteParse is a standalone, open-source document parser designed for fast, local processing. It offers high-quality spatial text extraction with bounding boxes, which makes it suitable for users who need document parsing without cloud dependencies or proprietary LLM features. It provides a flexible OCR system, supports multiple input formats, and runs entirely on the user's machine.
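To make "spatial text extraction with bounding boxes" concrete, here is a minimal TypeScript sketch of how a consumer might rebuild reading-order lines from positioned text spans. The `TextBox` shape and the `groupIntoLines` helper are hypothetical illustrations, not LiteParse's documented JSON schema or API. The sketch assumes each span has a top-left origin and uses a fixed vertical tolerance to decide which spans share a line.

```typescript
// Hypothetical shape of one extracted text span; LiteParse's real JSON schema may differ.
interface TextBox {
  text: string;
  x: number; // left edge, in page units
  y: number; // top edge, measured downward from the top of the page
  width: number;
  height: number;
}

// Group boxes into visual lines: a box joins the current line when its vertical
// center lies within `tolerance` of the center of that line's first box.
function groupIntoLines(boxes: TextBox[], tolerance = 3): string[] {
  // Sort top-to-bottom, then left-to-right, so lines are built in reading order.
  const sorted = [...boxes].sort((a, b) => a.y - b.y || a.x - b.x);
  const lines: TextBox[][] = [];
  for (const box of sorted) {
    const center = box.y + box.height / 2;
    const current = lines[lines.length - 1];
    if (current) {
      const ref = current[0];
      const refCenter = ref.y + ref.height / 2;
      if (Math.abs(center - refCenter) <= tolerance) {
        current.push(box);
        continue;
      }
    }
    lines.push([box]); // start a new line
  }
  // Within each line, order boxes left-to-right and join their text with spaces.
  return lines.map((line) =>
    line
      .sort((a, b) => a.x - b.x)
      .map((b) => b.text)
      .join(" "),
  );
}

const boxes: TextBox[] = [
  { text: "World", x: 60, y: 10.5, width: 40, height: 12 },
  { text: "Hello", x: 10, y: 10, width: 45, height: 12 },
  { text: "Second", x: 10, y: 30, width: 50, height: 12 },
];
console.log(groupIntoLines(boxes)); // [ 'Hello World', 'Second' ]
```

Grouping by vertical position alone suits single-column text; a multi-column page would interleave its columns unless columns are detected first.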
How It Works
LiteParse uses PDF.js for its core spatial text parsing, which enables precise text positioning. It includes a built-in, zero-setup OCR engine based on Tesseract.js and can also integrate external HTTP OCR servers such as EasyOCR or PaddleOCR. The tool can also generate high-quality page screenshots for LLM agents that work with page images. Output is available in JSON or plain text, including bounding box data.
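PDF.js reports each text item from `page.getTextContent()` with a `transform` matrix in PDF user space, where the origin is the bottom-left corner of the page. The sketch below shows one way such an item could be turned into a top-left bounding box. It is an illustration under stated assumptions (horizontal, unrotated text; a page origin of (0, 0); `height` approximating the glyph box), not LiteParse's actual code, and the names `Box` and `toTopLeftBox` are hypothetical.

```typescript
// Subset of the fields PDF.js returns for each text item from page.getTextContent().
interface PdfJsTextItem {
  str: string;
  transform: number[]; // [a, b, c, d, e, f]; e and f locate the text origin on the baseline
  width: number;
  height: number;
}

// Hypothetical output box with a top-left origin.
interface Box {
  text: string;
  x: number;
  y: number; // top edge, measured downward from the top of the page
  width: number;
  height: number;
}

// Convert a PDF.js text item (bottom-left origin) into a top-left bounding box.
// Assumes unrotated, horizontal text and a page whose lower-left corner is (0, 0).
function toTopLeftBox(item: PdfJsTextItem, pageHeight: number): Box {
  const [, , , , e, f] = item.transform;
  return {
    text: item.str,
    x: e,
    // f is the baseline measured up from the bottom; the box top sits `height` above it.
    y: pageHeight - f - item.height,
    width: item.width,
    height: item.height,
  };
}

// A 12pt word placed at (72, 720) on a US Letter page (792pt tall).
const item: PdfJsTextItem = { str: "Invoice", transform: [12, 0, 0, 12, 72, 720], width: 48, height: 12 };
console.log(toTopLeftBox(item, 792)); // { text: 'Invoice', x: 72, y: 60, width: 48, height: 12 }
```

A fuller implementation would also handle rotated text and pages with a non-zero origin, for example by applying the page viewport's transform.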
Quick Start & Requirements
Install globally with npm i -g @llamaindex/liteparse. Alternatively, macOS/Linux users can run brew install llamaindex-liteparse.
Maintenance & Community
No specific details regarding maintainers, sponsorships, or community channels (like Discord/Slack) were found in the provided README.
Licensing & Compatibility
The project is licensed under the Apache 2.0 license. This license is generally permissive and compatible with commercial use and linking within closed-source projects.
Limitations & Caveats
For highly complex documents, such as those with dense tables, multi-column layouts, charts, handwritten text, or heavily scanned pages, the cloud-based LlamaParse service is recommended for significantly better results. Multi-format parsing requires installing external dependencies such as LibreOffice or ImageMagick.
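Because multi-format parsing depends on external tools, a pre-flight check can report whether they are installed before a conversion is attempted. This minimal, POSIX-oriented TypeScript sketch searches `PATH` for common executable names (`soffice` or `libreoffice` for LibreOffice, `magick` or `convert` for ImageMagick). The helper `findOnPath` is hypothetical, and which binaries LiteParse actually invokes is an assumption.

```typescript
import { accessSync, constants, statSync } from "node:fs";
import { delimiter, join } from "node:path";

// Return the first executable file named `name` found on a PATH-style string, or null.
// POSIX-only sketch: it does not try Windows extensions such as .exe.
function findOnPath(name: string, pathEnv: string = process.env.PATH ?? ""): string | null {
  for (const dir of pathEnv.split(delimiter)) {
    if (!dir) continue;
    const candidate = join(dir, name);
    try {
      // statSync throws if the path is missing; accessSync throws if it is not executable.
      if (statSync(candidate).isFile()) {
        accessSync(candidate, constants.X_OK);
        return candidate;
      }
    } catch {
      // Not usable in this directory; keep searching.
    }
  }
  return null;
}

// Common executable names (assumption: LiteParse may look for different ones).
const requirements: Record<string, string[]> = {
  LibreOffice: ["soffice", "libreoffice"],
  ImageMagick: ["magick", "convert"],
};

for (const [tool, names] of Object.entries(requirements)) {
  const found = names.map((n) => findOnPath(n)).find((p) => p !== null);
  console.log(`${tool}: ${found ?? "not found on PATH"}`);
}
```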