OCR library using Llama 3.2 Vision
This npm library performs Optical Character Recognition (OCR) on document images and returns the text as Markdown, using the Llama 3.2 Vision model. It targets developers who need to extract text from images without paying API costs, and relies on Together AI's free endpoint to do so.
How It Works
The library sends image files to the Llama 3.2 Vision model through Together AI's API. The visual content of the image is parsed and the extracted text is returned structured as Markdown. For faster performance or higher rate limits, users can specify the paid Llama 3.2 11B or 90B Vision models instead.
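The flow above can be illustrated with a minimal sketch of the request a vision model might receive. It is not the library's actual implementation. It assumes Together AI accepts an OpenAI-style chat payload in which the image travels as a base64 data URL. The helper name `buildVisionRequest`, the prompt text, and the model identifier are all assumptions made for illustration.

```typescript
// Assumed shape of an OpenAI-style chat request carrying one image.
interface VisionRequest {
  model: string;
  messages: Array<{
    role: "user";
    content: Array<
      | { type: "text"; text: string }
      | { type: "image_url"; image_url: { url: string } }
    >;
  }>;
}

// Hypothetical helper: wraps a base64-encoded image in a data URL and pairs
// it with a prompt asking the model for Markdown output.
function buildVisionRequest(
  imageBase64: string,
  mimeType: string,
  model: string,
): VisionRequest {
  const dataUrl = `data:${mimeType};base64,${imageBase64}`;
  return {
    model,
    messages: [
      {
        role: "user",
        content: [
          // Prompt wording is an assumption, not the library's prompt.
          { type: "text", text: "Convert this image to Markdown. Return only the Markdown." },
          { type: "image_url", image_url: { url: dataUrl } },
        ],
      },
    ],
  };
}

// "aGVsbG8=" is placeholder data, not a real image. In Node, the base64 text
// could come from fs.readFileSync(path).toString("base64").
const exampleRequest = buildVisionRequest(
  "aGVsbG8=",
  "image/jpeg",
  "meta-llama/Llama-3.2-90B-Vision-Instruct-Turbo", // assumed model identifier
);
console.log(JSON.stringify(exampleRequest, null, 2));
```

The payload would then be sent to the provider's chat completions endpoint with an API key. Keeping payload construction separate from the network call makes it possible to inspect the request without spending any quota.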
Quick Start & Requirements
npm i llama-ocr
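After installation, a call might look like the sketch below. This summary does not document the API, so the details are assumptions drawn from the upstream README: an exported `ocr` function, the `filePath`, `apiKey`, and `model` options, and the `TOGETHER_API_KEY` environment variable. The helpers `buildOcrOptions` and `imageToMarkdown` are hypothetical names introduced here, and the OCR function is passed in as a parameter so the sketch does not depend on the package being installed.

```typescript
// Options shape assumed from the upstream README; not confirmed by this summary.
interface OcrOptions {
  filePath: string;
  apiKey?: string;
  model?: string;
}

// Assumed signature of the package's exported `ocr` function.
type OcrFn = (options: OcrOptions) => Promise<string>;

// Hypothetical helper: assembles the options and fails early without a key.
function buildOcrOptions(filePath: string, apiKey: string | undefined): OcrOptions {
  if (!apiKey) {
    throw new Error("Set TOGETHER_API_KEY before running OCR");
  }
  return { filePath, apiKey };
}

// Hypothetical wrapper: forwards the options to whichever OCR function is supplied
// and returns its promise of Markdown text.
function imageToMarkdown(
  ocr: OcrFn,
  filePath: string,
  apiKey: string | undefined,
): Promise<string> {
  return ocr(buildOcrOptions(filePath, apiKey));
}

// Intended wiring, with the package installed (assumed export name):
//   import { ocr } from "llama-ocr";
//   const markdown = await imageToMarkdown(ocr, "./receipt.jpg", process.env.TOGETHER_API_KEY);
//   console.log(markdown);
```

Checking for the key before the call turns a missing environment variable into a clear local error instead of a failed remote request.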
Maintenance & Community
The project is inspired by Zerox. Further community or roadmap details are not provided in the README.
Licensing & Compatibility
The license is not specified in the README. Compatibility for commercial use or closed-source linking is not detailed.
Limitations & Caveats
Currently, only local image OCR is supported; PDF support is planned but not yet implemented. The default model is Llama-3.2-90B-Vision, and performance and rate limits may vary depending on the chosen Together AI endpoint.