llama-ocr by Nutlope

OCR library using Llama 3.2 Vision

Created 1 year ago

2,424 stars

Top 18.6% on SourcePulse

1 Expert Loves This Project

Jeff Hammerbacher

Cofounder of Cloudera

Project Summary

This npm library provides free Optical Character Recognition (OCR) for documents, converting images into Markdown format using the Llama 3.2 Vision model. It's designed for developers and users needing to extract text from images without incurring API costs, leveraging Together AI's free endpoint.

How It Works

The library sends an image file to the Llama 3.2 Vision model through Together AI's API. The model reads the visual content of the image and returns the extracted text structured as Markdown. For faster performance or higher rate limits, users can specify the paid Llama 3.2 11B or 90B Vision models.
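The flow described above can be sketched as building a single chat-style request that carries the image as a base64 data URL alongside a prompt asking for Markdown. This is a minimal sketch, not the library's actual implementation: the function names, the prompt text, the alias table, and the provider model identifiers in it are all assumptions made for illustration and should be checked against Together AI's current documentation.

```typescript
// Hypothetical aliases mapped to illustrative provider model identifiers.
// These strings are assumptions; verify them against the provider's model list.
const MODEL_IDS = {
  free: "meta-llama/Llama-Vision-Free",
  "Llama-3.2-11B-Vision": "meta-llama/Llama-3.2-11B-Vision-Instruct-Turbo",
  "Llama-3.2-90B-Vision": "meta-llama/Llama-3.2-90B-Vision-Instruct-Turbo",
} as const;

type ModelAlias = keyof typeof MODEL_IDS;

interface TextPart {
  type: "text";
  text: string;
}

interface ImagePart {
  type: "image_url";
  image_url: { url: string };
}

interface OcrRequest {
  model: string;
  messages: [{ role: "user"; content: [TextPart, ImagePart] }];
}

// Encode raw bytes as base64 without relying on Node-specific types.
function toBase64(bytes: Uint8Array): string {
  let binary = "";
  bytes.forEach((b) => {
    binary += String.fromCharCode(b);
  });
  return btoa(binary);
}

// Build the request body: one user message holding the prompt and the image.
function buildOcrRequest(
  imageBytes: Uint8Array,
  mimeType: string,
  model: ModelAlias = "Llama-3.2-90B-Vision",
): OcrRequest {
  const dataUrl = `data:${mimeType};base64,${toBase64(imageBytes)}`;
  return {
    model: MODEL_IDS[model],
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: "Convert this image to Markdown. Return only the Markdown." },
          { type: "image_url", image_url: { url: dataUrl } },
        ],
      },
    ],
  };
}
```

A caller would read the image bytes from disk, pass them to `buildOcrRequest`, and send the result to a chat-completions endpoint with the API key as a bearer token; the Markdown would then be read from the first choice in the response.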

Quick Start & Requirements

Install: npm i llama-ocr
Prerequisites: Node.js, Together AI API key (for default model usage).
Demo: LlamaOCR.com

Highlighted Details

Leverages Llama 3.2 Vision for OCR.
Outputs Markdown format.
Supports local image OCR.
Offers options for free and paid Together AI models.

Maintenance & Community

The project is inspired by Zerox. Further community or roadmap details are not provided in the README.

Licensing & Compatibility

The README does not specify a license, so the terms for commercial use or closed-source linking are unclear.

Limitations & Caveats

Only local image OCR is currently supported; PDF support is planned but not yet implemented. The default model is Llama-3.2-90B-Vision, and performance and rate limits vary with the chosen Together AI endpoint.
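Because PDF input is not yet handled, a caller may want to reject it before making a request. The guard below is a hedged sketch of one way to do that: `checkOcrInput` and the extension allowlist are hypothetical and are not part of the library, and the README summary does not state which image formats are accepted.

```typescript
// Hypothetical allowlist; the formats the library actually accepts are not stated.
const IMAGE_EXTENSIONS = [".png", ".jpg", ".jpeg", ".webp"];

interface InputCheck {
  ok: boolean;
  reason?: string;
}

// Reject PDFs and unrecognized extensions before any API call is made.
function checkOcrInput(filePath: string): InputCheck {
  const lower = filePath.toLowerCase();
  if (lower.endsWith(".pdf")) {
    return { ok: false, reason: "PDF input is not supported yet" };
  }
  if (!IMAGE_EXTENSIONS.some((ext) => lower.endsWith(ext))) {
    return { ok: false, reason: "unrecognized image extension" };
  }
  return { ok: true };
}
```

Checking the extension only catches obvious mistakes; a stricter version could inspect the file's leading bytes instead.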

Health Check

Last Commit

1 year ago

Responsiveness

Inactive

Star History

4 stars in the last 30 days