llama-ocr by Nutlope

OCR library using Llama 3.2 Vision

created 8 months ago
2,372 stars

Top 19.8% on sourcepulse

Project Summary

This npm library provides free Optical Character Recognition (OCR) for documents, converting images into Markdown with the Llama 3.2 Vision model. It targets developers and other users who need to extract text from images without paying API costs, and it does so by relying on Together AI's free endpoint.

How It Works

The library sends image files to the Llama 3.2 Vision model through Together AI's API. The model reads the visual content of the image and returns the extracted text structured as Markdown. Users who want faster performance or higher rate limits can specify the paid Llama 3.2 11B or 90B Vision models instead.
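
The flow described above can be sketched as follows. This is a minimal illustration, not llama-ocr's actual implementation: it assumes an OpenAI-style chat payload in which the image travels as a base64 data URL, and the names `toDataUrl`, `buildOcrRequest`, and `VisionRequest`, as well as the prompt text, are hypothetical. The model string is passed in by the caller, since the exact identifier expected by the endpoint is not stated here.

```typescript
import { Buffer } from "node:buffer";

// Hypothetical types for illustration; not taken from llama-ocr's source.
type TextPart = { type: "text"; text: string };
type ImagePart = { type: "image_url"; image_url: { url: string } };

interface VisionRequest {
  model: string;
  messages: { role: "user"; content: (TextPart | ImagePart)[] }[];
}

// Encode raw image bytes as a data URL so they can be embedded in a JSON body.
function toDataUrl(bytes: Uint8Array, mimeType: string): string {
  return `data:${mimeType};base64,${Buffer.from(bytes).toString("base64")}`;
}

// Build a chat-style request that asks a vision model to transcribe
// the image as Markdown. Assumed payload shape; adjust to the real API.
function buildOcrRequest(
  bytes: Uint8Array,
  mimeType: string,
  model: string
): VisionRequest {
  return {
    model,
    messages: [
      {
        role: "user",
        content: [
          {
            type: "text",
            text: "Convert the text in this image to Markdown. Return only the Markdown.",
          },
          { type: "image_url", image_url: { url: toDataUrl(bytes, mimeType) } },
        ],
      },
    ],
  };
}
```

In practice the bytes would come from a local image file (for example via `fs.readFileSync`), and the resulting object would be serialized with `JSON.stringify` and sent to the provider's chat endpoint along with the API key. Keeping payload construction separate from the network call makes the request shape easy to inspect and test offline.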

Quick Start & Requirements

  • Install: npm i llama-ocr
  • Prerequisites: Node.js, Together AI API key (for default model usage).
  • Demo: LlamaOCR.com

Highlighted Details

  • Leverages Llama 3.2 Vision for OCR.
  • Outputs Markdown format.
  • Supports local image OCR.
  • Offers options for free and paid Together AI models.

Maintenance & Community

The project is inspired by Zerox. Further community or roadmap details are not provided in the README.

Licensing & Compatibility

The license is not specified in the README. Compatibility for commercial use or closed-source linking is not detailed.

Limitations & Caveats

Only local image files are currently supported; PDF support is planned but not yet implemented. The default model is Llama-3.2-90B-Vision, and performance and rate limits may vary with the chosen Together AI endpoint.

Health Check

  • Last commit: 6 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

111 stars in the last 90 days
