llama-scan  by ngafar

PDF to text transcription with local LLMs

Created 3 months ago
676 stars

Top 50.0% on SourcePulse

Project Summary

This tool transcribes and analyzes PDFs locally using multimodal LLMs served by Ollama. It extracts text and image descriptions without relying on cloud services, which avoids per-token API costs. It is designed for users who need to process sensitive or large PDF collections on their own machines.

How It Works

The tool uses Ollama to run large language models locally and processes PDF files page by page. It extracts the text content and uses the models' multimodal capabilities to generate detailed descriptions of images and diagrams, converting the entire document into text.
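The page-by-page flow described above can be sketched against Ollama's local HTTP API. This is a minimal illustration and not llama-scan's actual implementation: the helper names, the prompt text, and the model tag `qwen2.5vl:latest` are assumptions chosen for the example, and the page images are assumed to have been rendered from the PDF already (for example as PNG bytes).

```python
import base64
import json
import urllib.request

# Default address of a local Ollama server; change it if yours differs.
OLLAMA_URL = "http://localhost:11434/api/generate"

# Assumed prompt, not taken from llama-scan.
PROMPT = "Transcribe all text on this page and describe any images or diagrams."


def build_page_request(image_bytes: bytes, model: str, prompt: str) -> dict:
    """Build the JSON payload for one page image (hypothetical helper)."""
    return {
        "model": model,
        "prompt": prompt,
        # Ollama's generate endpoint takes images as base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        # Ask for one JSON object instead of a stream of chunks.
        "stream": False,
    }


def transcribe_page(image_bytes: bytes, model: str = "qwen2.5vl:latest") -> str:
    """Send one page image to the local model and return its text output.

    The default model tag is a placeholder; use any multimodal model
    you have pulled into Ollama.
    """
    payload = build_page_request(image_bytes, model, PROMPT)
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]


def transcribe_document(page_images: list, model: str = "qwen2.5vl:latest") -> str:
    """Transcribe pages one at a time and join the results in page order."""
    return "\n\n".join(transcribe_page(image, model) for image in page_images)
```

Only `transcribe_page` touches the network, so the payload construction can be inspected or tested without a running Ollama server.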

Quick Start & Requirements

  • Primary install / run command: pip install llama-scan or uv tool install llama-scan
  • Non-default prerequisites: Python 3.10+, Ollama installed and running locally.
  • Usage: llama-scan path/to/your/file.pdf
  • Documentation: No separate documentation is linked; the README provides usage examples.
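
Before installing, it can help to confirm the prerequisites listed above. The following is a small sketch using only the Python standard library. The function names are hypothetical, and the address `http://localhost:11434` is assumed to be Ollama's default local endpoint.

```python
import shutil
import sys
import urllib.request


def python_ok(version_info=sys.version_info) -> bool:
    """Return True if the interpreter is Python 3.10 or newer."""
    return tuple(version_info[:2]) >= (3, 10)


def ollama_installed() -> bool:
    """Return True if an `ollama` executable is found on PATH."""
    return shutil.which("ollama") is not None


def ollama_running(url: str = "http://localhost:11434", timeout: float = 2.0) -> bool:
    """Return True if a server answers at the assumed default Ollama address."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers connection refused, timeouts, and urllib's URLError.
        return False
```

Calling `python_ok()`, `ollama_installed()`, and `ollama_running()` in turn shows which prerequisite, if any, is missing before running llama-scan on a PDF.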

Highlighted Details

  • Local processing eliminates token costs and enhances data privacy.
  • Works with the multimodal LLMs available through Ollama.
  • Capable of transcribing both text and image content from PDFs.
  • Offers options for specifying output directory, model, page range, and image resizing.
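
To make the page range and image resizing options concrete, here is a sketch of the arithmetic such options typically involve. It does not reproduce llama-scan's own code or flag names: both helpers are hypothetical, and the 1-based inclusive range and the no-upscaling rule are assumptions.

```python
from typing import List, Optional, Tuple


def select_pages(total_pages: int, start: int = 1, end: Optional[int] = None) -> List[int]:
    """Return 1-based page numbers from start to end inclusive (assumed convention).

    The range is clamped to the document, so an out-of-range request
    yields fewer pages (or none) rather than an error.
    """
    if end is None:
        end = total_pages
    first = max(1, start)
    last = min(total_pages, end)
    return list(range(first, last + 1))


def resized_dimensions(width: int, height: int, target_width: int) -> Tuple[int, int]:
    """Scale an image to target_width while preserving its aspect ratio.

    Assumption: images narrower than the target are left unchanged
    instead of being upscaled.
    """
    if target_width <= 0 or target_width >= width:
        return width, height
    scale = target_width / width
    return target_width, max(1, round(height * scale))
```

For example, `select_pages(10, 3, 5)` gives pages 3 through 5, and `resized_dimensions(2000, 3000, 1000)` halves both sides of the page image. Smaller page images generally reduce the work the model does per page, at the cost of fine detail.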

Maintenance & Community

  • Project maintained by ngafar.
  • No community links (Discord/Slack) or roadmap are provided in the README.

Licensing & Compatibility

  • License: Not specified in the README.
  • Compatibility: Designed for local execution, compatible with any system running Python 3.10+ and Ollama.

Limitations & Caveats

Output quality and speed depend on the performance and capabilities of the locally installed Ollama models. The README does not specify a license, which may affect commercial use.

Health Check

  • Last commit: 2 weeks ago
  • Responsiveness: Inactive
  • Pull requests (30d): 2
  • Issues (30d): 5
  • Star history: 175 stars in the last 30 days
