reader by intergalacticalvariable

Web content processor for LLM pipelines

Created 1 year ago
256 stars

Top 98.6% on SourcePulse

View on GitHub
Project Summary

Summary

This project adapts Jina AI's Reader for local deployment using Docker, enabling users to convert any URL into LLM-friendly input. It offers a cost-free, privacy-preserving way to prepare web data for agent and RAG pipelines, processing content locally without requiring API keys.

How It Works

The tool operates as a local web service accessible at http://127.0.0.1:3000/. Prefixing a target URL with this local endpoint causes the service to fetch and process the page. Output can be returned in LLM-optimized formats such as Markdown, HTML, or plain text, and the service can also capture screenshots, returning URLs that point to locally stored image files. This architecture avoids external dependencies and cloud storage, making it well suited to sensitive data or offline environments.
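
A minimal usage sketch of the prefixing scheme described above (the target URL is an arbitrary example):

    # Fetch a page through the local reader and receive LLM-friendly output
    curl 'http://127.0.0.1:3000/https://example.com'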

Quick Start & Requirements

Deployment is primarily via Docker. Users can pull the pre-built image (ghcr.io/intergalacticalvariable/reader:latest) or build it locally from the GitHub repository (a build sketch follows the list below). The service runs on port 3000 and requires a local volume mount for screenshot storage (-v /path/to/local-storage:/app/local-storage). Minimal hardware is sufficient; a demo runs on 0.5 GB of RAM and 1 vCore.

  • Install/Run: docker run -d -p 3000:3000 -v /path/to/local-storage:/app/local-storage --name reader-container ghcr.io/intergalacticalvariable/reader:latest
  • Prerequisites: Docker.
  • Links: GitHub repository (implied by build instructions).
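
For the build-it-locally path, a hedged sketch (the repository URL, Dockerfile location, and image tag are assumptions inferred from the pull/run instructions above, not confirmed build steps):

    # Clone the fork and build the image locally (repository path assumed)
    git clone https://github.com/intergalacticalvariable/reader.git
    cd reader
    docker build -t reader-local .

    # Run it exactly like the pre-built image, with the screenshot volume mounted
    docker run -d -p 3000:3000 \
      -v /path/to/local-storage:/app/local-storage \
      --name reader-container reader-local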

Highlighted Details

  • Fully local deployment via Docker, eliminating cloud dependencies.
  • No API keys are necessary for operation.
  • Screenshots are saved locally, with URLs provided for access.
  • Supports multiple output formats: Markdown, HTML, Text, Screen-Size Screenshot, and Full-Page Screenshot (a request sketch follows this list).
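
To illustrate format selection and the screenshot workflow, a sketch that assumes the fork keeps the upstream Jina Reader convention of selecting output via an X-Respond-With request header (the header name and the 'markdown'/'pageshot' values are assumptions here, not confirmed by this summary):

    # Request Markdown output (X-Respond-With convention assumed from upstream Jina Reader)
    curl -H 'X-Respond-With: markdown' 'http://127.0.0.1:3000/https://example.com'

    # Request a full-page screenshot; the service returns a local URL and the
    # image file is written under the mounted volume
    curl -H 'X-Respond-With: pageshot' 'http://127.0.0.1:3000/https://example.com'
    ls /path/to/local-storage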

Maintenance & Community

The project acknowledges Jina AI and Harsh Gupta as foundational contributors. No specific community channels (e.g., Discord, Slack), roadmap links, or active maintainer details are provided in the README.

Licensing & Compatibility

Licensed under the Apache-2.0 license, consistent with the original Jina AI Reader project. This license is permissive and generally compatible with commercial use and integration into closed-source applications.

Limitations & Caveats

The current version does not support parsing PDF documents.

Health Check

  • Last Commit: 3 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 5 stars in the last 30 days
