fully-local-pdf-chatbot by jacoblee93

Local PDF chatbot for document interaction

created 1 year ago
1,782 stars

Top 24.7% on sourcepulse

Project Summary

This project provides a fully local, client-side chat-over-documents solution, targeting users who want to query PDFs without uploading data to external servers. It leverages WebAssembly and browser-based LLM inference for privacy and offline functionality.

How It Works

The application processes uploaded PDFs entirely within the browser. It chunks the document, creates vector embeddings using Transformers.js (or optionally Ollama), and stores them in Voy, a WASM-based vector store. Retrieval-Augmented Generation (RAG) is then orchestrated with LangChain.js and LangGraph.js against a locally running LLM (WebLLM in the browser, an Ollama instance on the desktop, or Chrome's built-in Gemini Nano).
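As a concrete illustration of the ingest step, the following is a minimal sketch (not the repository's exact code) built on the LangChain.js community integrations for Voy and Transformers.js; the embedding model name and chunk sizes are illustrative assumptions.

  // Minimal in-browser ingest sketch: parse the PDF, chunk it, embed the chunks
  // with Transformers.js, and index them in the Voy WASM vector store.
  import { WebPDFLoader } from "@langchain/community/document_loaders/web/pdf";
  import { RecursiveCharacterTextSplitter } from "@langchain/textsplitters";
  import { HuggingFaceTransformersEmbeddings } from "@langchain/community/embeddings/hf_transformers";
  import { VoyVectorStore } from "@langchain/community/vectorstores/voy";
  import { Voy as VoyClient } from "voy-search";

  export async function ingestPdf(file: Blob): Promise<VoyVectorStore> {
    // Parse the uploaded PDF entirely client-side.
    const docs = await new WebPDFLoader(file).load();

    // Split into overlapping chunks small enough for embedding and retrieval.
    const splitter = new RecursiveCharacterTextSplitter({ chunkSize: 500, chunkOverlap: 50 });
    const chunks = await splitter.splitDocuments(docs);

    // Embed locally via Transformers.js (model name is an illustrative choice).
    const embeddings = new HuggingFaceTransformersEmbeddings({ model: "Xenova/all-MiniLM-L6-v2" });

    // Index the chunk vectors in Voy, which runs as WebAssembly in the page.
    const store = new VoyVectorStore(new VoyClient(), embeddings);
    await store.addDocuments(chunks);
    return store;
  }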

Quick Start & Requirements

  • In-browser (WebLLM): Upload PDF directly; model weights (Phi-3.5) download on first use (several GB).
  • Ollama: Requires the Ollama desktop app serving a Mistral model (a connection sketch follows this list).
    • OLLAMA_ORIGINS=https://webml-demo.vercel.app OLLAMA_HOST=127.0.0.1:11435 ollama serve
    • OLLAMA_HOST=127.0.0.1:11435 ollama pull mistral
  • Gemini Nano: Requires enrollment in Chrome's built-in AI early preview program.
  • Dependencies: Node.js and Yarn (the app is a Next.js project run locally).
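For the Ollama path, a rough sketch (an assumption about the wiring, not the repo's code) is to point LangChain.js's ChatOllama client at the non-default host configured above, using the model pulled in the second command:

  import { ChatOllama } from "@langchain/ollama";

  // Talks to the local Ollama server started with OLLAMA_HOST=127.0.0.1:11435.
  const model = new ChatOllama({
    baseUrl: "http://127.0.0.1:11435",
    model: "mistral",
    temperature: 0,
  });

  const reply = await model.invoke("Reply with 'ready' if you can hear me.");
  console.log(reply.content);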

Highlighted Details

  • Fully client-side RAG pipeline.
  • Supports multiple local LLM backends: Ollama, WebLLM, Gemini Nano.
  • Utilizes Voy (WASM vector store) and Transformers.js for embeddings.
  • Orchestration via LangChain.js and LangGraph.js (see the query sketch after this list).
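At query time, the pieces above combine into a retrieve-then-generate step. The sketch below assumes the store and model objects from the earlier snippets and a hypothetical answer helper; the actual app orchestrates this flow with LangGraph.js.

  import type { VoyVectorStore } from "@langchain/community/vectorstores/voy";
  import type { ChatOllama } from "@langchain/ollama";

  // Retrieve the most relevant chunks from Voy, then ask the local model to
  // answer strictly from that context (hypothetical helper, for illustration).
  export async function answer(question: string, store: VoyVectorStore, model: ChatOllama) {
    const docs = await store.asRetriever(4).invoke(question);
    const context = docs.map((d) => d.pageContent).join("\n\n");

    const response = await model.invoke(
      `Answer the question using only the context below.\n\nContext:\n${context}\n\nQuestion: ${question}`
    );
    return response.content;
  }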

Maintenance & Community

The project credits the teams behind Voy, Ollama, WebLLM, and Transformers.js. The author is active on Twitter/X as @Hacubu.

Licensing & Compatibility

The repository does not explicitly state a license in the README, so suitability for commercial use or closed-source integration is unspecified.

Limitations & Caveats

The Gemini Nano integration is experimental and may yield variable results as the model is not chat-tuned. The project is a Next.js app, and deployment details beyond local setup are not provided.

Health Check

  • Last commit: 4 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

34 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering and Designing Machine Learning Systems), Simon Willison (co-creator of Django), and 1 more.

Lumos by andrewnguonly

Chrome extension for local LLM web RAG co-piloting
Top 0.1% on sourcepulse · 2k stars · created 1 year ago · updated 6 months ago