pdfGPT by bhaskatripathi

PDF chatbot for interacting with PDF content

Created 3 years ago

7,171 stars

Top 7.2% on SourcePulse

2 Experts Love This Project

Clarence Chio

Cofounder of Coverbase, Unit21

Jasper Zhang

Cofounder of Hyperbolic

Project Summary

This project is an open-source tool for "chatting" with PDF documents through OpenAI's GPT models. It is designed for users who need accurate, citation-backed answers from their documents without relying on complex third-party RAG frameworks.

How It Works

pdfGPT takes a lightweight retrieval-augmented generation (RAG) approach that requires no vector database or separate indexing service. It splits the PDF content into smaller chunks, generates an embedding for each chunk with a Deep Averaging Network (DAN) encoder, and runs a k-nearest-neighbor (KNN) semantic search to retrieve the chunks most relevant to the question. Those chunks are then passed to OpenAI's GPT models, with custom logic that keeps responses precise and lets them cite page numbers.
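The chunk, embed, and retrieve steps described above can be sketched in a few lines. This is a minimal illustration, not the project's code: `chunk_words`, `toy_embed`, `cosine`, and `top_k_chunks` are hypothetical names, and the bag-of-words vector is only a stand-in for the Deep Averaging Network encoder, chosen so the example runs with the standard library alone.

```python
import math
import re
from collections import Counter


def chunk_words(text, size=50):
    """Split text into consecutive chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def toy_embed(text):
    """Stand-in encoder (assumption): a sparse bag-of-words count vector.

    A real setup would call a sentence encoder here instead.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k_chunks(question, chunks, k=3):
    """Brute-force nearest-neighbor search: rank every chunk against the question."""
    q = toy_embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, toy_embed(c)), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    text = "the cat sat on the mat stock prices rose sharply again today"
    chunks = chunk_words(text, size=6)
    print(top_k_chunks("where is the cat", chunks, k=1))
```

Because every chunk is scored against the question directly, nothing has to be persisted between runs, which is the trade-off a database-free design makes: simple to deploy, but the search cost grows linearly with document length.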

Quick Start & Requirements

Install/Run: docker-compose -f docker-compose.yaml up
Prerequisites: OpenAI API Key.
Demo: https://huggingface.co/spaces/bhaskartripathi/pdfChatter

Highlighted Details

Claims to be one of the most accurate RAG solutions due to its simple architecture.
Supports GPT-4 (16K/32K tokens) and Turbo models.
Includes chat history and pre-defined questions.
Responses can cite page numbers.
Developed in 2021 as an early RAG solution.
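
One plausible way to get page-number citations, as mentioned in the list above, is to tag each retrieved chunk with its page before it goes into the prompt. The sketch below assumes that approach. The `[Page no. N]` marker format, the prompt wording, and the function names `tag_chunks` and `build_prompt` are illustrative assumptions, and no model call is made.

```python
def tag_chunks(pages, start_page=1):
    """Prefix each page's text with a page marker the model can echo back."""
    return [
        f"[Page no. {start_page + i}] {text.strip()}"
        for i, text in enumerate(pages)
    ]


def build_prompt(question, tagged_chunks):
    """Assemble a prompt that asks for answers grounded in the tagged chunks."""
    context = "\n\n".join(tagged_chunks)
    return (
        "Answer the question using only the search results below. "
        "Cite the page for each statement in the form [Page no. N]. "
        "If the results do not contain the answer, say so.\n\n"
        f"Search results:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


if __name__ == "__main__":
    tagged = tag_chunks(["  Intro text ", "Methods text"])
    print(build_prompt("What methods were used?", tagged))
```

The resulting string would then be sent to whichever GPT model is configured. Keeping the page marker inside the chunk text means the citation survives retrieval without any extra bookkeeping.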

Maintenance & Community

The project is seeking contributors for backlog items and joint maintenance.

Licensing & Compatibility

License: MIT License.
Compatible with commercial use.

Limitations & Caveats

The project's documentation is described as outdated. The accuracy of Turbo models for Q&A is questioned, and GPT-4 or text-davinci-003 is recommended for better results in specific cases. Support for multiple PDFs, OCR, and a Node.js web application is not yet available and is planned for future releases.

Health Check

Last Commit

11 months ago

Responsiveness

Inactive

Star History

11 stars in the last 30 days