ask-my-pdf by mobarski

Question answering system for PDF files

created 2 years ago
595 stars

Top 55.5% on sourcepulse

Project Summary

This project is a proof-of-concept question-answering system for PDF files, built on GPT-3 and aimed primarily at board game rulebooks. It is designed for avid board game fans who want to find answers quickly within complex documents.

How It Works

The system combines In-Context Retrieval-Augmented Language Models (RALM) with Hypothetical Document Embeddings (HyDE). HyDE first generates a hypothetical answer to the query; that answer, rather than the query itself, is embedded and used to retrieve similar passages. In-Context RALM then adds the retrieved passages to the language model's context, so that answers are grounded in the PDF content and are more accurate and relevant.
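The retrieval flow described above can be sketched in a few lines. This is a minimal illustration and not the project's actual code: `embed` is a toy bag-of-words stand-in for a real embedding model, `fake_generate` stands in for a call to a language model, and all function names and the sample rulebook text are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Callable, List


def embed(text: str) -> Counter:
    """Toy bag-of-words embedding (assumption: stands in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hyde_retrieve(question: str, chunks: List[str],
                  generate: Callable[[str], str], top_k: int = 2) -> List[str]:
    """HyDE step: embed a hypothetical answer instead of the question itself."""
    hypothetical = generate(f"Write a short passage that answers: {question}")
    query_vec = embed(hypothetical)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True)
    return ranked[:top_k]


def build_ralm_prompt(question: str, retrieved: List[str]) -> str:
    """In-context RALM step: put the retrieved passages in front of the question."""
    context = "\n\n".join(retrieved)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


def fake_generate(prompt: str) -> str:
    """Hypothetical stand-in for a language model call; returns a canned answer."""
    return "A player may trade resources with the bank at a four to one ratio."


# Hypothetical rulebook chunks, as if extracted from a PDF.
chunks = [
    "Setup: shuffle the deck and deal five cards to each player.",
    "Trading: on your turn you may trade four identical resources "
    "with the bank for one resource of your choice.",
    "The game ends when a player reaches ten victory points.",
]
question = "How does trading with the bank work?"
retrieved = hyde_retrieve(question, chunks, fake_generate, top_k=1)
prompt = build_ralm_prompt(question, retrieved)
```

In a real deployment the final `prompt` would be sent to the language model; the hypothetical answer is useful only as a retrieval key and is discarded afterwards.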

Quick Start & Requirements

  • Install dependencies: pip install -r ask-my-pdf/requirements.txt
  • Run the app: cd ask-my-pdf/src then execute run.sh or run.bat.
  • Requires an OpenAI API key.
  • Configuration via environment variables (e.g., STORAGE_SALT, OPENAI_KEY).
  • Official demo available at https://ask-my-pdf.streamlit.app/.
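As an illustration of environment-variable configuration, the sketch below reads the two variables named above. It is not taken from the project: the `AppConfig` class and `load_config` function are hypothetical, and treating `STORAGE_SALT` as optional is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class AppConfig:
    """Hypothetical settings container; field names are illustrative only."""
    openai_key: str
    storage_salt: Optional[str]


def load_config(env: Mapping[str, str] = os.environ) -> AppConfig:
    """Read settings from the environment and fail early if the API key is missing."""
    key = env.get("OPENAI_KEY", "").strip()
    if not key:
        raise RuntimeError("OPENAI_KEY is not set")
    # Assumption: STORAGE_SALT is optional; an empty value is treated as unset.
    return AppConfig(openai_key=key, storage_salt=env.get("STORAGE_SALT") or None)


# Usage with an explicit mapping (the key shown is a placeholder, not a real key).
example = load_config({"OPENAI_KEY": "sk-placeholder", "STORAGE_SALT": "pepper"})
```

Passing a mapping explicitly, as in the usage line, keeps the function easy to test without touching the real process environment.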

Highlighted Details

  • Implements techniques from the RALM and HyDE academic papers for retrieval-augmented generation.
  • Supports various storage and caching backends (S3, Redis, local filesystem).
  • Configurable via environment variables for flexible deployment.
  • Proof-of-concept, potentially containing bugs or unfinished features.

Maintenance & Community

The project is maintained by mobarski, who encourages following on Twitter for updates. It is presented as a proof of concept.

Licensing & Compatibility

The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

This is a proof-of-concept system and may contain bugs or unfinished features. Answer accuracy depends on the quality of the PDF and on the OpenAI model's performance, and the model may hallucinate.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 2 stars in the last 90 days
