ai-pdf-chatbot-langchain by mayooear

AI chatbot agent for PDF document Q&A using LangChain & LangGraph

created 2 years ago
15,736 stars

Top 3.1% on sourcepulse

Project Summary

This project provides a customizable template for an AI PDF chatbot agent, built with LangChain and LangGraph. It lets users ingest PDF documents, store their embeddings in a Supabase vector database, and query them with an LLM such as OpenAI's models. It targets developers and researchers building document-aware AI applications.

How It Works

The system utilizes a two-graph architecture orchestrated by LangGraph. The Ingestion Graph parses PDFs, generates vector embeddings, and stores them in Supabase. The Retrieval Graph handles user queries, retrieves relevant document chunks from Supabase based on semantic similarity, and generates responses using an LLM, providing citations. This state-machine approach enables clear workflow visualization and debugging.
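To make the two-graph flow concrete, here is a minimal, dependency-free sketch in TypeScript (the project's language). It is not the LangGraph API and not the project's code: the runGraph helper, node names, state shapes, sentence "chunking", and keyword-overlap "retrieval" are all hypothetical stand-ins for PDF parsing, embeddings, Supabase, and the LLM. It only shows the shape of the idea: each graph is a set of nodes that transform a shared state, wired together by edges.

```typescript
// Dependency-free sketch of the two-graph pattern. NOT the LangGraph API:
// every name here is hypothetical, and the "retrieval" is keyword overlap,
// not vector similarity.

type Node<S> = (state: S) => S;

// Run nodes one at a time, following `edges` from `start` until "END".
function runGraph<S>(
  nodes: Record<string, Node<S>>,
  edges: Record<string, string>,
  start: string,
  initial: S,
): S {
  let state = initial;
  let current = start;
  while (current !== "END") {
    const node = nodes[current];
    if (!node) throw new Error(`unknown node: ${current}`);
    state = node(state);
    current = edges[current] ?? "END"; // no outgoing edge means the graph is done
  }
  return state;
}

interface Chunk { source: string; text: string }

// Hypothetical in-memory stand-in for the Supabase vector store.
const vectorStore: Chunk[] = [];

// Stand-in for PDF text extraction + chunking: one chunk per sentence.
function sentences(raw: string): string[] {
  const parts: string[] = raw.match(/[^.]+\.?/g) ?? [];
  return parts.map((t) => t.trim()).filter((t) => t.length > 0);
}

interface IngestState { source: string; raw: string; chunks: Chunk[] }

const ingestionGraph = {
  nodes: {
    parse: (s: IngestState): IngestState => ({
      ...s,
      chunks: sentences(s.raw).map((text) => ({ source: s.source, text })),
    }),
    // Stand-in for "embed and store": append the chunks to the store.
    store: (s: IngestState): IngestState => {
      vectorStore.push(...s.chunks);
      return s;
    },
  },
  edges: { parse: "store" },
};

interface RetrieveState { query: string; hits: Chunk[]; answer: string }

function words(s: string): string[] {
  return s.toLowerCase().match(/\w+/g) ?? [];
}

// Stand-in for semantic similarity: count query words found in the chunk.
function overlap(query: string, text: string): number {
  const present = new Set(words(text));
  return words(query).filter((w) => present.has(w)).length;
}

const retrievalGraph = {
  nodes: {
    retrieve: (s: RetrieveState): RetrieveState => ({
      ...s,
      hits: vectorStore
        .map((c) => ({ c, score: overlap(s.query, c.text) }))
        .filter((x) => x.score > 0)
        .sort((a, b) => b.score - a.score)
        .slice(0, 2)
        .map((x) => x.c),
    }),
    // Stand-in for the LLM call: return the best chunk with a citation.
    generate: (s: RetrieveState): RetrieveState => {
      const best = s.hits[0];
      return {
        ...s,
        answer: best ? `${best.text} [source: ${best.source}]` : "No relevant passage found.",
      };
    },
  },
  edges: { retrieve: "generate" },
};

runGraph<IngestState>(ingestionGraph.nodes, ingestionGraph.edges, "parse", {
  source: "handbook.pdf",
  raw: "Refunds take 14 days. Shipping is free over $50.",
  chunks: [],
});
const result = runGraph<RetrieveState>(retrievalGraph.nodes, retrievalGraph.edges, "retrieve", {
  query: "How long do refunds take?",
  hits: [],
  answer: "",
});
console.log(result.answer); // Refunds take 14 days. [source: handbook.pdf]
```

Keeping ingestion and retrieval as separate graphs means each can be run, visualized, and debugged on its own; the only thing they share is the store.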

Quick Start & Requirements

  • Install: Clone the repo, then run yarn install from the root.
  • Prerequisites: Node.js v18+ (v20 recommended), Yarn, Supabase project (with documents table and match_documents function), OpenAI API Key, optional LangChain API Key.
  • Setup: Configure .env files for backend and frontend with API keys and Supabase details.
  • Run: Start the backend with cd backend && yarn langgraph:dev, then start the frontend with cd frontend && yarn dev.
  • Docs: LangChain Supabase Integration
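The prerequisites mention a match_documents function in Supabase. Its exact SQL is not given here, so the following is only an in-memory TypeScript analogue under a stated assumption: that the function ranks rows of the documents table by cosine similarity to a query embedding, optionally filters on metadata, and returns the top matchCount rows with a similarity score. The names matchDocuments, DocumentRow, and the toy 3-dimensional embeddings are hypothetical.

```typescript
// In-memory analogue of a `match_documents`-style similarity search.
// Assumption: rows are ranked by cosine similarity to the query embedding and
// the top `matchCount` are returned; the real SQL function may differ.

interface DocumentRow {
  id: number;
  content: string;
  metadata: Record<string, string>;
  embedding: number[];
}

interface Match {
  id: number;
  content: string;
  metadata: Record<string, string>;
  similarity: number;
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

function matchDocuments(
  rows: DocumentRow[],
  queryEmbedding: number[],
  matchCount: number,
  filter: Record<string, string> = {},
): Match[] {
  return rows
    // Keep only rows whose metadata contains every key/value pair in `filter`.
    .filter((r) => Object.entries(filter).every(([k, v]) => r.metadata[k] === v))
    .map((r) => ({
      id: r.id,
      content: r.content,
      metadata: r.metadata,
      similarity: cosineSimilarity(r.embedding, queryEmbedding),
    }))
    .sort((x, y) => y.similarity - x.similarity) // most similar first
    .slice(0, matchCount);
}

// Toy rows with 3-dimensional embeddings (real embeddings are much larger).
const rows: DocumentRow[] = [
  { id: 1, content: "Refund policy", metadata: { source: "a.pdf" }, embedding: [1, 0, 0] },
  { id: 2, content: "Shipping rates", metadata: { source: "a.pdf" }, embedding: [0, 1, 0] },
  { id: 3, content: "Returns window", metadata: { source: "b.pdf" }, embedding: [0.9, 0.1, 0] },
];

const topMatches = matchDocuments(rows, [1, 0, 0], 2);
console.log(topMatches.map((m) => m.id)); // [ 1, 3 ]
```

Cosine similarity ignores vector length, so two chunks about the same topic score close together even if their embeddings differ in magnitude; a database implementation would typically push this ranking into SQL with an index rather than scanning every row as this sketch does.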

Highlighted Details

  • LangGraph integration for state machine orchestration and workflow visualization.
  • Next.js frontend with real-time chat and SSE streaming responses.
  • Supabase used as the vector store for document embeddings.
  • Customizable prompts and retrieval logic.
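The frontend receives responses as SSE (server-sent events), where text arrives in arbitrary network chunks and each event ends with a blank line. The sketch below is a minimal incremental parser for that framing, not the project's client code; the SseParser class, the "token" event name, and the JSON payload shape are hypothetical, and it assumes LF line endings.

```typescript
// Minimal incremental parser for `text/event-stream` (SSE) framing.
// Assumptions: LF line endings; event names and payloads are hypothetical.

interface SseEvent {
  event: string;
  data: string;
}

class SseParser {
  private buffer = "";

  // Feed a raw text chunk (chunks may split an event anywhere) and
  // return every event that is now complete.
  push(chunk: string): SseEvent[] {
    this.buffer += chunk;
    const events: SseEvent[] = [];
    let boundary: number;
    // A blank line ("\n\n") terminates each event.
    while ((boundary = this.buffer.indexOf("\n\n")) !== -1) {
      const block = this.buffer.slice(0, boundary);
      this.buffer = this.buffer.slice(boundary + 2);
      let event = "message"; // default event type when none is given
      const data: string[] = [];
      for (const line of block.split("\n")) {
        if (line === "" || line.startsWith(":")) continue; // comment / keep-alive
        const colon = line.indexOf(":");
        const field = colon === -1 ? line : line.slice(0, colon);
        let value = colon === -1 ? "" : line.slice(colon + 1);
        if (value.startsWith(" ")) value = value.slice(1); // strip one leading space
        if (field === "event") event = value;
        else if (field === "data") data.push(value);
      }
      // Multiple data lines are joined with "\n"; events without data are skipped.
      if (data.length > 0) events.push({ event, data: data.join("\n") });
    }
    return events;
  }
}

const parser = new SseParser();
const first = parser.push('event: token\ndata: {"text":"Hel');
const second = parser.push('lo"}\n\nevent: token\ndata: {"text":"!"}\n\n');
console.log(first.length); // 0
console.log(second.map((e) => JSON.parse(e.data).text).join("")); // Hello!
```

Buffering until the blank line is what lets a partial event (here, a JSON payload split mid-string) survive a chunk boundary without being parsed early.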

Maintenance & Community

The project is associated with the O'Reilly book "Learning LangChain". Contributions are welcome via pull requests.

Licensing & Compatibility

The README does not state a license, so terms for commercial use or closed-source linking need clarification.

Limitations & Caveats

The default file upload limit is 5 files, each under 10MB, configurable in the ingest route. Chat history is not persistent across sessions in the provided UI. The license is not specified, which may impact commercial adoption.
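The default limits (at most 5 files, each under 10MB) could be enforced with a check like the one below. This is a hypothetical sketch rather than the ingest route's actual code: validateUploads, UploadedFile, and the constants are illustrative names, and "10MB" is assumed to mean 10 × 1024 × 1024 bytes.

```typescript
// Sketch of the default upload limits: at most 5 files, each under 10MB.
// Hypothetical names; "10MB" is assumed to be 10 * 1024 * 1024 bytes.

interface UploadedFile {
  name: string;
  size: number; // bytes
}

const MAX_FILES = 5;
const MAX_FILE_SIZE_BYTES = 10 * 1024 * 1024;

// Returns human-readable problems; an empty list means the upload is accepted.
function validateUploads(
  files: UploadedFile[],
  maxFiles: number = MAX_FILES,
  maxBytes: number = MAX_FILE_SIZE_BYTES,
): string[] {
  const errors: string[] = [];
  if (files.length > maxFiles) {
    errors.push(`Too many files: ${files.length} (max ${maxFiles}).`);
  }
  for (const f of files) {
    // "Under 10MB" is read as strictly less than the limit.
    if (f.size >= maxBytes) {
      errors.push(`${f.name} is ${f.size} bytes; each file must be under ${maxBytes} bytes.`);
    }
  }
  return errors;
}

const ok = validateUploads([{ name: "report.pdf", size: 2_000_000 }]);
const tooMany = validateUploads(
  Array.from({ length: 6 }, (_, i) => ({ name: `doc${i}.pdf`, size: 1_000 })),
);
const tooBig = validateUploads([{ name: "scan.pdf", size: MAX_FILE_SIZE_BYTES }]);
console.log(ok.length, tooMany.length, tooBig.length); // 0 1 1
```

Exposing the limits as parameters with defaults is one way to keep them configurable; where the real route reads its limits from is not specified here.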

Health Check
Last commit: 5 months ago
Responsiveness: 1 day
Pull Requests (30d): 1
Issues (30d): 0
Star History: 353 stars in the last 90 days

Explore Similar Projects

Starred by Jared Palmer (Ex-VP of AI at Vercel; Founder of Turborepo; Author of Formik, TSDX).

chatgpt-pgvector by gannonh

0%
938
Domain-specific chat completions app
created 2 years ago
updated 2 years ago