doc-solver by ai-hermes

ChatGPT chatbot for PDF documents

Created 2 years ago

332 stars

Top 82.7% on SourcePulse

Project Summary

This project provides a Next.js-based chatbot that allows users to query their PDF documents using GPT-4 via LangChain and Pinecone. It's designed for developers and researchers who need to build custom Q&A systems over private document sets. The primary benefit is enabling conversational access to information contained within PDF files.

How It Works

The application leverages LangChain for orchestrating LLM interactions and document processing. PDFs are converted into text, chunked, and then embedded using OpenAI's models. These embeddings, along with the original text chunks, are stored in Pinecone, a vector database, for efficient similarity search. When a user asks a question, it's embedded and used to retrieve the most relevant text chunks from Pinecone, which are then passed to GPT-4 along with the question to generate an answer.
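The retrieval flow described above can be sketched in a few lines. This is a minimal, self-contained illustration, not the project's code: toyEmbed is a hypothetical stand-in for the OpenAI embedding call, the in-memory array stands in for the Pinecone index, and the function names (chunkText, buildIndex, retrieve) are assumptions made for this example.

```typescript
// Hypothetical stand-in for an embedding model: counts letters a-z into a
// 26-dimensional vector. The real application calls OpenAI's embedding API.
function toyEmbed(text: string): number[] {
  const vec: number[] = [];
  for (let i = 0; i < 26; i++) vec.push(0);
  const lower = text.toLowerCase();
  for (let i = 0; i < lower.length; i++) {
    const idx = lower.charCodeAt(i) - 97; // 'a' is char code 97
    if (idx >= 0 && idx < 26) vec[idx] += 1;
  }
  return vec;
}

// Split text into fixed-size chunks that overlap, so content cut at one
// chunk boundary still appears whole in a neighbouring chunk.
function chunkText(text: string, size: number, overlap: number): string[] {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new Error("require size > 0 and 0 <= overlap < size");
  }
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // last chunk reached the end
  }
  return chunks;
}

// Cosine similarity; returns 0 when either vector has zero length.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

interface IndexedChunk {
  text: string;
  vector: number[];
}

// In-memory stand-in for the vector database: each chunk is stored with its vector.
function buildIndex(chunks: string[]): IndexedChunk[] {
  return chunks.map((text) => ({ text, vector: toyEmbed(text) }));
}

// Embed the question and return the topK most similar chunk texts.
function retrieve(index: IndexedChunk[], question: string, topK: number): string[] {
  const q = toyEmbed(question);
  return index
    .map((c) => ({ text: c.text, score: cosineSimilarity(q, c.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK)
    .map((c) => c.text);
}
```

In the real pipeline the retrieved chunks would then be passed to GPT-4 together with the question. A vector database does the same ranking at scale with an approximate nearest-neighbour index instead of the linear scan used here.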

Quick Start & Requirements

Install: yarn install
Prerequisites: Node.js 18 or later, Yarn, an OpenAI API key, and a Pinecone API key, environment, and index name. The Pinecone index must be created with 1536 dimensions.
Setup: Set the API keys and Pinecone details in a .env file. To ingest documents, place PDFs in the docs folder and run yarn run ingest.
Run: npm run dev
Demo: https://docsolver.spotty.com.cn/
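
Because ingestion and the app both depend on the .env values, a small check before running either can catch missing settings early. The sketch below is an assumption-laden illustration: the variable names in REQUIRED_ENV_VARS and the helper findMissingEnv are hypothetical and not taken from the repository, so check the project's own .env example for the real names.

```typescript
// Hypothetical variable names; the repository's .env example is authoritative.
const REQUIRED_ENV_VARS: string[] = [
  "OPENAI_API_KEY",
  "PINECONE_API_KEY",
  "PINECONE_ENVIRONMENT",
  "PINECONE_INDEX_NAME",
];

// Return the names of required variables that are unset or blank.
function findMissingEnv(
  env: Record<string, string | undefined>,
  required: string[]
): string[] {
  return required.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}
```

In a Node.js script this could be called as findMissingEnv(process.env, REQUIRED_ENV_VARS) before ingestion, exiting with a clear message if the returned list is not empty.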

Highlighted Details

Utilizes GPT-4 API for enhanced response quality.
Supports multiple PDF files for ingestion.
Customizable QA prompt within utils/makechain.ts.
Frontend inspired by langchain-chat-nextjs.
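
To show what customizing a QA prompt can look like, here is a minimal sketch of a template with context and question placeholders. The template text, the placeholder syntax, and the buildPrompt helper are all hypothetical; the actual prompt in utils/makechain.ts may be worded and structured differently.

```typescript
// Hypothetical template; not the prompt shipped in utils/makechain.ts.
const QA_TEMPLATE: string = [
  "Answer the question using only the context below.",
  "If the context does not contain the answer, say you don't know.",
  "",
  "Context:",
  "{context}",
  "",
  "Question: {question}",
  "Answer:",
].join("\n");

// Fill the template in a single pass. Using a replacer function means "$"
// sequences in the inserted text are not interpreted, and placeholders that
// happen to appear inside a retrieved chunk are left untouched.
function buildPrompt(template: string, chunks: string[], question: string): string {
  const context = chunks.join("\n---\n");
  return template.replace(/\{(context|question)\}/g, (_match: string, key: string) =>
    key === "context" ? context : question
  );
}
```

Editing the instruction lines of such a template is the usual way to change tone, require citations, or restrict answers to the retrieved text.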

Maintenance & Community

No specific contributors, sponsorships, or community links (Discord/Slack) are mentioned in the README.

Licensing & Compatibility

Licensed under the Apache License (Copyright © 2021-present doc-solver), a permissive license that generally allows commercial use.

Limitations & Caveats

The application requires GPT-4 API access and will not work without it. Scanned PDFs, or others that need OCR, may not be processed correctly unless they are first converted to text. Pinecone starter plan indexes are deleted after 7 days of inactivity, so documents may need to be re-ingested.
