Document chatbot for multi-file Q&A using GPT
This project provides a GPT-powered chatbot for asking questions across multiple documents. It supports .pdf, .docx, and .txt files as well as multiple chat sessions. It is designed for users who need to query and discuss information in their own documents, and it uses LangChain for retrieval and Pinecone for vector storage.
How It Works
The chatbot processes uploaded documents (.pdf, .docx, .txt) by converting them into embeddings, which are then stored in Pinecone namespaces. When a user asks a question, LangChain retrieves relevant document chunks from Pinecone based on semantic similarity and feeds them to GPT for generating an answer. This approach allows for context-aware responses derived directly from the provided documents.
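The retrieval flow above can be sketched in TypeScript. This is a minimal, self-contained illustration, not the project's code. A toy hashed bag-of-words function stands in for a real embedding model, and an in-memory Map stands in for Pinecone namespaces. The names toyEmbed, chunkText, InMemoryVectorStore, and buildPrompt are hypothetical. The GPT call itself is omitted; buildPrompt only shows how retrieved chunks become the model's context.

```typescript
// Sketch of embed -> store per namespace -> similarity search -> prompt.
// ASSUMPTIONS: toyEmbed replaces a real embedding model, and a Map replaces Pinecone.
type Vector = number[];

interface Chunk {
  text: string;
  vector: Vector;
}

const DIM = 64;

// Hypothetical toy embedding: hash each lowercase word into one of DIM buckets.
function toyEmbed(text: string): Vector {
  const v = new Array<number>(DIM).fill(0);
  for (const word of text.toLowerCase().match(/[a-z0-9]+/g) ?? []) {
    let h = 0;
    for (let i = 0; i < word.length; i++) h = (h * 31 + word.charCodeAt(i)) >>> 0;
    v[h % DIM] += 1;
  }
  return v;
}

function cosineSimilarity(a: Vector, b: Vector): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Hypothetical chunking rule: fixed windows of whitespace-separated words.
function chunkText(text: string, wordsPerChunk = 50): string[] {
  const words = text.split(/\s+/).filter(Boolean);
  const chunks: string[] = [];
  for (let i = 0; i < words.length; i += wordsPerChunk) {
    chunks.push(words.slice(i, i + wordsPerChunk).join(" "));
  }
  return chunks;
}

// Stand-in for a Pinecone index: one list of embedded chunks per namespace.
class InMemoryVectorStore {
  private namespaces = new Map<string, Chunk[]>();

  upsert(namespace: string, text: string): void {
    const list = this.namespaces.get(namespace) ?? [];
    for (const piece of chunkText(text)) {
      list.push({ text: piece, vector: toyEmbed(piece) });
    }
    this.namespaces.set(namespace, list);
  }

  // Return the topK chunk texts most similar to the question.
  query(namespace: string, question: string, topK = 2): string[] {
    const q = toyEmbed(question);
    return (this.namespaces.get(namespace) ?? [])
      .map((c) => ({ text: c.text, score: cosineSimilarity(q, c.vector) }))
      .sort((x, y) => y.score - x.score)
      .slice(0, topK)
      .map((c) => c.text);
  }
}

// Build the text that would be sent to GPT (the API call is not shown).
function buildPrompt(context: string[], question: string): string {
  return `Answer using only this context:\n${context.join("\n---\n")}\n\nQuestion: ${question}`;
}

// Usage: one namespace per uploaded document set.
const store = new InMemoryVectorStore();
store.upsert("handbook", "Employees accrue vacation days monthly and may carry over five days.");
console.log(buildPrompt(store.query("handbook", "How is vacation accrued?"), "How is vacation accrued?"));
```

Replacing toyEmbed with real embeddings and the Map with a Pinecone index keeps the same shape: chunks are embedded once at upload, questions are embedded at query time, and only the top-k chunks reach the model.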
Quick Start & Requirements
Install dependencies with yarn install, create a .env file configured with your Pinecone API key, index name, and environment, and run npm run dev.
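Here is a TypeScript sketch of checking the .env settings before startup. It assumes the file holds plain KEY=value lines. The variable names PINECONE_API_KEY, PINECONE_INDEX_NAME, and PINECONE_ENVIRONMENT and the helpers parseDotEnv and loadPineconeConfig are hypothetical; the project's own setup instructions define the real names. The parser is deliberately simplified and ignores quoting and multi-line values.

```typescript
// HYPOTHETICAL variable names; the project's setup instructions define the real ones.
const REQUIRED_KEYS = ["PINECONE_API_KEY", "PINECONE_INDEX_NAME", "PINECONE_ENVIRONMENT"] as const;

type RequiredKey = (typeof REQUIRED_KEYS)[number];
type PineconeConfig = Record<RequiredKey, string>;

// Simplified KEY=value parser: skips blank lines and # comments,
// and does not handle quotes, escapes, or multi-line values.
function parseDotEnv(source: string): Record<string, string> {
  const out: Record<string, string> = {};
  for (const raw of source.split(/\r?\n/)) {
    const line = raw.trim();
    if (!line || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq <= 0) continue; // no key before "=", so ignore the line
    out[line.slice(0, eq).trim()] = line.slice(eq + 1).trim();
  }
  return out;
}

// Returns the config, or throws one error listing every missing or empty key.
function loadPineconeConfig(env: Record<string, string | undefined>): PineconeConfig {
  const missing = REQUIRED_KEYS.filter((k) => !env[k]?.trim());
  if (missing.length > 0) {
    throw new Error(`Missing .env entries: ${missing.join(", ")}`);
  }
  const config = {} as PineconeConfig;
  for (const k of REQUIRED_KEYS) config[k] = env[k]!.trim();
  return config;
}
```

In a Node process the same check could run as loadPineconeConfig(process.env) at startup. A missing key then fails fast with one message listing everything absent, instead of surfacing later as a Pinecone error.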
Highlighted Details
A separate branch (mongodb-and-auth) adds Google authentication and MongoDB integration, though it is noted as being behind the main branch.
Maintenance & Community
The project is a fork of mayooear/GPT-4-LangChain with significant modifications. Its frontend design is inspired by ChatGPT. The README lists no community channels or active maintainer information.
Licensing & Compatibility
The repository does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
Pinecone indexes on the Starter plan are deleted after 7 days of inactivity, requiring periodic API requests to prevent deletion. The project primarily uses local storage for chat history, with an older, less feature-complete branch available for authentication and database integration. File conversion issues may arise with scanned or OCR-requiring documents.
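One way to handle the 7-day inactivity rule is a small keep-alive timer, sketched here in TypeScript. The ping callback is hypothetical (for example, a cheap query against the index). The 2-day safety margin is an arbitrary assumption, and the timer records only its own pings, not user traffic.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

// True once the index has been idle long enough to need a ping.
// maxIdleDays = 7 follows the Starter-plan rule; safetyDays = 2 is an ASSUMPTION.
function isPingDue(lastActivityMs: number, nowMs: number, maxIdleDays = 7, safetyDays = 2): boolean {
  return nowMs - lastActivityMs >= (maxIdleDays - safetyDays) * DAY_MS;
}

// Periodically checks idleness and calls the HYPOTHETICAL ping callback when due.
// Assumes the index was active when the timer starts, and tracks only its own pings.
// Returns a function that stops the timer.
function startKeepAlive(
  ping: () => Promise<void>,
  checkEveryMs = DAY_MS,
  now: () => number = Date.now,
): () => void {
  let lastActivity = now();
  const timer = setInterval(async () => {
    if (!isPingDue(lastActivity, now())) return;
    try {
      await ping();
      lastActivity = now();
    } catch (err) {
      // Leave lastActivity unchanged so the next check retries.
      console.error("Keep-alive ping failed; retrying on next check:", err);
    }
  }, checkEveryMs);
  return () => clearInterval(timer);
}
```

A timer like this only helps while a long-running process stays up. A deployment that sleeps between requests would need an external scheduler to issue the same periodic request.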