doc-chatbot by dissorial

Document chatbot for multi-file Q&A using GPT

Created 2 years ago

864 stars

Top 41.6% on SourcePulse

Project Summary

This project provides a GPT-powered chatbot for querying multiple documents at once. It accepts .pdf, .docx, and .txt files and keeps separate chat sessions per topic. It is aimed at users who want to ask questions about their own documents, and it uses LangChain for retrieval and Pinecone for embedding storage.

How It Works

The chatbot processes uploaded documents (.pdf, .docx, .txt) by converting them into embeddings, which are then stored in Pinecone namespaces. When a user asks a question, LangChain retrieves relevant document chunks from Pinecone based on semantic similarity and feeds them to GPT for generating an answer. This approach allows for context-aware responses derived directly from the provided documents.
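The retrieval flow described above can be sketched in a few dozen lines. This is a minimal illustration, not the project's code: ToyVectorStore is an in-memory stand-in for Pinecone namespaces, embed is a toy bag-of-words function standing in for a real embedding model, and every name below (chunkText, embed, ToyVectorStore, buildPrompt) is hypothetical.

```typescript
// Toy stand-ins only: a real setup would call an embedding model and Pinecone.
type Chunk = { text: string; vector: number[] };

// Split text into fixed-size groups of words (an assumed chunking rule).
function chunkText(text: string, wordsPerChunk: number): string[] {
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const chunks: string[] = [];
  for (let i = 0; i < words.length; i += wordsPerChunk) {
    chunks.push(words.slice(i, i + wordsPerChunk).join(" "));
  }
  return chunks;
}

// Toy embedding: term counts over a fixed vocabulary.
function embed(text: string, vocabulary: string[]): number[] {
  const tokens = text.toLowerCase().split(/\W+/);
  return vocabulary.map((term) => tokens.filter((t) => t === term).length);
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// In-memory map from namespace name to stored chunks.
class ToyVectorStore {
  private namespaces = new Map<string, Chunk[]>();

  upsert(namespace: string, chunks: Chunk[]): void {
    const existing = this.namespaces.get(namespace) ?? [];
    this.namespaces.set(namespace, existing.concat(chunks));
  }

  // Return the topK chunks in the namespace most similar to the query vector.
  query(namespace: string, vector: number[], topK: number): Chunk[] {
    const chunks = this.namespaces.get(namespace) ?? [];
    return chunks
      .map((chunk) => ({ chunk, score: cosineSimilarity(chunk.vector, vector) }))
      .sort((x, y) => y.score - x.score)
      .slice(0, topK)
      .map((entry) => entry.chunk);
  }
}

// Assemble the text that would be sent to the language model.
function buildPrompt(question: string, context: Chunk[]): string {
  const contextText = context.map((c) => c.text).join("\n---\n");
  return `Answer using only this context:\n${contextText}\n\nQuestion: ${question}`;
}
```

Keeping each topic in its own namespace means a query only ever scores chunks from the documents uploaded for that topic, which is the property the namespace-per-topic design relies on.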

Quick Start & Requirements

Install: yarn install
Prerequisites: Node.js, Pinecone account and API key, .env file configured with Pinecone API key, index name, and environment.
Setup: Clone the repository, install dependencies, configure .env, and run npm run dev.
Docs: https://github.com/dissorial/doc-chatbot
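
A startup check for the three Pinecone settings listed under Prerequisites might look like the sketch below. The variable names PINECONE_API_KEY, PINECONE_INDEX_NAME, and PINECONE_ENVIRONMENT are assumptions made for illustration, as is the readPineconeConfig helper; the repository's README is the authority on the actual names.

```typescript
type PineconeConfig = { apiKey: string; indexName: string; environment: string };

// Assumed variable names; check the repository's README for the real ones.
const REQUIRED_VARS = {
  apiKey: "PINECONE_API_KEY",
  indexName: "PINECONE_INDEX_NAME",
  environment: "PINECONE_ENVIRONMENT",
} as const;

// Read the settings from an env-like object and report every missing one at once.
function readPineconeConfig(env: Record<string, string | undefined>): PineconeConfig {
  const missing: string[] = [];
  const read = (name: string): string => {
    const value = env[name]?.trim();
    if (!value) missing.push(name);
    return value ?? "";
  };
  const config: PineconeConfig = {
    apiKey: read(REQUIRED_VARS.apiKey),
    indexName: read(REQUIRED_VARS.indexName),
    environment: read(REQUIRED_VARS.environment),
  };
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  return config;
}
```

In a Node.js process this would be called as readPineconeConfig(process.env) once at startup, so a misconfigured .env fails immediately rather than on the first query.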

Highlighted Details

Supports multiple topics, chat windows, and chat history via local storage.
Allows creation, deletion, and management of Pinecone namespaces directly from the browser.
Offers an alternative branch (mongodb-and-auth) with Google authentication and MongoDB integration; the README notes that this branch is behind main.
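
Keeping chat history in local storage, as mentioned in the list above, can be sketched as follows. This is an illustration under assumptions, not the project's implementation: the key scheme, the ChatMessage shape, and the helper names are all hypothetical. The KeyValueStore interface is the subset of the Web Storage API that the browser's localStorage satisfies, which lets the same code run against an in-memory store outside the browser.

```typescript
type ChatMessage = { role: "user" | "assistant"; content: string };

// Subset of the Web Storage API; window.localStorage satisfies it.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Assumed key scheme: one entry per namespace (topic) and chat window.
function historyKey(namespace: string, chatId: string): string {
  return `chat-history:${namespace}:${chatId}`;
}

// Load a chat's messages; missing or corrupted entries yield an empty history.
function loadHistory(store: KeyValueStore, namespace: string, chatId: string): ChatMessage[] {
  const raw = store.getItem(historyKey(namespace, chatId));
  if (raw === null) return [];
  try {
    const parsed: unknown = JSON.parse(raw);
    return Array.isArray(parsed) ? (parsed as ChatMessage[]) : [];
  } catch {
    return [];
  }
}

// Append one message and persist the updated history.
function appendMessage(
  store: KeyValueStore,
  namespace: string,
  chatId: string,
  message: ChatMessage,
): ChatMessage[] {
  const history = [...loadHistory(store, namespace, chatId), message];
  store.setItem(historyKey(namespace, chatId), JSON.stringify(history));
  return history;
}

// In-memory store for use outside the browser.
class MemoryStore implements KeyValueStore {
  private data = new Map<string, string>();
  getItem(key: string): string | null {
    return this.data.get(key) ?? null;
  }
  setItem(key: string, value: string): void {
    this.data.set(key, value);
  }
}
```

In the browser, passing window.localStorage as the store would persist history across page reloads, but only on that device and browser profile.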

Maintenance & Community

The project is a fork of mayooear/GPT-4-LangChain with significant modifications. The frontend design is inspired by ChatGPT. The README lists no community channels and gives no information about active maintainers.

Licensing & Compatibility

The repository does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

Pinecone indexes on the Starter plan are deleted after 7 days of inactivity, so periodic API requests are needed to keep an index alive. Chat history is kept in local storage on the main branch; authentication and database integration are available only on an older, less feature-complete branch. File conversion may fail for scanned documents or other files that require OCR.
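
One way to guard against the 7-day deletion window is to check how long an index has been idle and send a request before the limit is reached. The sketch below is an assumption-laden illustration: isKeepAliveDue and keepAliveIfDue are hypothetical helpers, the one-day safety margin is an arbitrary choice, and ping is a caller-supplied function standing in for whatever lightweight Pinecone request is used.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// True once the idle time reaches (maxIdleDays - safetyMarginDays) days.
function isKeepAliveDue(
  lastActivity: Date,
  now: Date,
  maxIdleDays: number = 7,
  safetyMarginDays: number = 1,
): boolean {
  const idleMs = now.getTime() - lastActivity.getTime();
  return idleMs >= (maxIdleDays - safetyMarginDays) * MS_PER_DAY;
}

// Call the supplied ping function only when a keep-alive is due.
// `ping` is a placeholder for any cheap request against the index.
async function keepAliveIfDue(
  lastActivity: Date,
  now: Date,
  ping: () => Promise<void>,
): Promise<boolean> {
  if (!isKeepAliveDue(lastActivity, now)) return false;
  await ping();
  return true;
}
```

Run from a daily scheduled job, this would issue at most one request every six days while the index is otherwise unused.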

Health Check

Last Commit: 2 years ago
Responsiveness: Inactive

Star History

0 stars in the last 30 days