vault-ai by pashpashpash

RAG app for custom knowledge base Q&A

created 2 years ago
3,366 stars

Top 14.8% on sourcepulse

Project Summary

This project provides a self-hosted question-answering system that uses OpenAI and Pinecone to give ChatGPT long-term memory. Users upload custom knowledge bases in several document formats (PDF, DOCX, EPUB, RTF, TXT) through a React frontend and receive context-aware answers that cite the source file and the relevant section.

How It Works

The system utilizes the "OP Stack" (OpenAI + Pinecone). A Go backend handles file uploads, extracts text, chunks it, generates embeddings using OpenAI's API, and stores these embeddings with metadata (filename, text snippet) in Pinecone. When a question is asked, the backend generates an embedding for the query, retrieves relevant chunks from Pinecone, and constructs a prompt for OpenAI, combining the retrieved context with the original question to generate an answer.

Quick Start & Requirements

  • Install: npm install, then npm start to launch the server and npm run dev to build the frontend bundle with webpack.
  • Prerequisites: Node.js v19, Go v1.18.9, Poppler (for PDF text extraction).
  • API keys: store each value in its own file: secret/openai_api_key, secret/pinecone_api_key, secret/pinecone_api_endpoint.
  • Pinecone index: dimension 1536 (the size of the OpenAI embedding vectors it stores); other settings default.
  • Setup: Requires manual installation of dependencies and configuration of API keys. Estimated setup time is under 30 minutes for users familiar with the stack.
  • Docs: Announcement on X

Highlighted Details

  • Supports PDF, .txt, .rtf, .docx, .epub file types.
  • Provides source filename and specific context snippets with answers.
  • Maximum total upload size of 300 MB (configurable).
  • Uses a Go port of tiktoken to estimate prompt token counts.
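Token estimation matters because the retrieved context plus the question must fit in the model's context window. The sketch below shows one way to trim relevance-ranked chunks to a token budget. It deliberately swaps tiktoken for a rough four-characters-per-token heuristic so it runs on the standard library alone; `estimateTokens`, `fitContext`, and any budget value are illustrative assumptions.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// estimateTokens approximates the token count as ceil(runes/4), a common
// rule of thumb for English text. This is NOT tiktoken: it is a crude
// stand-in used so the sketch has no external dependencies.
func estimateTokens(s string) int {
	n := utf8.RuneCountInString(s)
	return (n + 3) / 4
}

// fitContext keeps chunks in ranked order until adding the next one
// would push the estimated total past budget tokens.
func fitContext(chunks []string, budget int) []string {
	var kept []string
	used := 0
	for _, c := range chunks {
		t := estimateTokens(c)
		if used+t > budget {
			break
		}
		kept = append(kept, c)
		used += t
	}
	return kept
}

func main() {
	chunks := []string{"abcdefgh", "abcd", "abcdefghijkl"}
	fmt.Printf("%q\n", fitContext(chunks, 4))
}
```

Stopping at the first chunk that does not fit preserves the relevance order; an exact implementation would count tokens with the tokenizer for the target model instead of this heuristic.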

Maintenance & Community

The project appears to be a personal project by pashpashpash. No specific community channels or roadmap are mentioned in the README.

Licensing & Compatibility

The README does not explicitly state a license, so the terms for reuse, including in commercial or closed-source applications, are unclear.

Limitations & Caveats

New Pinecone free-tier users may run into restrictions on namespaces. The maximum size of an individual uploaded file is currently 3 MB, which is also configurable. The project is presented as a personal endeavor without explicit community support or a formal roadmap.

Health Check

  • Last commit: 3 weeks ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 39 stars in the last 90 days
