vault-ai by pashpashpash

RAG app for custom knowledge base Q&A

Created 2 years ago

3,401 stars

Top 14.2% on SourcePulse

1 Expert Loves This Project

Elvis Saravia

Founder of DAIR.AI

Project Summary

This project provides a self-hosted question-answering system that uses OpenAI and Pinecone to give ChatGPT long-term memory. Users upload a custom knowledge base (PDF, TXT, RTF, DOCX, or EPUB files) through a React frontend and receive context-aware answers that include the source file and section.

How It Works

The system uses the "OP Stack" (OpenAI + Pinecone). A Go backend handles file uploads, extracts the text, splits it into chunks, generates an embedding for each chunk with OpenAI's API, and stores the embeddings in Pinecone along with metadata (filename, text snippet). When a question is asked, the backend embeds the query, retrieves the most relevant chunks from Pinecone, and builds a prompt for OpenAI that combines the retrieved context with the original question to generate an answer.
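The flow described above can be sketched in a few lines. This is a minimal illustration in Python, not the project's Go code: embed is a toy bag-of-words stand-in for an OpenAI embedding call, the in-memory list stands in for a Pinecone index, and the function names, chunk size, and prompt wording are all assumptions made for the example.

```python
import math
import re
from collections import Counter


def chunk_text(text, max_words=50):
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text):
    """Toy embedding (word counts); a real system would call an embedding API."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(files):
    """Store one record per chunk: vector plus metadata (filename, text snippet)."""
    index = []
    for filename, text in files.items():
        for chunk in chunk_text(text):
            index.append({"vector": embed(chunk), "filename": filename, "text": chunk})
    return index


def retrieve(index, question, top_k=2):
    """Return the top_k records most similar to the question."""
    q = embed(question)
    return sorted(index, key=lambda r: cosine(q, r["vector"]), reverse=True)[:top_k]


def build_prompt(question, records):
    """Combine the retrieved context with the original question."""
    context = "\n".join(f"[{r['filename']}] {r['text']}" for r in records)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Hypothetical sample documents, used only to exercise the sketch.
files = {
    "pets.txt": "Cats sleep most of the day. Dogs enjoy long walks.",
    "space.txt": "Mars has two moons named Phobos and Deimos.",
}
index = build_index(files)
question = "How many moons does Mars have?"
hits = retrieve(index, question, top_k=1)
prompt = build_prompt(question, hits)
```

Keeping the filename next to each stored vector is what lets an answer point back to its source file; the finished prompt would then be sent to a chat or completion model.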

Quick Start & Requirements

Install: npm install, then npm start to run the server and npm run dev to run webpack.
Prerequisites: Node.js v19, Go v1.18.9, Poppler (for PDF text extraction).
API Keys: placed in secret/openai_api_key, secret/pinecone_api_key, and secret/pinecone_api_endpoint.
Pinecone Index: vector dimension 1536, otherwise default settings.
Setup: Requires manual installation of dependencies and configuration of API keys. Estimated setup time is under 30 minutes for users familiar with the stack.
Docs: Announcement on X
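
A small pre-flight check can catch missing credentials before the server starts. The sketch below is in Python rather than the project's Go, and it assumes each key is a plain-text file at the paths listed under API Keys; that file format, the load_secrets helper, and the demo values are assumptions for illustration and are not taken from the project.

```python
import tempfile
from pathlib import Path

# File names from the API Keys entry; plain-text contents are an assumption.
SECRET_FILES = ("openai_api_key", "pinecone_api_key", "pinecone_api_endpoint")
PINECONE_VECTOR_SIZE = 1536  # index dimension given in the requirements


def load_secrets(secret_dir):
    """Read each secret file, stripping whitespace; raise if any is missing or empty."""
    secrets = {}
    missing = []
    for name in SECRET_FILES:
        path = Path(secret_dir) / name
        value = path.read_text().strip() if path.is_file() else ""
        if value:
            secrets[name] = value
        else:
            missing.append(name)
    if missing:
        raise FileNotFoundError(f"missing or empty secret files: {', '.join(missing)}")
    return secrets


# Demo against a throwaway directory with placeholder values.
with tempfile.TemporaryDirectory() as tmp:
    for name in SECRET_FILES:
        (Path(tmp) / name).write_text(f"demo-{name}\n")
    demo_secrets = load_secrets(tmp)
```

Reporting every missing file in one error, rather than failing on the first, saves a round trip during setup.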

Highlighted Details

Supports PDF, TXT, RTF, DOCX, and EPUB files.
Provides source filename and specific context snippets with answers.
Maximum total upload size of 300 MB (configurable).
Uses a Go tiktoken library to estimate prompt token counts.
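
Token count estimation typically matters when deciding how much retrieved context fits into a prompt. The Python sketch below illustrates that idea under stated assumptions: rough_token_count is a crude word-count stand-in for a real tokenizer such as tiktoken, and fit_context and the budget value are hypothetical names and numbers, not the project's code.

```python
def rough_token_count(text):
    """Crude stand-in for a tokenizer: counts whitespace-separated words."""
    return len(text.split())


def fit_context(chunks, question, budget, count_tokens=rough_token_count):
    """Keep chunks in order until adding the next one would exceed the budget."""
    used = count_tokens(question)
    kept = []
    for chunk in chunks:
        cost = count_tokens(chunk)
        if used + cost > budget:
            break
        kept.append(chunk)
        used += cost
    return kept, used


# Hypothetical chunks, assumed to be ordered from most to least relevant.
chunks = ["alpha beta gamma", "delta epsilon", "zeta eta theta iota"]
kept, used = fit_context(chunks, "what is alpha?", budget=9)
```

Passing the counter as a parameter means a real tokenizer can be swapped in without changing the budgeting logic.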

Maintenance & Community

This appears to be a personal project by pashpashpash. The README mentions no community channels or roadmap.

Licensing & Compatibility

The README does not explicitly state a license, so the terms for reuse, including in commercial or closed-source applications, are unspecified. The project appears intended for personal use or for integration into other projects.

Limitations & Caveats

New Pinecone free-tier users may encounter namespace restrictions. The maximum size of an individual uploaded file is currently 3 MB (configurable), separate from the 300 MB limit on total upload size. No community support or formal roadmap is offered.
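
The two upload limits (3 MB per file, 300 MB in total) can be checked together before any file is processed. The following Python sketch is an illustration only: check_upload and its messages are hypothetical, and treating 1 MB as 2^20 bytes is an assumption, since the summary does not say how the limits are defined in the Go backend.

```python
MB = 1 << 20  # assumption: 1 MB = 2**20 bytes
MAX_FILE_SIZE = 3 * MB     # per-file limit
MAX_TOTAL_SIZE = 300 * MB  # total upload limit


def check_upload(sizes, max_file=MAX_FILE_SIZE, max_total=MAX_TOTAL_SIZE):
    """Return a list of problems; an empty list means the batch is accepted."""
    problems = []
    for name, size in sizes.items():
        if size > max_file:
            problems.append(f"{name}: {size} bytes exceeds per-file limit of {max_file}")
    total = sum(sizes.values())
    if total > max_total:
        problems.append(f"batch: {total} bytes exceeds total limit of {max_total}")
    return problems


# Hypothetical file names and sizes in bytes.
ok = check_upload({"notes.txt": 10_000, "paper.pdf": 2 * MB})
too_big = check_upload({"book.epub": 5 * MB})
```

Both limits are keyword parameters here, mirroring the note that they are configurable.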

Health Check

Last Commit: 6 months ago
Responsiveness: 1 day

Star History

1 star in the last 30 days