chatgpt-retrieval-plugin by openai

Retrieval plugin for Custom GPTs, function calling, and the Assistants API

Created 2 years ago
21,223 stars

Top 2.1% on SourcePulse

Project Summary

This project provides a self-hosted, standalone retrieval backend for ChatGPT that lets Custom GPTs and API integrations search personal or organizational documents with natural-language queries. It offers granular control over document chunking, the embedding model, and the vector database, which makes it a good fit for developers who need more control over retrieval-augmented generation (RAG) than ChatGPT's native file uploads provide.

How It Works

The plugin uses OpenAI's embedding models to convert document chunks into vector representations, which are then stored in and queried from one of several supported vector databases. A FastAPI server exposes endpoints for upserting documents, querying them with optional metadata filters, and deleting them. Because chunking, the embedding model, and the datastore are all configurable, the retrieval pipeline can be tuned for accuracy, cost, or speed.
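The upsert, query, and delete flow described above can be illustrated with a minimal in-memory sketch. This is not the plugin's code: `toy_embed` is a hypothetical bag-of-words stand-in for an OpenAI embedding call, `InMemoryStore` is a hypothetical stand-in for a real vector database, and chunking here splits on words rather than tokens.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


def chunk_text(text: str, max_words: int = 50) -> list:
    """Split text into chunks of at most max_words words (a word-based stand-in for token chunking)."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def toy_embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: a bag-of-words term-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


@dataclass
class InMemoryStore:
    """Hypothetical datastore mirroring the upsert / query / delete operations."""
    chunks: list = field(default_factory=list)

    def upsert(self, doc_id: str, text: str, metadata: Optional[dict] = None) -> None:
        self.delete([doc_id])  # replace any earlier version of this document
        for i, chunk in enumerate(chunk_text(text)):
            self.chunks.append({
                "id": f"{doc_id}_{i}",
                "document_id": doc_id,
                "text": chunk,
                "metadata": metadata or {},
                "embedding": toy_embed(chunk),
            })

    def query(self, query: str, top_k: int = 3, metadata_filter: Optional[dict] = None) -> list:
        q = toy_embed(query)
        # Metadata filtering happens before similarity ranking.
        candidates = [c for c in self.chunks
                      if all(c["metadata"].get(k) == v for k, v in (metadata_filter or {}).items())]
        scored = sorted(((cosine(q, c["embedding"]), c) for c in candidates),
                        key=lambda pair: pair[0], reverse=True)
        return [{"id": c["id"], "text": c["text"], "score": s} for s, c in scored[:top_k]]

    def delete(self, doc_ids: list) -> None:
        self.chunks = [c for c in self.chunks if c["document_id"] not in doc_ids]
```

For example, after upserting a refund-policy document and an office-hours document, `store.query("refund returns", top_k=1)` ranks the refund chunk first. Swapping `toy_embed` for a real embedding call and `InMemoryStore` for a vector database client is the kind of substitution the plugin's configuration is meant to allow.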

Quick Start & Requirements

  • Install: Clone the repo, install Poetry, create a virtual environment (poetry env use python3.10), and install dependencies (poetry install).
  • Prerequisites: Python 3.10, OpenAI API key, and credentials for a chosen vector database.
  • Setup: Set environment variables for DATASTORE, BEARER_TOKEN, OPENAI_API_KEY, and vector database specifics.
  • Run: poetry run start. API docs available at http://0.0.0.0:8000/docs.
  • Docs: https://github.com/openai/chatgpt-retrieval-plugin#quickstart
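Once the server is running, clients authenticate with the BEARER_TOKEN. The sketch below builds, but does not send, a query request using only the standard library. It assumes the server is reachable at `http://localhost:8000` and that the body follows the `/query` schema listed on the server's `/docs` page (a `queries` list whose items carry `query`, an optional `filter`, and `top_k`); `build_query_request` is a hypothetical helper, so verify the field names against your own deployment.

```python
import json
import os
import urllib.request
from typing import Optional

# Assumed client-side address for a server started with `poetry run start`.
BASE_URL = "http://localhost:8000"


def build_query_request(query: str, top_k: int = 3, metadata_filter: Optional[dict] = None,
                        token: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a POST /query request authenticated with a bearer token."""
    token = token or os.environ["BEARER_TOKEN"]
    query_obj = {"query": query, "top_k": top_k}
    if metadata_filter:
        query_obj["filter"] = metadata_filter  # e.g. {"source": "email"}
    body = json.dumps({"queries": [query_obj]}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/query",
        data=body,
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )
```

To actually send it: `with urllib.request.urlopen(build_query_request("refund policy")) as resp: print(json.load(resp))`.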

Highlighted Details

  • Supports numerous vector databases including Pinecone, Weaviate, Milvus, Qdrant, Redis, Elasticsearch, and more.
  • Offers a "Memory Feature" allowing ChatGPT to save conversation snippets back to the vector database.
  • Integrates with ChatGPT Custom GPTs via OpenAPI schema and with Chat Completions/Assistants APIs via function calling.
  • Provides scripts for batch processing documents from JSON, JSONL, and ZIP files, with optional PII detection and metadata extraction.
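For the function-calling integration and the Memory Feature, an application has to describe the plugin's endpoints to the model and translate tool calls into HTTP bodies. In this sketch the tool name `query_documents` and both helper functions are hypothetical; the `tools` entry follows the Chat Completions function-calling format, and the bodies assume the `/query` and `/upsert` request shapes shown on the server's `/docs` page.

```python
import json

# Hypothetical tool exposing the plugin's /query endpoint to a model via function calling.
QUERY_TOOL = {
    "type": "function",
    "function": {
        "name": "query_documents",
        "description": "Search the user's documents and return the most relevant text chunks.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Natural-language search query."},
                "top_k": {"type": "integer", "description": "Number of chunks to return."},
            },
            "required": ["query"],
        },
    },
}


def tool_call_to_query_payload(arguments_json: str) -> dict:
    """Convert a model's tool-call arguments (a JSON string) into a /query request body."""
    args = json.loads(arguments_json)
    return {"queries": [{"query": args["query"], "top_k": int(args.get("top_k", 3))}]}


def memory_to_upsert_payload(text: str) -> dict:
    """Wrap a conversation snippet as an /upsert body tagged with source "chat" (memory-style save)."""
    return {"documents": [{"text": text, "metadata": {"source": "chat"}}]}
```

With the OpenAI Python SDK, `QUERY_TOOL` would be passed as `tools=[QUERY_TOOL]` to `client.chat.completions.create`, and each returned tool call's `function.arguments` string would go through `tool_call_to_query_payload` before being posted to the plugin.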

Maintenance & Community

The project has received contributions from many individuals and organizations, with specific acknowledgments for the Pinecone, Weaviate, Zilliz, Milvus, Qdrant, Redis, LlamaIndex, Supabase, Postgres, and Elasticsearch integrations. Community contributions are encouraged, and contributors may be eligible for OpenAI credits.

Licensing & Compatibility

The repository is licensed under the MIT License, permitting commercial use and linking with closed-source applications.

Limitations & Caveats

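Embeddings do not always capture exact keyword matches, so queries that depend on specific keywords may return less relevant results; vector databases that support hybrid (keyword plus vector) search may perform better for such queries. Handling sensitive data is the developer's responsibility, and the accuracy of the optional PII detection and metadata extraction is not guaranteed.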
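To see why hybrid search helps with keyword-heavy queries, consider blending a vector-similarity score with a keyword score. Weighted blending of min-max-normalized scores is one common hybrid approach; it is shown here only for illustration and is not how the plugin or any particular vector database implements hybrid search. The function `hybrid_rank` and the weight `alpha` are hypothetical.

```python
def hybrid_rank(vector_scores: dict, keyword_scores: dict, alpha: float = 0.5) -> list:
    """Rank document ids by alpha * vector + (1 - alpha) * keyword, after min-max normalization."""

    def normalize(scores: dict) -> dict:
        if not scores:
            return {}
        lo, hi = min(scores.values()), max(scores.values())
        # If every score is equal, treat them all as maximal rather than dividing by zero.
        return {k: (v - lo) / (hi - lo) if hi > lo else 1.0 for k, v in scores.items()}

    vec, kw = normalize(vector_scores), normalize(keyword_scores)
    ids = vec.keys() | kw.keys()
    # A document missing from one result list contributes 0 for that component.
    combined = {i: alpha * vec.get(i, 0.0) + (1 - alpha) * kw.get(i, 0.0) for i in ids}
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)
```

For instance, with vector scores `{"a": 0.82, "b": 0.80, "c": 0.40}` and keyword match counts `{"b": 3, "c": 1}`, document `b` overtakes `a` because it contains the exact query terms, even though its embedding similarity is slightly lower.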

Health Check
Last Commit: 1 year ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 1
Star History: 27 stars in the last 30 days
