chatgpt-pgvector by gannonh

Domain-specific chat completions app

Created 2 years ago

936 stars

Top 39.1% on SourcePulse


2 Experts Love This Project

Jared Palmer

SVP at GitHub; Founder of Turborepo; Author of Formik, TSDX

Andrew Kane

Author of pgvector

Project Summary

This project provides a starter application for building domain-specific chatbots using OpenAI's GPT models and pgvector for efficient similarity search. It targets developers looking to augment LLMs with custom knowledge bases, enabling more accurate and verifiable responses by grounding them in provided documents.

How It Works

The application scrapes web pages, extracts plain text, and segments it into manageable chunks. Each chunk is then converted into a vector embedding using OpenAI's text-embedding-ada-002 model. These embeddings, along with the original text and source URL, are stored in a Supabase PostgreSQL database with the pgvector extension. When a user queries the system, their prompt is also embedded, and a similarity search is performed against the vector database to retrieve the most relevant document chunks. These chunks are then incorporated into a prompt sent to the OpenAI Chat Completions API, ensuring the LLM's response is informed by the domain-specific data.
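The retrieval flow described above can be sketched with a few pure functions. This is a minimal illustration, not the repository's code: the names `StoredChunk`, `chunkText`, `cosineSimilarity`, `topMatches`, and `buildPrompt` are hypothetical, the in-memory ranking stands in for the similarity search that pgvector performs inside the database, and the embeddings are passed in as plain arrays rather than fetched from OpenAI.

```typescript
// Hypothetical shape of a stored row: source URL, chunk text, and its embedding.
interface StoredChunk {
  url: string;
  text: string;
  embedding: number[];
}

// Group sentences into chunks of roughly maxChars characters.
// A single sentence longer than maxChars is kept whole as its own chunk.
function chunkText(text: string, maxChars: number): string[] {
  const sentences = (text.match(/[^.!?]+[.!?]*/g) || [])
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
  const chunks: string[] = [];
  let current = "";
  for (const sentence of sentences) {
    const candidate = current ? current + " " + sentence : sentence;
    if (candidate.length > maxChars && current) {
      chunks.push(current);
      current = sentence;
    } else {
      current = candidate;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k stored chunks most similar to the query embedding.
// In-memory stand-in for the database-side similarity search.
function topMatches(query: number[], store: StoredChunk[], k: number): StoredChunk[] {
  return store
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((entry) => entry.chunk);
}

// Assemble a prompt that grounds the question in the retrieved chunks.
function buildPrompt(question: string, matches: StoredChunk[]): string {
  const context = matches.map((m) => `Source: ${m.url}\n${m.text}`).join("\n\n");
  return `Answer using only the context below.\n\n${context}\n\nQuestion: ${question}`;
}
```

In a real deployment the query and chunk embeddings would come from the embedding model, and the ranking would run as a query against the pgvector column rather than in application memory.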

Quick Start & Requirements

Install: npm install
Run: npm run dev
Prerequisites: Node.js, Supabase account, OpenAI API key. Requires a Supabase project created after February 2023 to ensure pgvector support.
Setup: Clone repo, install dependencies, configure .env.local with Supabase URL, Supabase Anon Key, and OpenAI API Key.
Links: Supabase, OpenAI API Keys
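
Because the setup step depends on three values in .env.local, a small startup check can catch a missing one early. The sketch below is an assumption-laden illustration: the variable names in `REQUIRED_KEYS` and the helper `missingEnvKeys` are hypothetical, so check the repository's README for the actual keys.

```typescript
// Hypothetical variable names; the repository may use different ones.
const REQUIRED_KEYS = [
  "NEXT_PUBLIC_SUPABASE_URL",
  "NEXT_PUBLIC_SUPABASE_ANON_KEY",
  "OPENAI_API_KEY",
];

// Return the required keys that are absent or blank in the given environment map.
function missingEnvKeys(env: Record<string, string | undefined>): string[] {
  return REQUIRED_KEYS.filter((key) => {
    const value = env[key];
    return value === undefined || value.trim() === "";
  });
}
```

In a Node.js process this could be called as `missingEnvKeys(process.env)` at startup, failing fast with a readable message if the returned list is not empty.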

Highlighted Details

Uses Next.js for the React-based frontend, with hosting on Vercel.
Utilizes Supabase's managed PostgreSQL with the pgvector extension for vector storage and search.
Implements a streaming response for a more interactive user experience.
Supports custom OpenAI API proxies and JavaScript rendering services (Splash).
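
To illustrate the streaming response noted above: when streaming is enabled, the Chat Completions API sends server-sent events whose `data:` lines carry JSON with incremental `delta` content, ending with `data: [DONE]`. The sketch below shows one way a client might pull the text out of such lines. The function name `extractDeltas` is hypothetical, and the sketch assumes each event arrives whole, which a real network stream does not guarantee.

```typescript
// Extract text deltas from a block of server-sent-event lines.
// Assumes every "data:" line holds one complete JSON payload.
function extractDeltas(sseText: string): { tokens: string[]; done: boolean } {
  const tokens: string[] = [];
  let done = false;
  for (const rawLine of sseText.split("\n")) {
    const line = rawLine.trim();
    if (line.indexOf("data:") !== 0) continue; // skip blank and non-data lines
    const payload = line.slice("data:".length).trim();
    if (payload === "[DONE]") {
      done = true; // end-of-stream sentinel
      break;
    }
    const parsed = JSON.parse(payload);
    const content = parsed?.choices?.[0]?.delta?.content;
    if (typeof content === "string") tokens.push(content);
  }
  return { tokens, done };
}
```

A production client would buffer partial lines across network chunks before parsing, and append each token to the UI as it arrives rather than collecting them in an array.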

Maintenance & Community

The repository is maintained by gannonh. No specific community channels or roadmap links are provided in the README.

Licensing & Compatibility

The README does not explicitly state a license. Confirm the license terms before commercial use or integration into closed-source projects.

Limitations & Caveats

The project is presented as a starter app, so it will likely need further development before production use. Because it depends on the Supabase and OpenAI APIs, running it incurs usage costs from both services. Answer quality depends heavily on the quality of the scraped data and on the chosen embedding model.

Health Check

Last Commit

2 years ago

Responsiveness

Inactive

Star History

0 stars in the last 30 days