pgai by timescale

SDK for building RAG and agent applications with PostgreSQL

Created 1 year ago
5,370 stars

Top 9.4% on SourcePulse

Project Summary

pgai is a Python library that turns PostgreSQL into a retrieval engine for RAG and agentic applications. It creates vector embeddings from PostgreSQL data and S3 documents and keeps them synchronized, updating them automatically as the underlying data changes. Developers can then build AI applications with semantic search and retrieval directly on their PostgreSQL databases.

How It Works

pgai takes a declarative approach: users define a vectorizer configuration that specifies data sources, chunking strategies, and embedding models. Stateless worker processes read this configuration, queue data for embedding, and write the resulting embeddings and text chunks back to PostgreSQL. This architecture decouples embedding generation from core data operations, which improves resilience against embedding service failures. pgai uses pgvector for vector storage and search, and pgvectorscale for high-performance approximate nearest neighbor (ANN) search.
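The configuration-plus-worker flow described above can be illustrated with a small in-memory sketch. This is not pgai's API: `VectorizerConfig`, `chunk_text`, `toy_embed`, and `run_worker` are hypothetical names, the "database" is a plain dictionary, and the embedding function is a toy stand-in for a real model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class VectorizerConfig:
    """Hypothetical declarative config: which column to embed and how."""
    source_column: str
    chunk_size: int
    embed: Callable[[str], list[float]]


def chunk_text(text: str, size: int) -> list[str]:
    """Split text into fixed-size character chunks (a deliberately simple strategy)."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def toy_embed(chunk: str) -> list[float]:
    """Stand-in for an embedding model: [length, vowel count]."""
    vowels = sum(c in "aeiou" for c in chunk.lower())
    return [float(len(chunk)), float(vowels)]


def run_worker(config: VectorizerConfig,
               rows: dict[int, dict[str, str]],
               queue: list[int],
               store: dict[int, list[tuple[str, list[float]]]]) -> int:
    """Drain the queue: chunk each pending row, embed the chunks, write them back.

    The worker keeps no state of its own; everything it needs comes from
    the config, the source rows, and the queue. Returns the number of rows processed.
    """
    processed = 0
    while queue:
        row_id = queue.pop(0)
        text = rows[row_id][config.source_column]
        # Overwrite any stale chunks for this row so the store tracks the source.
        store[row_id] = [(chunk, config.embed(chunk))
                         for chunk in chunk_text(text, config.chunk_size)]
        processed += 1
    return processed
```

In this sketch, a change to a source row would simply append its id to `queue`; the next worker run then replaces that row's chunks and embeddings without touching the source data.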

Quick Start & Requirements

  • Install via pip: pip install pgai
  • Prerequisites: PostgreSQL database (Docker instructions available), OpenAI API key (or other supported providers).
  • Setup requires creating a .env file with OPENAI_API_KEY and DB_URL.
  • Quickstart example code and requirements are available for download.
  • Official quickstart guides: OpenAI, Ollama, VoyageAI.
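As a rough illustration of the `.env` setup step, the sketch below parses `KEY=VALUE` lines and checks that `OPENAI_API_KEY` and `DB_URL` are present. The helper names `parse_env_text` and `require_settings` are hypothetical, and the parser handles only the simplest `.env` syntax; the official quickstart may load the file differently.

```python
def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values may themselves contain "=".
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def require_settings(values: dict[str, str],
                     names: tuple[str, ...] = ("OPENAI_API_KEY", "DB_URL")) -> dict[str, str]:
    """Return the required settings, or raise KeyError naming the missing ones."""
    missing = [name for name in names if not values.get(name)]
    if missing:
        raise KeyError("missing settings: " + ", ".join(missing))
    return {name: values[name] for name in names}
```

A typical call would read the file first, for example `require_settings(parse_env_text(open(".env").read()))`, and fail early if either value is absent.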

Highlighted Details

  • Automatically creates and synchronizes vector embeddings from PostgreSQL data and S3 documents.
  • Supports batch processing with built-in error handling for model failures, rate limits, and latency.
  • Integrates with pgvector and pgvectorscale for semantic and ANN search.
  • Offers a configurable pipeline for data loading, parsing, chunking, formatting, and embedding.
  • Supports multiple embedding providers including Ollama, OpenAI, Cohere, and Huggingface.
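To make the batch processing and rate-limit handling point concrete, here is a minimal sketch of batched embedding with retry and exponential backoff. It shows the general technique only and is not pgai's implementation: `RateLimitError`, `embed_in_batches`, and the `embed_batch` callback are hypothetical, and a real provider client would raise its own exception types.

```python
import time
from typing import Callable, Sequence


class RateLimitError(Exception):
    """Hypothetical error standing in for a provider's rate-limit response."""


def embed_in_batches(texts: Sequence[str],
                     embed_batch: Callable[[Sequence[str]], list[list[float]]],
                     batch_size: int = 2,
                     max_retries: int = 3,
                     base_delay: float = 0.01,
                     sleep: Callable[[float], None] = time.sleep) -> list[list[float]]:
    """Embed texts batch by batch, retrying a rate-limited batch with backoff."""
    results: list[list[float]] = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        for attempt in range(max_retries + 1):
            try:
                results.extend(embed_batch(batch))
                break
            except RateLimitError:
                if attempt == max_retries:
                    raise  # Give up after the final retry.
                # Wait base_delay, then 2x, 4x, ... before retrying this batch.
                sleep(base_delay * 2 ** attempt)
    return results
```

Passing `sleep` as a parameter keeps the function easy to test, since a test can record the delays instead of actually waiting.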

Maintenance & Community

  • Project is actively developed by Timescale.
  • Community support available via Discord.
  • Open to contributions; roadmap and discussions are available.

Licensing & Compatibility

  • Apache 2.0 License.
  • Compatible with commercial use and closed-source applications.

Limitations & Caveats

The project is described as being in an "early stage," indicating potential for rapid changes and evolving features. While designed for production, users should be aware of the implications of adopting a rapidly developing library.

Health Check

  • Last commit: 3 days ago
  • Responsiveness: 1 day
  • Pull requests (30d): 7
  • Issues (30d): 4
  • Star history: 238 stars in the last 30 days
