pgvector-python by pgvector

Python SDK for pgvector, enabling vector storage in Postgres

created 4 years ago
1,287 stars

Top 31.6% on sourcepulse

Project Summary

This library provides Python bindings for the pgvector PostgreSQL extension, enabling efficient storage and querying of vector embeddings directly within a PostgreSQL database. It targets Python developers working with machine learning, AI, and data science applications who need to integrate vector search capabilities into their existing PostgreSQL-based data pipelines. The primary benefit is leveraging PostgreSQL's robustness and familiarity for vector similarity search.

How It Works

The library translates Python objects and ORM calls into SQL queries that interact with the pgvector extension's custom data types (vector, halfvec, bit, sparsevec) and functions. It supports various distance metrics (L2, cosine, inner product) and indexing methods (HNSW, IVFFlat) for optimized nearest neighbor searches. This approach allows developers to manage vector data alongside relational data within a single database, simplifying architecture and data management.
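
To make the three distance metrics concrete, here is a minimal pure-Python sketch of what each one computes. The function names are illustrative and are not part of the library. In SQL, pgvector exposes these as the operators `<->` (L2 distance), `<#>` (negative inner product, so that smaller still means closer) and `<=>` (cosine distance); the database does this work server-side, so the code below is only a reference for the arithmetic.

```python
import math


def l2_distance(a, b):
    """Euclidean distance, the quantity behind pgvector's `<->` operator."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def negative_inner_product(a, b):
    """Negated dot product, the quantity behind pgvector's `<#>` operator."""
    return -sum(x * y for x, y in zip(a, b))


def cosine_distance(a, b):
    """1 - cosine similarity, the quantity behind pgvector's `<=>` operator."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def nearest(query, items, distance=l2_distance, k=2):
    """Brute-force nearest neighbours: sort item ids by distance to `query`."""
    ranked = sorted(items, key=lambda item_id: distance(query, items[item_id]))
    return ranked[:k]
```

An exact scan like `nearest` is what an `ORDER BY embedding <-> query LIMIT k` query does when no approximate index is present; HNSW and IVFFlat trade some recall for speed.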

Quick Start & Requirements

  • Install via pip: pip install pgvector
  • Requires PostgreSQL with the pgvector extension installed.
  • Supports Django, SQLAlchemy, SQLModel, Psycopg 3, Psycopg 2, asyncpg, pg8000, and Peewee.
  • Examples and detailed integration guides are available for various ORMs and use cases.
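
As a rough illustration of the quick start with Psycopg 3, the sketch below enables the extension, creates a table, inserts one embedding and runs a nearest-neighbor query. It assumes `psycopg` and `numpy` are installed alongside `pgvector`, that the connecting role may run `CREATE EXTENSION`, and that a three-dimensional `items` table is acceptable; the table name, column name and the `nearest_item_ids` helper are hypothetical.

```python
# SQL used by the sketch; `items` and `embedding` are illustrative names.
CREATE_EXTENSION_SQL = "CREATE EXTENSION IF NOT EXISTS vector"
CREATE_TABLE_SQL = (
    "CREATE TABLE IF NOT EXISTS items "
    "(id bigserial PRIMARY KEY, embedding vector(3))"
)
INSERT_SQL = "INSERT INTO items (embedding) VALUES (%s)"
NEAREST_SQL = "SELECT id FROM items ORDER BY embedding <-> %s LIMIT 5"


def nearest_item_ids(conninfo, query):
    """Insert one sample row, then return ids of the rows closest to `query`."""
    # Imported lazily so this module loads even without the drivers installed.
    import numpy as np
    import psycopg
    from pgvector.psycopg import register_vector

    with psycopg.connect(conninfo, autocommit=True) as conn:
        conn.execute(CREATE_EXTENSION_SQL)
        # Teach this connection how to send and receive the vector type.
        register_vector(conn)
        conn.execute(CREATE_TABLE_SQL)
        conn.execute(INSERT_SQL, (np.array([1.0, 2.0, 3.0]),))
        rows = conn.execute(NEAREST_SQL, (np.array(query),)).fetchall()
    return [row[0] for row in rows]
```

A call would look like `nearest_item_ids("dbname=example", [1.0, 2.0, 4.0])`, where the connection string is a placeholder for your own database.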

Highlighted Details

  • Comprehensive ORM support: Django, SQLAlchemy, SQLModel, Peewee.
  • Direct driver support: Psycopg 3, Psycopg 2, asyncpg, pg8000.
  • Supports vector, halfvec, bit, and sparsevec data types.
  • Implements L2, inner product, and cosine distance metrics.
  • Integrates with HNSW and IVFFlat approximate nearest neighbor indexes.
  • Includes examples for RAG, embeddings, hybrid search, and recommendations.
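
The index types and distance metrics listed above meet in the `CREATE INDEX` statement, where each metric has its own operator class. The helper below is a hypothetical sketch, not a library function, that builds such statements for `vector` columns; other types use differently named operator classes (for example `halfvec_l2_ops`). Table and column names are interpolated unescaped, so it should only be given trusted identifiers.

```python
# pgvector operator classes for `vector` columns, keyed by metric.
OPERATOR_CLASSES = {
    "l2": "vector_l2_ops",
    "inner_product": "vector_ip_ops",
    "cosine": "vector_cosine_ops",
}


def create_index_sql(table, column, method="hnsw", metric="l2", **options):
    """Build a CREATE INDEX statement for an approximate index.

    `options` become the WITH clause, e.g. m / ef_construction for HNSW
    or lists for IVFFlat. Identifiers are NOT escaped (trusted input only).
    """
    if method not in ("hnsw", "ivfflat"):
        raise ValueError(f"unsupported index method: {method}")
    opclass = OPERATOR_CLASSES[metric]
    sql = f"CREATE INDEX ON {table} USING {method} ({column} {opclass})"
    if options:
        pairs = ", ".join(f"{k} = {int(v)}" for k, v in sorted(options.items()))
        sql += f" WITH ({pairs})"
    return sql
```

The operator class has to match the operator used in queries: an index built with `vector_cosine_ops` only helps `ORDER BY embedding <=> ...` queries.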

Maintenance & Community

The project is actively maintained by the pgvector team. Contributions are welcomed, with clear guidelines for reporting bugs, fixing issues, and adding features. Development setup instructions and testing commands are provided.

Licensing & Compatibility

The library is released under the MIT License, allowing for broad use, including commercial applications.

Limitations & Caveats

The library is a client-side adapter only: the pgvector extension must be installed on the PostgreSQL server and enabled in each database that uses it. Query performance on very large datasets depends heavily on choosing and tuning an approximate index (HNSW or IVFFlat) and on PostgreSQL configuration.
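
One way to check the first caveat from Python is to look the extension up in the `pg_extension` catalog before running any vector queries. The `pgvector_version` helper below is a hypothetical sketch; it assumes a connection object with a Psycopg 3 style `execute(...)` that returns a cursor.

```python
EXTENSION_VERSION_SQL = "SELECT extversion FROM pg_extension WHERE extname = 'vector'"


def pgvector_version(conn):
    """Return the enabled pgvector version string, or None if not enabled.

    `conn` is assumed to behave like a psycopg (v3) connection, whose
    execute() returns a cursor exposing fetchone().
    """
    row = conn.execute(EXTENSION_VERSION_SQL).fetchone()
    return row[0] if row else None
```

A `None` result means `CREATE EXTENSION vector` has not been run in the current database, even if the extension files are present on the server.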

Health Check

  • Last commit: 1 month ago
  • Responsiveness: 1 day
  • Pull requests (30d): 2
  • Issues (30d): 1
  • Star history: 101 stars in the last 90 days
