pgvector by pgvector

Open-source vector similarity search extension for Postgres

created 4 years ago
16,820 stars

Top 2.8% on sourcepulse

Project Summary

pgvector is an open-source PostgreSQL extension that enables efficient vector similarity search directly within the database. It allows users to store and query high-dimensional vectors alongside their relational data, supporting various distance metrics and indexing strategies for both exact and approximate nearest neighbor searches. This makes it ideal for applications like recommendation systems, semantic search, and anomaly detection that require integrating vector embeddings into existing PostgreSQL workflows.

How It Works

pgvector implements vector storage and indexing as a PostgreSQL extension, so vector data benefits from the database's ACID compliance, replication, and JOIN capabilities. It supports four vector types (full-precision vector, half-precision halfvec, binary bit, and sparse sparsevec) and six distance metrics (L2, inner product, cosine, L1, Hamming, and Jaccard). Without an index, queries perform exact nearest neighbor search. For approximate nearest neighbor (ANN) search, it offers two index types: HNSW (Hierarchical Navigable Small World), which gives a better speed-recall tradeoff at query time but builds more slowly and uses more memory, and IVFFlat, which partitions vectors into lists and searches only the lists closest to the query, giving faster build times and lower memory usage.
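To make the distance metrics concrete, here is a minimal pure-Python sketch of what each one computes. It is an illustration only, not pgvector's implementation: the function names are hypothetical, vectors are plain Python lists, and bit vectors are modelled as lists of 0/1 values. The operator shown in each comment is the pgvector SQL operator for that metric.

```python
import math


def l2_distance(a, b):
    """Euclidean distance (pgvector operator: <->)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def inner_product(a, b):
    """Inner product. Note that pgvector's <#> operator returns the
    NEGATIVE inner product, so that smaller values mean "closer"."""
    return sum(x * y for x, y in zip(a, b))


def cosine_distance(a, b):
    """1 - cosine similarity (pgvector operator: <=>).
    Assumes neither vector is all zeros."""
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    return 1.0 - inner_product(a, b) / (norm_a * norm_b)


def l1_distance(a, b):
    """Taxicab distance (pgvector operator: <+>)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def hamming_distance(a, b):
    """Number of differing positions in two bit vectors (operator: <~>)."""
    return sum(x != y for x, y in zip(a, b))


def jaccard_distance(a, b):
    """1 - |intersection| / |union| of the set bits (operator: <%>).
    Returning 1.0 when no bit is set is a convention chosen for this sketch."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return 1.0 - both / either if either else 1.0


if __name__ == "__main__":
    print(l2_distance([1, 2, 3], [4, 6, 3]))
    print(jaccard_distance([1, 0, 1, 1], [1, 1, 0, 1]))
```

In every case a smaller value means a closer match, which is why nearest neighbor queries sort in ascending order of distance.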

Quick Start & Requirements

  • Installation: Compile from source (make && sudo make install), or use a prebuilt package or image (Docker, Homebrew, APT, Yum, conda-forge, pkg).
  • Prerequisites: PostgreSQL 13+ (some installation methods support only specific versions). Compiling on Windows requires C++ support.
  • Setup: Run CREATE EXTENSION vector; once in each database where the extension is needed.
  • Resources: Official documentation and installation guides are available.
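Once the extension is installed, a first session usually enables it, creates a table with a vector column, and runs a nearest neighbor query. The sketch below only builds the SQL strings, so it runs without a database. The helper names, the `items` table, and the `embedding` column are hypothetical, and the `%s` placeholder assumes a driver with psycopg-style parameters.

```python
def to_vector_literal(values):
    """Format numbers as a pgvector text literal, e.g. '[1.0,2.0,3.0]'."""
    return "[" + ",".join(str(float(v)) for v in values) + "]"


def setup_statements(table="items", dims=3):
    """SQL to enable the extension and create a table with a vector column.

    `table` is interpolated directly, so it must come from trusted code,
    never from user input.
    """
    return [
        "CREATE EXTENSION IF NOT EXISTS vector;",
        f"CREATE TABLE {table} (id bigserial PRIMARY KEY, "
        f"embedding vector({int(dims)}));",
    ]


def nearest_neighbors_query(table="items", k=5):
    """SQL for the k rows closest to a query vector by L2 distance (<->).

    The query vector is left as a %s placeholder for the database driver.
    """
    return (
        f"SELECT id FROM {table} "
        f"ORDER BY embedding <-> %s::vector LIMIT {int(k)};"
    )


if __name__ == "__main__":
    for statement in setup_statements():
        print(statement)
    print(nearest_neighbors_query(k=5))
    print(to_vector_literal([3, 1, 2]))
```

With a real connection, each statement would be passed to the driver's execute call, with the output of `to_vector_literal` supplied as the query parameter.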

Highlighted Details

  • Supports exact and approximate nearest neighbor search.
  • Offers multiple vector types: vector, halfvec, bit, sparsevec.
  • Implements HNSW and IVFFlat indexing strategies.
  • Integrates seamlessly with PostgreSQL features like ACID, WAL, and JOINs.
  • Client libraries are available for many programming languages (for example, pgvector-node for Node.js).
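The difference between exact and approximate search can be shown with a toy IVFFlat-style index in pure Python. This is a sketch of the general idea only, not pgvector's code: the centroids are supplied by hand (a real index derives them by clustering), all names are hypothetical, and `probes` mirrors the idea behind the ivfflat.probes setting.

```python
import math


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def exact_search(vectors, query, k):
    """Exact k-NN: compare the query against every vector."""
    return sorted(range(len(vectors)), key=lambda i: l2(vectors[i], query))[:k]


def build_lists(vectors, centroids):
    """Assign each vector index to the list of its nearest centroid."""
    lists = [[] for _ in centroids]
    for i, v in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: l2(v, centroids[c]))
        lists[nearest].append(i)
    return lists


def ivf_search(vectors, centroids, lists, query, k, probes=1):
    """Approximate k-NN: scan only the `probes` lists closest to the query."""
    order = sorted(range(len(centroids)), key=lambda c: l2(query, centroids[c]))
    candidates = [i for c in order[:probes] for i in lists[c]]
    return sorted(candidates, key=lambda i: l2(vectors[i], query))[:k]


# Hand-picked toy data: two clusters plus one point between them.
vectors = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11), (5, 4)]
centroids = [(0, 0), (10, 10)]
lists = build_lists(vectors, centroids)
query = (6, 6)

if __name__ == "__main__":
    print(exact_search(vectors, query, 2))                          # [6, 3]
    print(ivf_search(vectors, centroids, lists, query, 2))          # [3, 4]
    print(ivf_search(vectors, centroids, lists, query, 2, probes=2))  # [6, 3]
```

With one probe the search misses the true nearest neighbor (index 6) because that point was assigned to the other list. Probing both lists recovers the exact result at the cost of scanning every vector.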

Maintenance & Community

The project is actively maintained, with contributions from various individuals and organizations. Resources for contributors include extension building infrastructure and interface definitions.

Licensing & Compatibility

pgvector is released under the PostgreSQL License, which is permissive and allows for commercial use and integration with closed-source applications.

Limitations & Caveats

ANN indexes trade some recall for speed, so an indexed query can return different results than an exact search. Index builds, especially for HNSW, can take a long time; build speed is influenced by maintenance_work_mem and max_parallel_maintenance_workers. With approximate indexes, filtering is applied after the index is scanned, so a filtered query can return fewer rows than requested. Iterative index scans mitigate this by scanning more of the index when needed, and parameters such as hnsw.ef_search and ivfflat.probes still need careful tuning to balance recall against query speed.
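One way to quantify the recall tradeoff is to run the same queries with and without the index and compare the returned IDs. The sketch below assumes the two result lists have already been collected. The function names and the sample IDs are hypothetical.

```python
def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact top-k results that the approximate search
    also returned (1.0 means the index lost nothing for this query)."""
    if not exact_ids:
        raise ValueError("exact_ids must not be empty")
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


def mean_recall(pairs):
    """Average recall over (approx_ids, exact_ids) pairs, one per query."""
    scores = [recall_at_k(approx, exact) for approx, exact in pairs]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical results for two queries with k = 4.
    results = [
        ([11, 12, 13, 14], [11, 12, 13, 14]),  # index matched exactly
        ([21, 22, 23, 29], [21, 22, 23, 24]),  # index missed one neighbor
    ]
    print(mean_recall(results))
```

Repeating the measurement while varying a setting such as hnsw.ef_search or ivfflat.probes shows how much recall each increase buys for a given workload.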

Health Check

  • Last commit: 1 day ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 3
  • Issues (30d): 9

Star History

  • 1,484 stars in the last 90 days
