Open-source vector similarity search extension for Postgres
pgvector is an open-source PostgreSQL extension that enables efficient vector similarity search directly within the database. It allows users to store and query high-dimensional vectors alongside their relational data, supporting various distance metrics and indexing strategies for both exact and approximate nearest neighbor searches. This makes it ideal for applications like recommendation systems, semantic search, and anomaly detection that require integrating vector embeddings into existing PostgreSQL workflows.
How It Works
pgvector implements vector storage and indexing as a PostgreSQL extension, so vector data inherits the database's ACID compliance, replication, and JOIN capabilities. It supports multiple vector types (full-precision, half-precision, binary, sparse) and distance metrics (L2, inner product, cosine, L1, Hamming, Jaccard). For approximate nearest neighbor (ANN) search, it offers two index types: HNSW (Hierarchical Navigable Small World), which gives a better speed-recall tradeoff at query time, and IVFFlat, which partitions vectors into lists and offers faster build times and lower memory usage.
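To make the listed distance metrics concrete, here is a minimal pure-Python sketch of each one, plus a brute-force exact nearest neighbor search. It illustrates the math only and is not pgvector's implementation; the function names are hypothetical, and returning 0.0 for the Jaccard distance of two all-zero bit vectors is an assumption made for this example.

```python
import math

def l2_distance(a, b):
    # Euclidean distance between two equal-length vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def inner_product(a, b):
    # Dot product; larger values mean more similar vectors.
    return sum(x * y for x, y in zip(a, b))

def cosine_distance(a, b):
    # 1 - cosine similarity; assumes neither vector is all zeros.
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    return 1.0 - inner_product(a, b) / (norm_a * norm_b)

def l1_distance(a, b):
    # Manhattan (taxicab) distance.
    return sum(abs(x - y) for x, y in zip(a, b))

def hamming_distance(a, b):
    # Number of positions where two bit vectors differ.
    return sum(x != y for x, y in zip(a, b))

def jaccard_distance(a, b):
    # 1 - |intersection| / |union| over the set bits.
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    # Assumption for this sketch: two empty sets are at distance 0.
    return 1.0 - both / either if either else 0.0

def exact_nearest(query, rows, k, distance=l2_distance):
    # Exact search: score every row, then keep the k closest.
    return sorted(rows, key=lambda row: distance(query, row))[:k]
```

Exact search like exact_nearest scans every row, which is the cost that the HNSW and IVFFlat indexes are designed to avoid.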
Quick Start & Requirements
Install from source (make && sudo make install), or use package managers (Docker, Homebrew, APT, Yum, conda-forge, pkg).
Enable the extension with CREATE EXTENSION vector;
Highlighted Details
Vector types: vector, halfvec, bit, sparsevec.
Maintenance & Community
The project is actively maintained, with contributions from various individuals and organizations. Resources for contributors include extension building infrastructure and interface definitions.
Licensing & Compatibility
pgvector is released under the PostgreSQL License, which is permissive and allows for commercial use and integration with closed-source applications.
Limitations & Caveats
ANN indexes trade some recall for speed, so approximate queries can miss some of the true nearest neighbors. Index build times, especially for HNSW, can be significant and are influenced by maintenance_work_mem and max_parallel_maintenance_workers. Query performance with filtering can be improved using iterative index scans, but this requires careful tuning of parameters like hnsw.ef_search and ivfflat.probes.
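The recall-for-speed tradeoff can be seen in a toy, IVFFlat-style partitioned search. The sketch below is a simplified illustration and not pgvector's code: centroids are sampled at random from the data rather than trained, all function names are hypothetical, and the probes argument only plays a role analogous to ivfflat.probes.

```python
import math
import random

def l2(a, b):
    # Euclidean distance between two equal-length vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_lists(vectors, lists, rng):
    # Simplification: sample centroids from the data instead of training them.
    centroids = rng.sample(vectors, lists)
    buckets = [[] for _ in range(lists)]
    for v in vectors:
        # Assign each vector to the list of its nearest centroid.
        nearest = min(range(lists), key=lambda i: l2(v, centroids[i]))
        buckets[nearest].append(v)
    return centroids, buckets

def search(query, centroids, buckets, k, probes):
    # Scan only the `probes` lists whose centroids are closest to the query.
    order = sorted(range(len(centroids)), key=lambda i: l2(query, centroids[i]))
    candidates = [v for i in order[:probes] for v in buckets[i]]
    return sorted(candidates, key=lambda v: l2(query, v))[:k]

def recall(approx, exact):
    # Fraction of the true nearest neighbors that the approximate search found.
    exact_set = {tuple(v) for v in exact}
    return sum(tuple(v) in exact_set for v in approx) / len(exact)

rng = random.Random(0)  # fixed seed so runs are repeatable
vectors = [[rng.random() for _ in range(8)] for _ in range(500)]
centroids, buckets = build_lists(vectors, lists=10, rng=rng)
query = [rng.random() for _ in range(8)]
exact = sorted(vectors, key=lambda v: l2(query, v))[:10]

for probes in (1, 3, 10):
    approx = search(query, centroids, buckets, k=10, probes=probes)
    print(probes, recall(approx, exact))
```

Scanning more lists never lowers recall in this sketch, because each larger probe count searches a superset of the previous candidates. Scanning every list is equivalent to exact search, at the cost of touching every vector.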