pgvector by pgvector

Open-source vector similarity search extension for Postgres

Created 4 years ago

19,189 stars

Top 2.4% on SourcePulse

29 Experts Love This Project

Taranjeet Singh

Cofounder of Mem0

Samuel Colvin

Founder and Author of Pydantic

Tim J. Baek

Founder of Open WebUI

Elvis Saravia

Founder of DAIR.AI

and 25 more!

Project Summary

pgvector is an open-source PostgreSQL extension for vector similarity search inside the database. It lets users store and query high-dimensional vectors alongside their relational data, with several distance metrics and index types for both exact and approximate nearest neighbor search. This makes it well suited to applications such as recommendation systems, semantic search, and anomaly detection that need vector embeddings in an existing PostgreSQL workflow.

How It Works

pgvector implements vector storage and indexing as a PostgreSQL extension, so vectors benefit from the database's ACID compliance, replication, and JOIN capabilities. It supports multiple vector types (full-precision, half-precision, binary, sparse) and distance metrics (L2, inner product, cosine, L1, Hamming, Jaccard). Without an index, queries perform exact nearest neighbor search. For approximate nearest neighbor (ANN) search it offers two index types. HNSW (Hierarchical Navigable Small World) gives the better speed-recall tradeoff at query time, at the cost of slower builds and higher memory usage. IVFFlat partitions vectors into lists and searches only the lists closest to the query, which gives faster build times and lower memory usage but a weaker speed-recall tradeoff.
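The distance metrics listed above can be illustrated with a minimal pure-Python sketch. It is not pgvector's implementation (which is written in C); the function names are chosen here for illustration, and the operator noted in each comment is the pgvector SQL operator for that metric. The Jaccard value returned for two all-zero bit strings is a choice made in this sketch.

```python
import math


def l2_distance(a, b):
    # Euclidean distance; pgvector operator: <->
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def inner_product(a, b):
    # Plain inner product; pgvector's <#> operator returns the NEGATIVE of this
    return sum(x * y for x, y in zip(a, b))


def cosine_distance(a, b):
    # 1 - cosine similarity; pgvector operator: <=>
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    return 1.0 - inner_product(a, b) / (norm_a * norm_b)


def l1_distance(a, b):
    # Taxicab distance; pgvector operator: <+>
    return sum(abs(x - y) for x, y in zip(a, b))


def hamming_distance(a, b):
    # Number of differing positions in two equal-length bit strings; operator: <~>
    return sum(x != y for x, y in zip(a, b))


def jaccard_distance(a, b):
    # 1 - |intersection| / |union| of the set bits; operator: <%>
    intersection = sum(x == "1" and y == "1" for x, y in zip(a, b))
    union = sum(x == "1" or y == "1" for x, y in zip(a, b))
    if union == 0:
        return 1.0  # choice made in this sketch for two all-zero inputs
    return 1.0 - intersection / union


print(l2_distance([1, 2, 3], [4, 6, 3]))
print(cosine_distance([1, 0], [0, 1]))
print(hamming_distance("1010", "1001"))
```

The first four functions apply to numeric vectors, while the Hamming and Jaccard functions apply to binary vectors, written here as strings of 0 and 1.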

Quick Start & Requirements

Installation: Compile from source (make && sudo make install), or install a prebuilt package (Docker, Homebrew, APT, Yum, conda-forge, pkg).
Prerequisites: PostgreSQL 13+ (some installation methods require specific versions). Compiling on Windows requires C++ support in Visual Studio.
Setup: Run CREATE EXTENSION vector; once in each database where the extension will be used.
Resources: Documentation and installation guides are in the project README on GitHub.
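To make the setup step concrete, here is a hedged sketch that generates the SQL for a minimal first session after installation. The table name items, the column name embedding, and both helper functions are illustrative assumptions, not part of pgvector. The sketch only builds and prints strings, so it runs without a database; a real application would send these statements through a Postgres client and pass vectors as bound parameters instead of interpolating them.

```python
def vector_literal(values):
    """Format numbers in pgvector's text form, for example [1.0,2.0,3.0]."""
    return "[" + ",".join(str(float(v)) for v in values) + "]"


def quick_start_statements(rows, query, limit=5):
    """Return SQL for a minimal session (the `items` table is hypothetical)."""
    dims = len(query)
    if any(len(row) != dims for row in rows):
        raise ValueError("all vectors must have the same number of dimensions")
    values = ", ".join(f"('{vector_literal(row)}')" for row in rows)
    return [
        # Enable the extension in the current database
        "CREATE EXTENSION vector;",
        # A vector column declares its number of dimensions
        f"CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector({dims}));",
        f"INSERT INTO items (embedding) VALUES {values};",
        # <-> orders rows by L2 distance to the query vector, nearest first
        f"SELECT * FROM items ORDER BY embedding <-> '{vector_literal(query)}' LIMIT {limit};",
    ]


for statement in quick_start_statements([[1, 2, 3], [4, 5, 6]], query=[3, 1, 2]):
    print(statement)
```

With no index on the table, the final SELECT performs an exact nearest neighbor search.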

Highlighted Details

Supports exact and approximate nearest neighbor search.
Offers multiple vector types: vector, halfvec, bit, sparsevec.
Implements HNSW and IVFFlat indexing strategies.
Works with standard PostgreSQL features such as ACID transactions, WAL, and JOINs.
Can be used from any language that has a Postgres client, with dedicated client libraries available for many languages.
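Two of the vector types listed above can be illustrated without a database. The sketch below is an approximation under stated assumptions: it uses Python's IEEE 754 half-precision struct format to show the rounding that storing values in a half-precision type such as halfvec implies, and it builds the text form of a sparsevec value, which lists only the nonzero elements with 1-based indices followed by the total number of dimensions. Both helper names are hypothetical.

```python
import struct


def to_half_precision(values):
    """Round each value to IEEE 754 half precision and back to a Python float."""
    return [struct.unpack("<e", struct.pack("<e", v))[0] for v in values]


def sparsevec_literal(values):
    """Build sparsevec text form: {index:value,...}/dimensions, indices from 1."""
    pairs = [f"{i}:{v}" for i, v in enumerate(values, start=1) if v != 0]
    return "{" + ",".join(pairs) + "}/" + str(len(values))


# 0.1 is not exactly representable in half precision, so it comes back rounded
print(to_half_precision([0.1, 0.5, 1.0]))

# Only the nonzero elements are written out
print(sparsevec_literal([1, 0, 2, 0, 3]))
```

Half precision halves the storage per element at the cost of rounding error, and the sparse form pays off when most elements are zero.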

Maintenance & Community

The project is actively maintained, with contributions from individuals and organizations. For contributors, the README points to the PostgreSQL documentation on Extension Building Infrastructure and the Index Access Method Interface Definition.

Licensing & Compatibility

pgvector is released under the PostgreSQL License, which is permissive and allows for commercial use and integration with closed-source applications.

Limitations & Caveats

ANN indexes trade some recall for speed, so a query that uses one can return different results than an exact search. Index builds, especially for HNSW, can be slow; build time depends on maintenance_work_mem and max_parallel_maintenance_workers. With an approximate index, filter conditions are applied after the index scan, so a filtered query can return fewer rows than requested. Iterative index scans mitigate this by scanning more of the index until enough rows are found. Raising hnsw.ef_search or ivfflat.probes improves recall at the cost of speed, so both need tuning for the workload.
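The recall-for-speed tradeoff can be seen in a toy list-based index in the spirit of IVFFlat. This sketch is a simplification and not pgvector's code: it picks random data points as list centers where pgvector uses k-means, it works on in-memory Python lists, and every function name is hypothetical. The probes argument plays the role of ivfflat.probes: the number of lists searched per query.

```python
import math
import random


def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_lists(vectors, num_lists, rng):
    """Assign every vector to its nearest center (centers are random samples here)."""
    centers = rng.sample(vectors, num_lists)
    lists = [[] for _ in range(num_lists)]
    for idx, vec in enumerate(vectors):
        nearest = min(range(num_lists), key=lambda c: l2(vec, centers[c]))
        lists[nearest].append(idx)
    return centers, lists


def ann_search(vectors, centers, lists, query, k, probes):
    """Search only the `probes` lists whose centers are closest to the query."""
    order = sorted(range(len(centers)), key=lambda c: l2(query, centers[c]))
    candidates = [i for c in order[:probes] for i in lists[c]]
    return sorted(candidates, key=lambda i: l2(query, vectors[i]))[:k]


def exact_search(vectors, query, k):
    """Brute-force scan over all vectors."""
    return sorted(range(len(vectors)), key=lambda i: l2(query, vectors[i]))[:k]


def recall(approx, exact):
    """Fraction of the true nearest neighbors that the approximate search found."""
    return len(set(approx) & set(exact)) / len(exact)


rng = random.Random(0)
vectors = [[rng.random() for _ in range(8)] for _ in range(500)]
centers, lists = build_lists(vectors, num_lists=10, rng=rng)
query = [rng.random() for _ in range(8)]
exact = exact_search(vectors, query, k=10)

for probes in (1, 3, 10):
    approx = ann_search(vectors, centers, lists, query, k=10, probes=probes)
    print(f"probes={probes} recall={recall(approx, exact):.2f}")
```

Searching every list is equivalent to an exact scan, so recall reaches 1.0 but the speed advantage is lost. Fewer probes examine fewer candidates and may miss true neighbors that fall in unsearched lists.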

Health Check

Last Commit

5 days ago

Responsiveness

1 day

Star History

455 stars in the last 30 days