duckdb: Vector similarity search extension for DuckDB
This experimental DuckDB extension provides Vector Similarity Search (VSS) capabilities directly within DuckDB, enabling efficient nearest neighbor searches on vector data stored in FLOAT ARRAY columns. It targets data scientists and engineers seeking to integrate VSS into their analytical workflows without relying on separate vector databases, offering performance gains for similarity-based queries.
How It Works
The extension integrates the usearch library to implement Hierarchical Navigable Small Worlds (HNSW) indexes. These indexes are exposed as a custom index type within DuckDB, compatible with its fixed-size ARRAY data type (introduced in v0.10.0). Queries involving ordering by distance metrics (array_distance, array_cosine_distance, array_negative_inner_product) against indexed FLOAT arrays, combined with a LIMIT clause, are accelerated via an HNSW_INDEX_SCAN operation.
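To make the accelerated query shape concrete, here is a minimal pure-Python sketch of what the three distance functions compute and of the exact result that an ORDER BY distance ... LIMIT k query asks for. It is not the extension's code: the HNSW index returns an approximation of this exact top-k, and the helper top_k and the sample rows are hypothetical names and data chosen for illustration. The metric definitions assumed here are Euclidean distance, one minus cosine similarity, and the negated dot product.

```python
import math


def array_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def array_cosine_distance(a, b):
    """Cosine distance, assumed here to be 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def array_negative_inner_product(a, b):
    """Negated dot product, so that larger similarity sorts first."""
    return -sum(x * y for x, y in zip(a, b))


def top_k(rows, query, k, metric=array_distance):
    """Exact brute-force equivalent of ORDER BY metric(vec, query) LIMIT k.

    `rows` is a list of (row_id, vector) pairs (hypothetical layout).
    """
    return sorted(rows, key=lambda row: metric(row[1], query))[:k]


# Hypothetical sample data: four rows with fixed-size 3-element vectors.
rows = [
    (1, [1.0, 2.0, 3.0]),
    (2, [2.0, 2.0, 3.0]),
    (3, [9.0, 9.0, 9.0]),
    (4, [1.0, 2.0, 5.0]),
]
query = [1.0, 2.0, 3.0]
nearest = top_k(rows, query, k=2)
nearest_ids = [row_id for row_id, _ in nearest]
```

The brute-force scan touches every row, which is the cost the index scan is meant to avoid. Without the LIMIT there is no bounded k to search for, which is why the query shape includes it.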
Quick Start & Requirements
Build with make. The primary executable is ./build/release/duckdb, which includes the extension. The loadable extension binary is ./build/release/extension/vss/vss.duckdb_extension.
Highlighted Details
Maintenance & Community
No specific details regarding contributors, sponsorships, community channels (e.g., Discord/Slack), or roadmaps are mentioned in the README.
Licensing & Compatibility
The README does not explicitly state the license type or provide compatibility notes for commercial use.
Limitations & Caveats
Currently, only vectors consisting of FLOAT types are supported. The HNSW index must fit entirely in RAM. Deletions are marked rather than immediate, potentially impacting query quality and performance over time, necessitating manual re-compaction (PRAGMA hnsw_compact_index) or index re-creation. Index serialization/deserialization during database checkpoints and restarts can be time-consuming for large indexes.
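The effect of marked deletions can be illustrated with a toy model. The sketch below is an assumption-laden illustration, not how usearch or the extension stores its graph: TombstoneIndex, stale_fraction, and the flat list layout are hypothetical. It only shows why deleted entries keep occupying space until a compaction step rewrites the structure.

```python
class TombstoneIndex:
    """Toy flat index: delete() only marks entries; compact() removes them."""

    def __init__(self):
        self.entries = []     # (row_id, vector) pairs, including deleted ones
        self.deleted = set()  # row_ids that are marked as deleted

    def insert(self, row_id, vector):
        self.entries.append((row_id, vector))

    def delete(self, row_id):
        # Mark only; the entry stays in memory until compact() runs.
        if any(rid == row_id for rid, _ in self.entries):
            self.deleted.add(row_id)

    def search(self, query, k):
        # Marked entries must be skipped on every search.
        live = [(rid, vec) for rid, vec in self.entries
                if rid not in self.deleted]
        live.sort(key=lambda e: sum((x - y) ** 2 for x, y in zip(e[1], query)))
        return [rid for rid, _ in live[:k]]

    def stale_fraction(self):
        # Share of stored entries that are dead weight.
        return len(self.deleted) / len(self.entries) if self.entries else 0.0

    def compact(self):
        # Rewrite the storage without the marked entries.
        self.entries = [(rid, vec) for rid, vec in self.entries
                        if rid not in self.deleted]
        self.deleted.clear()


index = TombstoneIndex()
for i in range(4):
    index.insert(i, [float(i), 0.0])
index.delete(0)
index.delete(1)

stale_before = index.stale_fraction()  # half of the stored entries are marked
hits = index.search([0.0, 0.0], k=2)   # only live rows are returned
index.compact()
stale_after = index.stale_fraction()
```

In this toy model the stale entries only waste memory and scan time. In a graph index they can also degrade result quality, which is the reason the text gives for re-compacting or rebuilding the index after many deletions.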