sqlite-vss by asg017

SQLite extension for efficient vector search

created 2 years ago
1,878 stars

Top 23.6% on sourcepulse

Project Summary

This project provides a SQLite extension for efficient vector similarity search, leveraging the Faiss library. It enables developers to build applications like semantic search engines, recommendation systems, and Q&A tools directly within SQLite, offering a convenient way to integrate vector capabilities into existing databases.

How It Works

The extension adds a vss0 virtual table module whose API is modeled on SQLite's FTS5 extension. Users create a virtual table with one or more vector columns and insert embeddings as JSON or raw bytes. A similarity search calls the vss_search function in the WHERE clause and returns the k nearest neighbors of the query vector. Each column can take its own Faiss factory string to tune the index, for example an inverted file index (IVF) that speeds up queries on large datasets but requires a training step.
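The workflow described above can be sketched with the Python bindings. This is a minimal sketch rather than the project's reference usage: the table name vss_demo, the column name embedding, the 3-dimensional vectors, and the helper names as_json, as_raw_bytes, and run_demo are all hypothetical. It assumes that pip install sqlite-vss succeeded and that the local Python build allows loading SQLite extensions. The raw-bytes helper assumes vectors are packed as float32 values in native byte order, which should be checked against the official docs.

```python
import json
import sqlite3
import struct


def as_json(vector):
    """Serialize a vector as JSON text, e.g. '[0.1, 0.2, 0.3]'."""
    return json.dumps([float(x) for x in vector])


def as_raw_bytes(vector):
    """Pack a vector as float32 values, 4 bytes each (assumed raw format)."""
    return struct.pack(f"{len(vector)}f", *vector)


def run_demo():
    """Create a vss0 table, insert three vectors, return the 2 nearest rows."""
    import sqlite_vss  # assumption: installed via `pip install sqlite-vss`

    db = sqlite3.connect(":memory:")
    db.enable_load_extension(True)
    sqlite_vss.load(db)  # loads the extension into this connection
    db.enable_load_extension(False)

    # One vector column with 3 dimensions (hypothetical names and size).
    db.execute("CREATE VIRTUAL TABLE vss_demo USING vss0(embedding(3))")

    rows = [(1, [1.0, 0.0, 0.0]), (2, [0.0, 1.0, 0.0]), (3, [0.9, 0.1, 0.0])]
    with db:  # commit the inserts before searching
        db.executemany(
            "INSERT INTO vss_demo(rowid, embedding) VALUES (?, ?)",
            [(rowid, as_json(vec)) for rowid, vec in rows],
        )

    # LIMIT sets k here. Assumption: older SQLite versions may need the
    # vss_search_params() form instead of a plain LIMIT clause.
    return db.execute(
        "SELECT rowid, distance FROM vss_demo "
        "WHERE vss_search(embedding, ?) LIMIT 2",
        (as_json([1.0, 0.0, 0.0]),),
    ).fetchall()
```

Calling run_demo() should return (rowid, distance) pairs for the two stored vectors closest to the query. The two helpers have no dependency on the extension, so they can be reused to prepare vectors for either input format.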

Quick Start & Requirements

  • Install: Pre-built binaries are available for Linux x86_64 and macOS x86_64. Language-specific bindings are available via pip install sqlite-vss (Python), npm install sqlite-vss (Node.js), and others.
  • Prerequisites: On Linux, libgomp1, libatlas-base-dev, and liblapack-dev are required. When using the extension from the SQLite CLI, the companion vector0 extension must be loaded as well.
  • Links: Official Docs, Python Bindings, Node.js Bindings, Datasette Plugin.
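Before installing the Python bindings, it can help to check whether the local Python build is able to load SQLite extensions at all. The sketch below uses only the standard library. The function names are hypothetical, and the 3.41.0 threshold is an assumption: the project's documentation reportedly ties the plain LIMIT form of vss_search to that SQLite version or newer, so verify it against the official docs.

```python
import sqlite3


def can_load_extensions():
    """Return True if this Python build exposes SQLite extension loading."""
    db = sqlite3.connect(":memory:")
    try:
        # Missing entirely on builds compiled without extension support.
        db.enable_load_extension(True)
    except (AttributeError, sqlite3.Error):
        return False
    finally:
        db.close()
    return True


def supports_plain_limit():
    """Assumed threshold: SQLite 3.41.0+ for `vss_search(...) LIMIT k`."""
    return sqlite3.sqlite_version_info >= (3, 41, 0)


print("SQLite version:", sqlite3.sqlite_version)
print("Extension loading available:", can_load_extensions())
print("Plain LIMIT form expected to work:", supports_plain_limit())
```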

Highlighted Details

  • Integrates Faiss for efficient Approximate Nearest Neighbor (ANN) search within SQLite.
  • Supports custom Faiss index configurations via factory strings for performance tuning.
  • Offers bindings for multiple languages including Python, Node.js, Go, Rust, and Deno.
  • Allows batch inserts/deletes, but UPDATE operations on virtual tables are not supported.

Maintenance & Community

The project is noted as "not in active development," with efforts redirected to sqlite-vec. The primary contributor is Alex (asg017). Community links are not explicitly provided in the README.

Licensing & Compatibility

The project appears to be distributed under the MIT license, allowing for commercial use and integration with closed-source applications.

Limitations & Caveats

  • Faiss indices are capped at 1 GB.
  • Additional filtering on KNN searches is not yet supported.
  • Only CPU-based Faiss indices are supported; GPU indices are not.
  • Indices must fit in RAM, because mmap'ed indices are not supported.
  • UPDATE statements on vss0 tables are not supported.
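Since UPDATE is unsupported while deletes and inserts are, one possible workaround is to delete a row and re-insert it under the same rowid. The following is a sketch under assumptions, not a documented pattern: the helper name replace_vector is hypothetical, the vector is passed as JSON, and whether sqlite-vss accepts a delete and re-insert of the same rowid inside a single transaction should be verified before relying on it.

```python
import json
import re
import sqlite3

# Table and column names cannot be bound as SQL parameters, so they are
# validated before being interpolated into the statement text.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def replace_vector(db, table, column, rowid, vector):
    """Emulate UPDATE: delete the row, then re-insert it, in one transaction."""
    for name in (table, column):
        if not _IDENT.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    with db:  # commits on success, rolls back if either statement fails
        db.execute(f"DELETE FROM {table} WHERE rowid = ?", (rowid,))
        db.execute(
            f"INSERT INTO {table}(rowid, {column}) VALUES (?, ?)",
            (rowid, json.dumps([float(x) for x in vector])),
        )
```

A call such as replace_vector(db, "vss_demo", "embedding", 3, [0.5, 0.5, 0.0]) would swap the stored vector for rowid 3, where vss_demo and embedding are placeholder names for an existing vss0 table and its vector column.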

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull requests (30d): 1
  • Issues (30d): 0
  • Star history: 46 stars in the last 90 days
