jvector by datastax

Embedded vector search engine

Created 2 years ago
1,685 stars

Top 24.8% on SourcePulse

Project Summary

JVector is an embedded vector search engine for developers and researchers who need efficient approximate nearest neighbor (ANN) search. Its graph-based index combines techniques from DiskANN and HNSW to provide fast, accurate, and scalable similarity search, particularly over high-dimensional data.

How It Works

JVector implements a multi-layer graph index: the layering follows HNSW's hierarchical structure, and each layer is built with Vamana, the graph algorithm from DiskANN. Index construction uses non-blocking concurrency so that it scales across threads. The upper layers keep their adjacency lists in memory, while the bottom layer's adjacency list is stored on disk. Searches run in two passes and can optionally use vector compression (Product Quantization, Binary Quantization, Fused ADC), which reduces memory usage and latency while preserving accuracy. Because the same two-pass search is also used during construction, JVector can build indexes that are larger than available memory.
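The two-pass idea can be illustrated with a small, self-contained sketch. It is not JVector's API or implementation: the class and method names are hypothetical, the first pass here is a brute-force scan rather than a graph traversal, and binary quantization is simplified to one sign bit per dimension (at most 64 dimensions). It only shows the general pattern of shortlisting with cheap compressed comparisons and then reranking the shortlist with exact distances.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration of two-pass search; not part of JVector.
final class TwoPassSearchSketch {

    // Simplified binary quantization: one bit per dimension, set when the component is positive.
    static long quantize(float[] v) {
        if (v.length > 64) {
            throw new IllegalArgumentException("this sketch supports at most 64 dimensions");
        }
        long bits = 0L;
        for (int i = 0; i < v.length; i++) {
            if (v[i] > 0f) {
                bits |= 1L << i;
            }
        }
        return bits;
    }

    // Exact squared Euclidean distance between two full-precision vectors.
    static float squaredDistance(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            float d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Pass 1: order all ids by Hamming distance between binary codes and keep `candidates` of them.
    // Pass 2: rerank that shortlist by exact distance and return the ids of the best `topK`.
    static int[] search(float[][] vectors, long[] codes, float[] query, int topK, int candidates) {
        long queryCode = quantize(query);

        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < vectors.length; i++) {
            ids.add(i);
        }
        ids.sort(Comparator.comparingInt(i -> Long.bitCount(codes[i] ^ queryCode)));

        List<Integer> shortlist = new ArrayList<>(ids.subList(0, Math.min(candidates, ids.size())));
        shortlist.sort(Comparator.comparingDouble(i -> squaredDistance(vectors[i], query)));

        int k = Math.min(topK, shortlist.size());
        int[] result = new int[k];
        for (int i = 0; i < k; i++) {
            result[i] = shortlist.get(i);
        }
        return result;
    }
}
```

In this sketch only the binary codes are touched in the first pass, so the full-precision vectors need to be read only for the shortlisted candidates; that is the property that makes a compressed first pass attractive when the exact vectors live on disk.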

Quick Start & Requirements

  • Install/Run: Primarily a Java library. Examples can be run via Maven: mvn compile exec:exec@bench or mvn compile exec:exec@sift.
  • Prerequisites: Java 11+ required. Java 20+ recommended for optimized vector providers (SIMD via Panama Vector API).
  • Resources: Benchmarks suggest memory bandwidth saturation can occur; a PhysicalCoreExecutor is used by default to limit operations to physical core count, configurable via -Djvector.physical_core_count.
  • Links: Examples, SiftSmall, Bench
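As a rough illustration of the core-count limit described above, the following sketch sizes a thread pool from the jvector.physical_core_count system property. Only the property name comes from the text; the class, the method names, and the fallback of half the logical processors are assumptions made for this example and should not be read as how PhysicalCoreExecutor is actually implemented.

```java
import java.util.concurrent.ForkJoinPool;

// Hypothetical sketch of a core-limited pool; not JVector's PhysicalCoreExecutor.
final class CoreLimitedPoolSketch {

    // Property name as given in the summary above.
    static final String PROPERTY = "jvector.physical_core_count";

    // Use the configured value when it is a positive integer; otherwise fall back to a guess.
    static int resolveParallelism() {
        Integer configured = Integer.getInteger(PROPERTY);
        if (configured != null && configured > 0) {
            return configured;
        }
        // Assumption: on SMT machines, half the logical processors approximates the physical cores.
        return Math.max(1, Runtime.getRuntime().availableProcessors() / 2);
    }

    // Build a pool whose parallelism matches the resolved core count.
    static ForkJoinPool newPool() {
        return new ForkJoinPool(resolveParallelism());
    }
}
```

With a helper like this, running the JVM with -Djvector.physical_core_count=8 would yield a pool with parallelism 8, regardless of how many logical processors the machine reports.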

Highlighted Details

  • Merges HNSW and DiskANN (Vamana) for a hybrid graph index.
  • Supports incremental index construction and in-place deletes.
  • Offers Product Quantization (PQ), Binary Quantization (BQ), and Fused ADC for compression.
  • Two-pass search with optional reranking for accuracy and performance.
  • Capable of building larger-than-memory indexes.
  • Leverages Java's Panama Vector API (SIMD) for performance.
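To make the Product Quantization item more concrete, here is a minimal sketch of PQ encoding with asymmetric distance computation (ADC) through a per-query lookup table. It is a generic textbook-style illustration, not JVector's code and not its Fused ADC variant: the class and method names are hypothetical, the codebooks are assumed to be trained already (for example with k-means), and each subspace is assumed to have at most 256 centroids so that a code fits in one byte.

```java
// Hypothetical illustration of PQ + ADC; not part of JVector.
final class PqAdcSketch {

    // codebooks[s][c] is centroid c of subspace s; every centroid has length v.length / codebooks.length.
    // Returns one byte per subspace: the index of the nearest centroid.
    static byte[] encode(float[] v, float[][][] codebooks) {
        int subspaces = codebooks.length;
        int subDim = v.length / subspaces;
        byte[] code = new byte[subspaces];
        for (int s = 0; s < subspaces; s++) {
            int best = 0;
            float bestDist = Float.MAX_VALUE;
            for (int c = 0; c < codebooks[s].length; c++) {
                float d = subDistance(v, s * subDim, codebooks[s][c]);
                if (d < bestDist) {
                    bestDist = d;
                    best = c;
                }
            }
            code[s] = (byte) best;
        }
        return code;
    }

    // Precompute, once per query, the squared distance from each query sub-vector to every centroid.
    static float[][] distanceTable(float[] query, float[][][] codebooks) {
        int subspaces = codebooks.length;
        int subDim = query.length / subspaces;
        float[][] table = new float[subspaces][];
        for (int s = 0; s < subspaces; s++) {
            table[s] = new float[codebooks[s].length];
            for (int c = 0; c < codebooks[s].length; c++) {
                table[s][c] = subDistance(query, s * subDim, codebooks[s][c]);
            }
        }
        return table;
    }

    // Approximate squared distance to an encoded vector: one table lookup per subspace.
    static float approxDistance(byte[] code, float[][] table) {
        float sum = 0f;
        for (int s = 0; s < code.length; s++) {
            sum += table[s][code[s] & 0xFF];
        }
        return sum;
    }

    // Squared distance between the slice of v starting at `offset` and one centroid.
    private static float subDistance(float[] v, int offset, float[] centroid) {
        float d = 0f;
        for (int j = 0; j < centroid.length; j++) {
            float diff = v[offset + j] - centroid[j];
            d += diff * diff;
        }
        return d;
    }
}
```

The distance is "asymmetric" because the query stays in full precision while the stored vectors are represented only by their centroid indices; scoring a stored vector then costs one lookup and one addition per subspace.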

Maintenance & Community

  • Developed by DataStax.
  • Multi-module Maven build, targeting Java 11 compatibility with Java 20+ optimizations.
  • Community and support channels are not explicitly mentioned in the README.

Licensing & Compatibility

  • Apache License 2.0.
  • Compatible with commercial and closed-source applications.

Limitations & Caveats

  • Anisotropic PQ tuning is experimental and can degrade performance if misconfigured.
  • SimpleMappedReader for on-disk indexes is limited to 2GB file sizes; MemorySegmentReader requires Java 22+.
  • Per the README, indexing and PQ can saturate memory bandwidth; PhysicalCoreExecutor mitigates this by limiting operations to the physical core count.

Health Check

  • Last commit: 19 hours ago
  • Responsiveness: 1 day
  • Pull requests (30d): 28
  • Issues (30d): 12
  • Star history: 10 stars in the last 30 days
