jvector by datastax

Embedded vector search engine

created 1 year ago
1,620 stars

Top 26.5% on sourcepulse

Project Summary

JVector is an embedded vector search engine for developers and researchers who need efficient approximate nearest neighbor (ANN) search. Its graph-based index combines techniques from DiskANN and HNSW to deliver fast, accurate, and scalable vector similarity search, particularly for high-dimensional data.

How It Works

JVector builds a multi-layer graph index that follows HNSW's hierarchical structure and uses Vamana (the graph algorithm from DiskANN) within each layer. Index construction uses non-blocking concurrency, so it scales across threads. Upper layers keep their adjacency lists in memory, while the bottom layer's adjacency list is stored on disk. Search runs in two passes and can use compressed vectors (Product Quantization, Binary Quantization, or Fused ADC) to reduce memory usage and latency while preserving accuracy. Notably, JVector also uses two-pass search during construction, which lets it build indexes larger than available memory.
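
To make the per-layer search concrete, here is a minimal Java sketch of greedy best-first (beam) search, the search pattern that HNSW layers and Vamana graphs both rely on. The class and method names (GreedyGraphSearch, search, beamWidth) are hypothetical and not part of JVector's API; the sketch assumes vectors and adjacency lists are plain in-memory arrays and uses squared Euclidean distance. It covers a single layer only: in a multi-layer index, the upper layers would be searched first to choose a good entryPoint for the bottom layer.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Set;

// Hypothetical sketch, not JVector's API: best-first beam search over one
// graph layer, the search pattern shared by HNSW layers and Vamana graphs.
final class GreedyGraphSearch {

    // Squared Euclidean distance; smaller means more similar.
    static float squaredDistance(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            float d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Returns up to k node ids nearest to query. beamWidth bounds how many
    // best results are kept at once (larger = more accurate, slower).
    static List<Integer> search(float[][] vectors, int[][] neighbors, int entryPoint,
                                float[] query, int k, int beamWidth) {
        Comparator<Integer> byDistance =
                Comparator.comparingDouble(id -> squaredDistance(vectors[id], query));
        PriorityQueue<Integer> frontier = new PriorityQueue<>(byDistance);        // closest first
        PriorityQueue<Integer> best = new PriorityQueue<>(byDistance.reversed()); // farthest first
        Set<Integer> visited = new HashSet<>();
        frontier.add(entryPoint);
        best.add(entryPoint);
        visited.add(entryPoint);

        while (!frontier.isEmpty()) {
            int current = frontier.poll();
            // Stop once the closest unexpanded node is farther than the worst kept result.
            if (best.size() >= beamWidth && byDistance.compare(current, best.peek()) > 0) {
                break;
            }
            for (int nb : neighbors[current]) {
                if (!visited.add(nb)) {
                    continue; // already scored
                }
                frontier.add(nb);
                best.add(nb);
                if (best.size() > beamWidth) {
                    best.poll(); // evict the farthest kept node
                }
            }
        }
        List<Integer> result = new ArrayList<>(best);
        result.sort(byDistance);
        return result.subList(0, Math.min(k, result.size()));
    }
}
```

For example, on ten 1-D points 0 through 9 linked as a chain, searching from node 0 for the query 7.2 with k = 2 and beamWidth = 3 returns [7, 8].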

Quick Start & Requirements

  • Install/Run: Primarily a Java library. Examples can be run via Maven: mvn compile exec:exec@bench or mvn compile exec:exec@sift.
  • Prerequisites: Java 11+ required. Java 20+ recommended for optimized vector providers (SIMD via Panama Vector API).
  • Resources: Benchmarks suggest that indexing and PQ can saturate memory bandwidth, so by default a PhysicalCoreExecutor limits these operations to the number of physical cores; set the count with -Djvector.physical_core_count.
  • Links: Examples, SiftSmall, Bench
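
The README does not show PhysicalCoreExecutor's internals. As an illustration of the idea only, the Java sketch below caps parallel work at a physical-core count by running it in a dedicated ForkJoinPool. The class name CoreLimitedPool, the halving fallback (which assumes two hardware threads per physical core), and the sample workload are all hypothetical; only the jvector.physical_core_count property name comes from the README.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.stream.IntStream;

// Hypothetical sketch, not JVector's PhysicalCoreExecutor: run parallel work
// on a pool capped at the physical core count rather than all logical CPUs.
final class CoreLimitedPool {

    // Reads the override named in the README; the fallback is an assumption
    // that SMT (hyper-threading) exposes two logical CPUs per physical core.
    static int physicalCoreCount() {
        int fallback = Math.max(1, Runtime.getRuntime().availableProcessors() / 2);
        return Integer.getInteger("jvector.physical_core_count", fallback);
    }

    // Example workload: sum of i*i for i in [0, n). In current JDKs a parallel
    // stream started inside a ForkJoinPool task runs in that pool (an
    // implementation behavior, not a documented guarantee).
    static long parallelSumOfSquares(int n, int parallelism) {
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        try {
            return pool.submit(() ->
                    IntStream.range(0, n).parallel().mapToLong(i -> (long) i * i).sum()
            ).join();
        } finally {
            pool.shutdown();
        }
    }
}
```

A caller would pass CoreLimitedPool.physicalCoreCount() as the parallelism argument, so launching the JVM with -Djvector.physical_core_count=8 would cap this sketch's pool at eight workers.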

Highlighted Details

  • Merges HNSW and DiskANN (Vamana) for a hybrid graph index.
  • Supports incremental index construction and in-place deletes.
  • Offers Product Quantization (PQ), Binary Quantization (BQ), and Fused ADC for compression.
  • Two-pass search with optional reranking for accuracy and performance.
  • Capable of building larger-than-memory indexes.
  • Leverages Java's Panama Vector API (SIMD) for performance.
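
The compression and two-pass bullets can be illustrated together. Below is a hedged Java sketch of one common form of binary quantization (one sign bit per dimension, compared by Hamming distance) followed by a full-precision rerank. Everything here is hypothetical (BinaryQuantSketch, rerankK, the sign-bit encoding, and the dot-product similarity) and is not taken from JVector; for brevity the first pass scans every code, whereas a graph index would score only the nodes it visits.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch, not JVector's API: binary quantization plus a
// two-pass search (cheap Hamming ranking, then exact reranking).
final class BinaryQuantSketch {

    // Keep one bit per dimension (1 if the component is positive), packed into longs.
    static long[] encode(float[] v) {
        long[] bits = new long[(v.length + 63) / 64];
        for (int i = 0; i < v.length; i++) {
            if (v[i] > 0f) {
                bits[i / 64] |= 1L << (i % 64);
            }
        }
        return bits;
    }

    // Hamming distance: how many sign bits differ between two codes.
    static int hamming(long[] a, long[] b) {
        int d = 0;
        for (int i = 0; i < a.length; i++) {
            d += Long.bitCount(a[i] ^ b[i]);
        }
        return d;
    }

    static float dot(float[] a, float[] b) {
        float s = 0f;
        for (int i = 0; i < a.length; i++) {
            s += a[i] * b[i];
        }
        return s;
    }

    // Pass 1 ranks all codes by Hamming distance and keeps rerankK candidates;
    // pass 2 rescores only those with the full-precision dot product and returns the top k.
    static List<Integer> search(float[][] vectors, long[][] codes, float[] query,
                                int k, int rerankK) {
        long[] q = encode(query);
        List<Integer> ids = new ArrayList<>();
        for (int id = 0; id < vectors.length; id++) {
            ids.add(id);
        }
        ids.sort(Comparator.comparingInt(id -> hamming(codes[id], q)));
        List<Integer> shortlist = new ArrayList<>(ids.subList(0, Math.min(rerankK, ids.size())));
        shortlist.sort(Comparator.comparingDouble((Integer id) -> dot(vectors[id], query)).reversed());
        return shortlist.subList(0, Math.min(k, shortlist.size()));
    }
}
```

Vectors that share the query's sign pattern tie in the first pass; the rerank then separates them by their true similarity. Raising rerankK trades latency for recall.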

Maintenance & Community

  • Developed by DataStax.
  • Multi-module Maven build, targeting Java 11 compatibility with Java 20+ optimizations.
  • Community and support channels are not explicitly mentioned in the README.

Licensing & Compatibility

  • Apache License 2.0.
  • Compatible with commercial and closed-source applications.

Limitations & Caveats

  • Anisotropic PQ tuning is experimental and can degrade performance if misconfigured.
  • SimpleMappedReader for on-disk indexes is limited to 2GB file sizes; MemorySegmentReader requires Java 22+.
  • Indexing and PQ can saturate memory bandwidth; the README addresses this with PhysicalCoreExecutor, which caps parallelism at the physical core count.
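
The 2GB ceiling matches a limit in the JDK itself: FileChannel.map cannot map a region larger than Integer.MAX_VALUE bytes, and MappedByteBuffer uses int offsets. That this is the cause of SimpleMappedReader's limit is an assumption here. One common workaround on JDKs older than 22 is to map a large file as several chunks, as in the hypothetical ChunkedMappedReader below (not JVector's code); on Java 22+, the finalized Foreign Function & Memory API's MemorySegment supports 64-bit offsets directly.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch, not JVector's SimpleMappedReader: map a file as a
// series of chunks so no single MappedByteBuffer exceeds the int-indexed limit.
final class ChunkedMappedReader implements AutoCloseable {
    private final FileChannel channel;
    private final MappedByteBuffer[] chunks;
    private final long chunkSize;

    ChunkedMappedReader(Path path, long chunkSize) throws IOException {
        // FileChannel.map rejects regions larger than Integer.MAX_VALUE bytes.
        if (chunkSize <= 0 || chunkSize > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("chunkSize must be in (0, Integer.MAX_VALUE]");
        }
        this.channel = FileChannel.open(path, StandardOpenOption.READ);
        this.chunkSize = chunkSize;
        long size = channel.size();
        int count = (int) ((size + chunkSize - 1) / chunkSize);
        this.chunks = new MappedByteBuffer[count];
        for (int i = 0; i < count; i++) {
            long start = i * chunkSize;
            chunks[i] = channel.map(FileChannel.MapMode.READ_ONLY, start,
                    Math.min(chunkSize, size - start));
        }
    }

    // Reads one byte at a 64-bit file offset. Multi-byte values that straddle
    // a chunk boundary would need extra handling, omitted here.
    byte get(long position) {
        return chunks[(int) (position / chunkSize)].get((int) (position % chunkSize));
    }

    @Override
    public void close() throws IOException {
        // Existing mappings stay valid after the channel is closed.
        channel.close();
    }
}
```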

Health Check

  • Last commit: 3 days ago
  • Responsiveness: 1 day
  • Pull requests (30d): 11
  • Issues (30d): 2
  • Star history: 39 stars in the last 90 days
