Neighbor is a Ruby gem providing efficient nearest neighbor search capabilities for Ruby on Rails applications. It integrates with various database extensions and types, enabling developers to implement similarity search for embeddings and other vector data directly within their ActiveRecord models.
How It Works
Neighbor leverages database-specific extensions and types, such as PostgreSQL's cube and pgvector, SQLite's sqlite-vec, and the vector support in MariaDB and MySQL, to store and query vector data. It provides an ActiveRecord interface (has_neighbors) that maps to these underlying database features, allowing users to define vector columns and perform nearest neighbor searches using various distance metrics (Euclidean, cosine, Hamming, etc.). The gem also supports indexing strategies such as HNSW and IVFFlat to speed up queries.
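As a rough mental model of what a nearest neighbor query computes, the pure-Ruby sketch below implements the textbook form of each distance metric the gem lists, plus an exact (brute-force) search that scores every row and keeps the k closest. Neighbor itself pushes this work into the database, so this is an illustration under stated assumptions, not the gem's code: `DISTANCES`, `nearest`, and the in-memory `items` hash are hypothetical; inner product is negated so that smaller always means closer (mirroring pgvector's `<#>` operator); and Hamming and Jaccard assume binary vectors given as arrays of 0s and 1s.

```ruby
# Hypothetical helpers: textbook metric definitions, not Neighbor's implementation.
# Neighbor delegates these computations to the database.
DISTANCES = {
  # Straight-line (L2) distance
  euclidean: ->(a, b) { Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 }) },
  # L1 / Manhattan distance: sum of absolute coordinate differences
  taxicab: ->(a, b) { a.zip(b).sum { |x, y| (x - y).abs } },
  # L-infinity distance: the largest single-coordinate difference
  chebyshev: ->(a, b) { a.zip(b).map { |x, y| (x - y).abs }.max },
  # 1 - cosine similarity: compares direction only, ignores vector length
  cosine: ->(a, b) {
    dot = a.zip(b).sum { |x, y| x * y }
    norms = Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |x| x * x })
    1.0 - dot / norms
  },
  # Negated inner product so that, like the other metrics, smaller means closer
  inner_product: ->(a, b) { -(a.zip(b).sum { |x, y| x * y }) },
  # Binary vectors (arrays of 0/1): number of positions that differ
  hamming: ->(a, b) { a.zip(b).count { |x, y| x != y } },
  # Binary vectors: 1 - |bits set in both| / |bits set in either|
  jaccard: ->(a, b) {
    both = a.zip(b).count { |x, y| x == 1 && y == 1 }
    either = a.zip(b).count { |x, y| x == 1 || y == 1 }
    either.zero? ? 0.0 : 1.0 - both.fdiv(either)
  }
}.freeze

# Exact (brute-force) search over a hypothetical { id => vector } hash:
# score every row, sort ascending by distance, keep the k closest.
def nearest(items, query, k:, distance: :euclidean)
  metric = DISTANCES.fetch(distance)
  items.map { |id, vec| [id, metric.call(query, vec)] }
       .sort_by { |_id, dist| dist }
       .first(k)
end
```

For example, with `items = { 1 => [1.0, 1.0, 1.0], 2 => [2.0, 2.0, 2.0], 3 => [1.0, 1.0, 2.0] }` and the query `[1.0, 1.0, 1.0]`, `nearest(items, query, k: 2)` returns items 1 and 3, because item 2 is about 1.73 away by Euclidean distance; with `distance: :cosine`, item 2 moves into the top two, since it points in exactly the same direction as the query.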
Quick Start & Requirements
- Install via Bundler:
gem "neighbor"
- Requires a supported database (Postgres, SQLite, MariaDB, or MySQL) with the matching vector extension or type, such as cube or pgvector for Postgres and sqlite-vec for SQLite.
- Setup involves generating migrations to add vector columns and configuring models with has_neighbors.
- Official documentation and examples are available for detailed setup and usage: https://github.com/ankane/neighbor
Highlighted Details
- Supports multiple distance metrics: euclidean, cosine, taxicab, chebyshev, inner_product, hamming, jaccard.
- Offers advanced features like half-precision vectors, binary vectors, sparse vectors, and indexing options (HNSW, IVFFlat).
- Includes examples of generating embeddings with external models and libraries (OpenAI, Cohere, Informers, Transformers.rb) and then searching them.
- Provides examples for hybrid search, recommendations, and sparse search.
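Hybrid search usually means combining a keyword (full-text) result list with a vector result list. One common way to merge ranked lists is reciprocal rank fusion (RRF): each item scores the sum of 1 / (k + rank) over the lists it appears in, with k = 60 as a conventional default. The sketch below is a generic pure-Ruby version over lists of record ids; the method name `reciprocal_rank_fusion` is hypothetical, and this is not necessarily how the gem's own hybrid search example merges results.

```ruby
# Hypothetical generic RRF helper (not part of Neighbor's documented API).
# Each argument is a list of ids ordered best-first, e.g. keyword hits and vector hits.
def reciprocal_rank_fusion(*ranked_lists, k: 60)
  scores = Hash.new(0.0)
  ranked_lists.each do |list|
    list.each_with_index do |id, index|
      # Ranks are 1-based, so the top item contributes 1 / (k + 1)
      scores[id] += 1.0 / (k + index + 1)
    end
  end
  # Highest fused score first; break exact ties by id for a stable order
  scores.sort_by { |id, score| [-score, id] }.map(&:first)
end
```

For instance, `reciprocal_rank_fusion([3, 1, 5], [1, 4, 3])` returns `[1, 3, 4, 5]`: ids 1 and 3 appear in both lists, and 1 wins because its weaker rank (2nd) beats id 3's weaker rank (3rd).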
Maintenance & Community
- Developed by ankane, a known contributor in the Ruby data science and ML space.
- The repository is active, with recent commits and issues.
- Contribution guidelines and development setup instructions are provided.
Licensing & Compatibility
- Released under the MIT License, permitting commercial use and integration with closed-source applications.
Limitations & Caveats
- Experimental support for MariaDB and MySQL may have limitations or require specific versions/configurations.
- Performance heavily relies on the underlying database's vector capabilities and indexing strategies.
- Some advanced features or specific distance metrics might be tied to particular database extensions.
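To see why the indexing strategy matters, the sketch below mimics the idea behind an IVFFlat index in plain Ruby: vectors are grouped into lists around centroids, and a query scans only the `probes` lists whose centroids are closest to it. That is faster than a full scan but can miss the true nearest neighbor. The centroids are hand-picked here for illustration (pgvector learns them with k-means when the index is built), and `CENTROIDS`, `build_lists`, and `ivf_search` are hypothetical names.

```ruby
# Hand-picked centroids standing in for the lists an IVFFlat index would learn (hypothetical).
CENTROIDS = [[0.0, 0.0], [10.0, 10.0]].freeze

# Euclidean (L2) distance
def l2(a, b)
  Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
end

# Centroid indexes ordered from closest to farthest from vec.
def closest_lists(vec)
  CENTROIDS.each_index.sort_by { |i| l2(vec, CENTROIDS[i]) }
end

# Build step: assign each [id, vector] pair to the list of its closest centroid.
def build_lists(items)
  items.group_by { |_id, vec| closest_lists(vec).first }
end

# Query step: scan only the `probes` closest lists instead of every row.
def ivf_search(lists, query, k:, probes: 1)
  closest_lists(query)
    .first(probes)
    .flat_map { |i| lists.fetch(i, []) }
    .map { |id, vec| [id, l2(query, vec)] }
    .sort_by { |_id, dist| dist }
    .first(k)
end
```

With items `{ 1 => [1.0, 1.0], 2 => [6.0, 6.0], 3 => [9.0, 9.0] }`, the query `[4.0, 4.0]` with `probes: 1` scans only the list around `[0.0, 0.0]` and returns item 1, even though item 2 is closer; raising `probes` to 2 recovers item 2 at the cost of scanning more rows. This is the recall/speed trade-off that pgvector exposes through the `lists` index option and the `ivfflat.probes` setting.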