vectordb by jina-ai

Python vector database for semantic similarity search

created 2 years ago
628 stars

Top 53.5% on sourcepulse

Project Summary

vectordb is a Pythonic vector database built for simplicity and scalability. It provides the core CRUD operations and can be deployed anywhere from a local process to the cloud. It targets developers who need a lightweight way to store and search vector embeddings, and it relies on DocArray for search logic and on Jina for scalable index serving.

How It Works

vectordb uses DocArray as its core engine for vector search logic, which covers both Approximate Nearest Neighbor (ANN) and Exact Nearest Neighbor (ENN) search. Jina provides the infrastructure for serving the index at scale, with sharding for throughput and replication for availability. As a result, vectordb can be used as a standalone library or served as a service over gRPC, HTTP, or WebSockets.
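To make the "exact nearest neighbor" idea concrete, here is a minimal pure-Python sketch of brute-force search by cosine similarity. It is an illustration only and does not use vectordb or DocArray; the names `cosine_similarity`, `exact_search`, and the toy `index` are hypothetical and chosen for this example.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def exact_search(index: Dict[str, Vector], query: Vector, limit: int) -> List[Tuple[str, float]]:
    """Score every stored vector against the query and keep the best `limit`."""
    scored = [(doc_id, cosine_similarity(vec, query)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


# Hypothetical toy index: document id -> embedding.
index = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.7, 0.7]}
top = exact_search(index, query=[1.0, 0.1], limit=2)
print(top)
```

Exact search compares the query with every stored vector, so its cost grows linearly with the index size. ANN indexes such as HNSW trade a small amount of recall for much faster lookups on large collections.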

Quick Start & Requirements

  • Install: pip install vectordb
  • Prerequisites: Python 3.x, NumPy. HNSWVectorDB requires HNSWLib.
  • Setup: Local setup involves defining a BaseDoc schema with DocArray and initializing a database class (e.g., InMemoryExactNNVectorDB, HNSWVectorDB).
  • Docs: https://docs.jina.ai/concepts/vectordb/

Highlighted Details

  • Offers both Exact NN (InMemoryExactNNVectorDB) and Approximate NN (HNSWVectorDB) search capabilities.
  • Supports serving as a service via gRPC, HTTP, and WebSocket protocols.
  • Provides sharding and replication for scalability and availability.
  • Integrates with Jina AI Cloud for seamless cloud deployment.
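As a rough illustration of what sharding means for search, the sketch below splits a toy index across several shards, queries each shard independently, and merges the partial results into one global top-k. It is a conceptual sketch under simplifying assumptions (dot-product scoring, round-robin placement, everything in one process) and is not how Jina or vectordb implement sharding; all function names are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot-product score; higher means more similar in this sketch."""
    return sum(x * y for x, y in zip(a, b))


def make_shards(docs: Dict[str, Vector], num_shards: int) -> List[Dict[str, Vector]]:
    """Distribute documents over shards in round-robin order."""
    shards: List[Dict[str, Vector]] = [{} for _ in range(num_shards)]
    for i, (doc_id, vec) in enumerate(docs.items()):
        shards[i % num_shards][doc_id] = vec
    return shards


def search_shard(shard: Dict[str, Vector], query: Vector, limit: int) -> List[Tuple[float, str]]:
    """Return the best `limit` (score, doc_id) pairs within one shard."""
    return heapq.nlargest(limit, ((dot(vec, query), doc_id) for doc_id, vec in shard.items()))


def sharded_search(shards: List[Dict[str, Vector]], query: Vector, limit: int) -> List[str]:
    """Query every shard, then merge the partial results into a global top-k."""
    partial = [hit for shard in shards for hit in search_shard(shard, query, limit)]
    return [doc_id for _, doc_id in heapq.nlargest(limit, partial)]


# Hypothetical toy data: six documents spread over three shards.
docs = {f"doc{i}": [float(i), 1.0] for i in range(6)}
shards = make_shards(docs, num_shards=3)
result = sharded_search(shards, query=[1.0, 0.0], limit=2)
print(result)
```

Because each shard returns its own top `limit` hits, the merged result matches what a single unsharded exact search would return. Replication is orthogonal to this: replicas hold copies of the same shard so that queries can be spread across them.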

Licensing & Compatibility

  • License: Apache-2.0.
  • Compatibility: Permissive license suitable for commercial use and integration with closed-source applications.

Limitations & Caveats

Currently, Jina AI Cloud deployments are limited to 1 replica; support for N replicas in the cloud is under development. The roadmap indicates plans for more ANN algorithms and enhanced filtering capabilities.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 24 stars in the last 90 days
