tinyvector by m1guelpf

Embedding database in pure Rust

created 2 years ago
416 stars

Top 71.5% on sourcepulse

Project Summary

Tinyvector is a lightweight, pure Rust embedding database designed for users who find existing vector databases overly complex for their needs, such as document chat applications or small-scale e-commerce search. It offers a simple, customizable, and fast solution for managing and querying vector embeddings.

How It Works

Tinyvector keeps its indexes in memory, which makes queries fast on small to medium datasets. It is built as a minimal Axum server of roughly 600 lines of code, so it is easy to read and customize. The project aims to match the speed of more advanced vector databases on smaller datasets while offering slightly better accuracy. Integrated model support and metadata filtering are planned.
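The README describes the storage model only at this level. As a rough illustration of what an in-memory index with fast queries on small datasets can look like, here is a std-only Rust sketch that keeps every embedding in a HashMap and answers a query by scoring all stored vectors with cosine similarity (exhaustive search). The Collection type, its methods, and the choice of cosine similarity are assumptions for illustration, not tinyvector's actual code or HTTP API.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory collection: document ID -> embedding.
/// Not tinyvector's real data structure; a sketch of the general idea.
struct Collection {
    dimension: usize,
    embeddings: HashMap<String, Vec<f32>>,
}

impl Collection {
    fn new(dimension: usize) -> Self {
        Self { dimension, embeddings: HashMap::new() }
    }

    /// Stores an embedding, rejecting vectors of the wrong length.
    fn insert(&mut self, id: &str, vector: Vec<f32>) -> Result<(), String> {
        if vector.len() != self.dimension {
            return Err(format!(
                "expected {} dimensions, got {}",
                self.dimension,
                vector.len()
            ));
        }
        self.embeddings.insert(id.to_string(), vector);
        Ok(())
    }

    /// Exhaustive search: score every stored vector by cosine similarity
    /// and return the k best matches, highest score first.
    fn query(&self, query: &[f32], k: usize) -> Vec<(String, f32)> {
        let mut scored: Vec<(String, f32)> = self
            .embeddings
            .iter()
            .map(|(id, v)| (id.clone(), cosine_similarity(query, v)))
            .collect();
        // Sort descending; total_cmp gives a total order even if a score is NaN.
        scored.sort_by(|a, b| b.1.total_cmp(&a.1));
        scored.truncate(k);
        scored
    }
}

/// Cosine similarity; returns 0.0 if either vector has zero length.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

fn main() {
    let mut docs = Collection::new(3);
    docs.insert("shoes", vec![1.0, 0.0, 0.0]).unwrap();
    docs.insert("boots", vec![0.9, 0.1, 0.0]).unwrap();
    docs.insert("hats", vec![0.0, 0.0, 1.0]).unwrap();

    // Top 2 matches for a query vector, best first.
    for (id, score) in docs.query(&[1.0, 0.1, 0.0], 2) {
        println!("{id}: {score:.3}");
    }
}
```

Exhaustive search is one plausible reading of the accuracy claim: unlike an approximate nearest-neighbour index, it never skips a closer match, at the cost of touching every stored dimension on each query. That trade-off is acceptable for the small to medium datasets the project targets.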

Quick Start & Requirements

  • Docker: docker run -p 8000:8000 ghcr.io/m1guelpf/tinyvector:edge
  • Build from Source: run cargo install tinyvector, or clone the repository and run cargo build --release.
  • Persistence: Bind a volume to /tinyvector/storage when using Docker Compose or Kubernetes.
  • Dependencies: Rust toolchain for building from source.
  • Docs: https://github.com/m1guelpf/tinyvector
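The Persistence note says only where data lives (/tinyvector/storage), not how it is laid out on disk. To show what saving and restoring an in-memory collection could involve, here is a std-only Rust sketch using a made-up little-endian binary layout. The file format and the save_snapshot and load_snapshot functions are hypothetical and are not tinyvector's actual on-disk format.

```rust
use std::collections::HashMap;
use std::fs;
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical snapshot layout (all little-endian):
/// [dimension: u32][count: u32], then per entry [id_len: u32][id bytes][dimension x f32].
fn save_snapshot(path: &Path, dimension: usize, embeddings: &HashMap<String, Vec<f32>>) -> io::Result<()> {
    let mut buf = Vec::new();
    buf.extend_from_slice(&(dimension as u32).to_le_bytes());
    buf.extend_from_slice(&(embeddings.len() as u32).to_le_bytes());
    for (id, vector) in embeddings {
        if vector.len() != dimension {
            return Err(io::Error::new(io::ErrorKind::InvalidInput, "vector length does not match dimension"));
        }
        buf.extend_from_slice(&(id.len() as u32).to_le_bytes());
        buf.extend_from_slice(id.as_bytes());
        for x in vector {
            buf.extend_from_slice(&x.to_le_bytes());
        }
    }
    // Write a temporary file, then rename it over the target, so readers of
    // `path` see either the old snapshot or the new one, never a partial write.
    let tmp = path.with_extension("tmp");
    fs::File::create(&tmp)?.write_all(&buf)?;
    fs::rename(&tmp, path)
}

/// Returns the next `n` bytes and advances the cursor, or fails on truncation.
fn take<'a>(bytes: &'a [u8], pos: &mut usize, n: usize) -> io::Result<&'a [u8]> {
    let chunk = bytes
        .get(*pos..*pos + n)
        .ok_or_else(|| io::Error::new(io::ErrorKind::UnexpectedEof, "truncated snapshot"))?;
    *pos += n;
    Ok(chunk)
}

fn read_u32(bytes: &[u8], pos: &mut usize) -> io::Result<u32> {
    let raw = take(bytes, pos, 4)?;
    Ok(u32::from_le_bytes(raw.try_into().unwrap())) // length is exactly 4
}

fn load_snapshot(path: &Path) -> io::Result<(usize, HashMap<String, Vec<f32>>)> {
    let bytes = fs::read(path)?;
    let mut pos = 0;
    let dimension = read_u32(&bytes, &mut pos)? as usize;
    let count = read_u32(&bytes, &mut pos)? as usize;
    let mut embeddings = HashMap::new();
    for _ in 0..count {
        let id_len = read_u32(&bytes, &mut pos)? as usize;
        let id = String::from_utf8(take(&bytes, &mut pos, id_len)?.to_vec())
            .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))?;
        let mut vector = Vec::new();
        for _ in 0..dimension {
            let raw = take(&bytes, &mut pos, 4)?;
            vector.push(f32::from_le_bytes(raw.try_into().unwrap()));
        }
        embeddings.insert(id, vector);
    }
    Ok((dimension, embeddings))
}

fn main() -> io::Result<()> {
    // Hypothetical location; in the Docker setup the mounted volume would be /tinyvector/storage.
    let dir = std::env::temp_dir().join("tinyvector-sketch");
    fs::create_dir_all(&dir)?;
    let path = dir.join("docs.bin");

    let mut embeddings = HashMap::new();
    embeddings.insert("shoes".to_string(), vec![1.0, 0.0, 0.0]);
    save_snapshot(&path, 3, &embeddings)?;

    let (dimension, restored) = load_snapshot(&path)?;
    assert_eq!(dimension, 3);
    assert_eq!(restored, embeddings);
    Ok(())
}
```

Because everything is held in memory, a snapshot like this is only as fresh as the last save. A production design would also need fsync, a format version, and a policy for when to write.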

Highlighted Details

  • Written in pure Rust for performance and minimal dependencies.
  • Designed for vertical scaling, supporting 100 million+ vector dimensions in memory.
  • Future features include metadata filtering and integrated embedding models (SBERT, Hugging Face, OpenAI, Cohere).
  • Plans to auto-generate TypeScript and Python clients from an OpenAPI schema.
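The "100 million+ dimensions in memory" figure is easier to reason about in bytes. Here is a back-of-the-envelope sketch, assuming each dimension is stored as one 32-bit float (the storage type is an assumption, not stated in the README) and ignoring IDs, hash-map overhead, and allocator slack. The 16 GiB budget and 1,536-dimensional embeddings are hypothetical example values.

```rust
use std::mem::size_of;

/// Raw bytes for `vectors` embeddings of `dimension` values each,
/// assuming one f32 per dimension and no per-entry overhead.
fn raw_embedding_bytes(vectors: u64, dimension: u64) -> u64 {
    vectors * dimension * size_of::<f32>() as u64
}

/// How many `dimension`-sized f32 embeddings fit in `budget_bytes` of RAM,
/// under the same assumptions (integer division rounds down).
fn vectors_that_fit(budget_bytes: u64, dimension: u64) -> u64 {
    budget_bytes / (dimension * size_of::<f32>() as u64)
}

fn main() {
    // 100 million dimensions in total, however they are split across vectors.
    let bytes = raw_embedding_bytes(1, 100_000_000);
    println!("100M dims: {} bytes (~{:.1} MiB)", bytes, bytes as f64 / (1024.0 * 1024.0));

    // Hypothetical budget: 16 GiB of RAM and 1,536-dimensional embeddings.
    let budget: u64 = 16 * 1024 * 1024 * 1024;
    println!("fits ~{} vectors", vectors_that_fit(budget, 1_536));
}
```

Under these assumptions, 100 million dimensions take 400,000,000 bytes (about 381.5 MiB), and a 16 GiB budget holds about 2.8 million 1,536-dimensional vectors before overhead. Vertical scaling then amounts to adding RAM to a single machine.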

Maintenance & Community

The project is actively maintained by m1guelpf. Further community and roadmap details are not explicitly provided in the README.

Licensing & Compatibility

  • License: MIT License.
  • Compatibility: The permissive MIT license allows commercial use and integration into closed-source projects.

Limitations & Caveats

The project is a work in progress: features such as more powerful queries and integrated embedding models are planned for future releases. The performance claims describe expected behavior on small to medium datasets.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 14 stars in the last 90 days