Embedding database using SQLite and PyTorch
This project provides a lightweight, embeddable vector database for small to medium datasets. It targets developers who find traditional vector databases overly complex for common use cases such as document search or product search on websites. It aims to offer speed comparable to more advanced solutions with a much simpler architecture, and it is MIT licensed.
How It Works
Tinyvector uses a minimalist architecture: a Flask server, an SQLite database for data storage, and NumPy for indexing. It keeps its index in memory for fast queries, so it scales vertically (by adding memory to a single machine) to handle millions of vector dimensions. Its small codebase makes it easy to customize.
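As a rough illustration of the SQLite-plus-NumPy design described above, the sketch below persists vectors as float32 blobs in SQLite, loads them into a normalized in-memory NumPy matrix, and answers queries by brute-force cosine similarity. The table name `embeddings`, the helpers `open_db` and `insert`, and the class `InMemoryIndex` are hypothetical; they are not tinyvector's actual schema or API.

```python
import sqlite3

import numpy as np


def open_db(path=":memory:"):
    # Hypothetical schema: one row per vector, stored as raw float32 bytes.
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS embeddings (id TEXT PRIMARY KEY, vec BLOB NOT NULL)"
    )
    return conn


def insert(conn, item_id, vec):
    # SQLite is the durable store; the in-memory index is rebuilt from it.
    blob = np.asarray(vec, dtype=np.float32).tobytes()
    conn.execute("INSERT OR REPLACE INTO embeddings (id, vec) VALUES (?, ?)", (item_id, blob))
    conn.commit()


class InMemoryIndex:
    """Brute-force cosine-similarity index held entirely in RAM."""

    def __init__(self, conn):
        rows = conn.execute("SELECT id, vec FROM embeddings ORDER BY id").fetchall()
        self.ids = [row[0] for row in rows]
        if not rows:
            self.matrix = None
            return
        mat = np.stack([np.frombuffer(row[1], dtype=np.float32) for row in rows])
        # Normalize once at load time so each query is a single matrix-vector product.
        norms = np.linalg.norm(mat, axis=1, keepdims=True)
        self.matrix = mat / np.maximum(norms, 1e-12)

    def query(self, vec, k=5):
        if not self.ids:
            return []
        q = np.asarray(vec, dtype=np.float32)
        q = q / max(float(np.linalg.norm(q)), 1e-12)
        scores = self.matrix @ q  # cosine similarity against every stored vector
        top = np.argsort(-scores)[:k]
        return [(self.ids[i], float(scores[i])) for i in top]


if __name__ == "__main__":
    conn = open_db()
    insert(conn, "a", [1.0, 0.0, 0.0])
    insert(conn, "b", [0.0, 1.0, 0.0])
    insert(conn, "c", [0.7, 0.7, 0.0])
    index = InMemoryIndex(conn)
    print(index.query([1.0, 0.1, 0.0], k=2))  # "a" ranks first, then "c"
```

Holding one pre-normalized matrix in memory turns every query into a single matrix-vector product. That is the trade this style of index makes: it spends RAM for speed, which is why it scales vertically rather than across machines.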
Quick Start & Requirements
pip install -r requirements.txt
python -m server
To run the tests:
pip install pytest pytest-mock
pytest
Maintenance & Community
The project is actively under development, with a stated goal of being production-ready by late July. Contributions are welcome, and the project lists specific improvement ideas, such as adding metadata filtering and GPU acceleration. Contact: @willdepue.
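Metadata filtering is listed as a planned improvement and is not part of tinyvector. One common way to add it to a brute-force index is pre-filtering: build a boolean mask of rows whose metadata matches, then score only those rows. The function `filtered_search` and the one-dict-per-row metadata layout below are assumptions made purely for illustration.

```python
import numpy as np


def filtered_search(matrix, ids, metadata, query, k=5, where=None):
    """Cosine search over rows whose metadata matches every key/value in `where`.

    matrix:   (n, d) float array whose rows are already L2-normalized
    ids:      n identifiers, aligned with the rows of `matrix`
    metadata: n dicts, one per row (hypothetical layout)
    """
    where = where or {}
    # Pre-filter: boolean mask of rows that satisfy all equality conditions.
    mask = np.array(
        [all(meta.get(key) == val for key, val in where.items()) for meta in metadata],
        dtype=bool,
    )
    candidates = np.flatnonzero(mask)
    if candidates.size == 0:
        return []
    q = np.asarray(query, dtype=np.float32)
    q = q / max(float(np.linalg.norm(q)), 1e-12)
    # Score only the surviving rows, then keep the k best.
    scores = matrix[candidates] @ q
    order = np.argsort(-scores)[:k]
    return [(ids[candidates[i]], float(scores[i])) for i in order]


if __name__ == "__main__":
    matrix = np.array([[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]], dtype=np.float32)
    ids = ["a", "b", "c"]
    metadata = [{"lang": "en"}, {"lang": "fr"}, {"lang": "en"}]
    print(filtered_search(matrix, ids, metadata, [0.0, 1.0], k=2, where={"lang": "en"}))
```

Pre-filtering keeps results exact because only matching rows are ever ranked. With a brute-force index the mask costs one pass over the metadata, which is cheap next to the similarity computation for small to medium datasets.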
Licensing & Compatibility
MIT Licensed, permitting commercial use and integration with closed-source applications.
Limitations & Caveats
The project is explicitly marked as "in development" and "not ready." A known major bug can corrupt data: stored vectors may change after they are written, possibly because of the blob (serialization) or norm (normalization) functions. PCA and brute-force indexing are not yet tested. Metadata filtering is not currently supported but is planned.
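The cause of the vector-corruption bug is not confirmed. The sketch below shows two common NumPy pitfalls that produce exactly this symptom, offered only as hypotheses: decoding a blob with a different dtype than it was encoded with, and normalizing a vector in place so that the caller's stored array is silently modified. The helper names `to_blob`, `from_blob`, `normalize`, and `normalize_in_place_buggy` are hypothetical.

```python
import numpy as np


def to_blob(vec):
    # Pin the dtype so the byte layout written to SQLite is unambiguous.
    return np.asarray(vec, dtype=np.float32).tobytes()


def from_blob(blob):
    # Decode with the SAME dtype used in to_blob; .copy() returns a writable array.
    return np.frombuffer(blob, dtype=np.float32).copy()


def normalize(vec):
    # Returns a new unit-length array; the input array is left untouched.
    v = np.asarray(vec, dtype=np.float32)
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v.copy()


def normalize_in_place_buggy(vec):
    # Pitfall: "/=" writes into the caller's array, so a cached or stored vector changes.
    vec /= np.linalg.norm(vec)
    return vec


if __name__ == "__main__":
    stored = np.array([3.0, 4.0], dtype=np.float32)

    # A round trip with matching dtypes preserves the values exactly.
    assert np.array_equal(from_blob(to_blob(stored)), stored)

    # Pitfall 1: float64 bytes decoded as float32 give twice as many, meaningless values.
    garbled = np.frombuffer(np.array([3.0, 4.0]).tobytes(), dtype=np.float32)
    print("dtype mismatch:", garbled)

    # Safe normalization leaves the stored vector unchanged.
    unit = normalize(stored)
    print("normalized:", unit, "stored:", stored)

    # Pitfall 2: in-place normalization silently rewrites the stored vector.
    normalize_in_place_buggy(stored)
    print("after buggy normalize, stored:", stored)
```

A round-trip assertion like the first one above is a cheap regression test for this class of bug: write a vector, read it back, normalize a copy, and check that the original is unchanged.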
2 years ago
Inactive