TutteInstitute: Fast clustering for embedding vectors
Embedding Vector Oriented Clustering (EVōC) is a Python library for fast, flexible clustering of large, high-dimensional embedding vectors. It targets users working with embeddings from sources such as CLIP, sentence-transformers, OpenAI, and Cohere, and it aims to be significantly faster and to need less hyperparameter tuning than the common UMAP + HDBSCAN pipeline. The library is designed to produce high-quality clusters efficiently, including for quantized vector formats.
How It Works
EVōC specializes in embedding vectors, which lets it skip the time-consuming steps of general-purpose clustering pipelines. It draws on techniques inspired by PLSCAN and density-based methods to achieve fast clustering on the CPU. A key feature is multi-granularity clustering: a single run yields a hierarchy of results, from fine-grained to coarse-grained. It also natively supports clustering of int8 or binary quantized embeddings.
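To make "int8 or binary quantized embeddings" concrete, here is a minimal NumPy sketch of how float embeddings are commonly quantized. The helper names (`quantize_int8`, `quantize_binary`), the global max-abs scaling, and the sign-based bit packing are illustrative assumptions, not EVōC's internals; check the EVōC documentation for the exact input layout it expects.

```python
import numpy as np


def quantize_int8(embeddings):
    """Scale float embeddings into the int8 range [-127, 127].

    Assumption: one global scale factor (the largest absolute value).
    Embedding providers may use per-dimension or calibrated ranges instead.
    """
    scale = np.abs(embeddings).max()
    if scale == 0:
        return np.zeros(embeddings.shape, dtype=np.int8)
    return np.round(embeddings / scale * 127).astype(np.int8)


def quantize_binary(embeddings):
    """Keep only the sign of each dimension, packed 8 dimensions per byte."""
    bits = embeddings > 0
    return np.packbits(bits, axis=1)


# Toy data standing in for real embeddings: 4 vectors of 16 dimensions.
rng = np.random.default_rng(0)
vectors = rng.normal(size=(4, 16)).astype(np.float32)

int8_vectors = quantize_int8(vectors)      # shape (4, 16), dtype int8
binary_vectors = quantize_binary(vectors)  # shape (4, 2), dtype uint8
```

Binary quantization shrinks each vector by a factor of 32 relative to float32, and int8 by a factor of 4, which is why clustering directly on these formats is attractive for large collections.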
Quick Start & Requirements
Installation is straightforward via pip: pip install evoc. Core dependencies include numpy, scikit-learn, numba, tqdm, and tbb. No specialized hardware like GPUs is mentioned. Full documentation is available at https://evoc.readthedocs.io/en/latest/.
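Once installed, basic usage might look like the sketch below. It assumes EVōC exposes a scikit-learn-style estimator, `evoc.EVoC`, with a `fit_predict` method, and that noise points are labelled -1 as in HDBSCAN; both should be confirmed against the documentation linked above. The helpers `cluster_embeddings` and `cluster_sizes` are hypothetical names introduced here for illustration.

```python
import numpy as np


def cluster_embeddings(embeddings, clusterer=None):
    """Return one integer cluster label per embedding row."""
    if clusterer is None:
        # Assumption: evoc provides an EVoC estimator with fit_predict.
        import evoc

        clusterer = evoc.EVoC()
    return np.asarray(clusterer.fit_predict(embeddings))


def cluster_sizes(labels):
    """Count points per cluster, skipping labels below 0 (assumed noise)."""
    ids, counts = np.unique(labels[labels >= 0], return_counts=True)
    return {int(i): int(c) for i, c in zip(ids, counts)}
```

With a real embedding matrix `vectors` (one row per item), `cluster_sizes(cluster_embeddings(vectors))` would give a quick overview of how many points landed in each cluster. Passing a different `clusterer` object makes it easy to compare against another estimator that implements `fit_predict`.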
Maintenance & Community
The project welcomes contributions via pull requests. Specific community channels (e.g., Discord, Slack), roadmap, or notable contributors/sponsorships are not detailed in the README.
Licensing & Compatibility
EVōC is released under the permissive BSD (2-clause) license, allowing for broad compatibility, including commercial use.
Limitations & Caveats
The library is explicitly described as an "early beta version," with a warning that "Things can and will break right now." Users should expect potential instability and are encouraged to provide feedback.