Swift package for on-device text embeddings and semantic search
This Swift package enables on-device text embeddings and semantic search for iOS and macOS applications, prioritizing speed, extensibility, and privacy. It allows developers to build applications like privacy-focused search engines, offline Q&A systems, and document clustering tools without relying on cloud services, keeping sensitive data local.
How It Works
The library utilizes a SimilarityIndex class that accepts pluggable embedding models and distance metrics. Developers can choose from built-in models like Apple's NaturalLanguage or HuggingFace models (MiniLMAll, Distilbert, MiniLMMultiQA), or bring their own by conforming to the EmbeddingsProtocol. Similarity is calculated using metrics like Cosine Similarity or Euclidean Distance. The architecture is designed for extensibility, allowing custom implementations for text splitting, tokenization, and vector storage.
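The pluggable design described above can be illustrated with a small, self-contained sketch. It does not use the package or reproduce its API: TextEmbedder, SimilarityMetric, LetterCountEmbedder, CosineMetric, EuclideanMetric, and ToyIndex are hypothetical names invented for this example, and the letter-count embedder is a toy stand-in for a real embedding model. The sketch only shows how an index can take an embedding model and a distance metric as interchangeable parts.

```swift
// Hypothetical protocol: turns text into a fixed-length vector.
protocol TextEmbedder {
    func embed(_ text: String) -> [Double]
}

// Hypothetical protocol: scores two vectors; a higher score means more similar.
protocol SimilarityMetric {
    func score(_ a: [Double], _ b: [Double]) -> Double
}

// Toy embedder: counts ASCII letters a-z (26 dimensions).
// A real model would produce learned, dense embeddings instead.
struct LetterCountEmbedder: TextEmbedder {
    func embed(_ text: String) -> [Double] {
        var vector = [Double](repeating: 0, count: 26)
        for scalar in text.lowercased().unicodeScalars {
            let value = scalar.value
            if value >= 97 && value <= 122 {
                vector[Int(value - 97)] += 1
            }
        }
        return vector
    }
}

struct CosineMetric: SimilarityMetric {
    func score(_ a: [Double], _ b: [Double]) -> Double {
        let dot = zip(a, b).map { $0 * $1 }.reduce(0, +)
        let normA = a.map { $0 * $0 }.reduce(0, +).squareRoot()
        let normB = b.map { $0 * $0 }.reduce(0, +).squareRoot()
        // Define the similarity with a zero vector as 0 to avoid dividing by zero.
        guard normA > 0, normB > 0 else { return 0 }
        return dot / (normA * normB)
    }
}

struct EuclideanMetric: SimilarityMetric {
    func score(_ a: [Double], _ b: [Double]) -> Double {
        let squared = zip(a, b).map { ($0 - $1) * ($0 - $1) }.reduce(0, +)
        // Negate the distance so that a higher score still means "closer".
        return -squared.squareRoot()
    }
}

// Brute-force in-memory index, generic over the embedder and the metric.
struct ToyIndex<E: TextEmbedder, M: SimilarityMetric> {
    let embedder: E
    let metric: M
    private var items: [(id: String, vector: [Double])] = []

    init(embedder: E, metric: M) {
        self.embedder = embedder
        self.metric = metric
    }

    mutating func add(id: String, text: String) {
        items.append((id: id, vector: embedder.embed(text)))
    }

    // Scores every stored item against the query and returns the best k.
    func search(_ query: String, top k: Int = 3) -> [(id: String, score: Double)] {
        let queryVector = embedder.embed(query)
        let scored: [(id: String, score: Double)] = items.map { item in
            (id: item.id, score: metric.score(queryVector, item.vector))
        }
        return Array(scored.sorted { $0.score > $1.score }.prefix(k))
    }
}

// Usage: swapping CosineMetric for EuclideanMetric requires no other change.
var demo = ToyIndex(embedder: LetterCountEmbedder(), metric: CosineMetric())
demo.add(id: "doc1", text: "Metal is a graphics API")
demo.add(id: "doc2", text: "Swift is a programming language")
for hit in demo.search("graphics", top: 1) {
    print(hit.id, hit.score)
}
```

Expressing the embedder and metric as protocols keeps the search loop unaware of which model or distance function is in use, which is the property the summary attributes to SimilarityIndex.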
Quick Start & Requirements
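As a rough sketch of the Swift Package Manager setup covered in this section, a manifest might look like the following. The package name MyApp, the platform minimums, the core product name SimilaritySearchKit, and the existence of a 0.0.1 tag are assumptions made for illustration; only the repository URL and the SimilaritySearchKitDistilbert add-on come from this page, so check the repository before relying on any of them.

```swift
// swift-tools-version:5.7
import PackageDescription

let package = Package(
    name: "MyApp", // hypothetical package name
    platforms: [.iOS(.v16), .macOS(.v13)], // assumed minimums, verify against the repository
    dependencies: [
        .package(
            url: "https://github.com/ZachNagengast/similarity-search-kit.git",
            from: "0.0.1" // assumes a tag matching the version mentioned on this page
        )
    ],
    targets: [
        .target(
            name: "MyApp",
            dependencies: [
                // Assumed name of the core product.
                .product(name: "SimilaritySearchKit", package: "similarity-search-kit"),
                // Optional add-on, only needed when not using the built-in NaturalLanguage model.
                .product(name: "SimilaritySearchKitDistilbert", package: "similarity-search-kit"),
            ]
        )
    ]
)
```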
Add https://github.com/ZachNagengast/similarity-search-kit.git to your Package.swift.
Add a model-specific add-on (e.g., SimilaritySearchKitDistilbert) if not using the built-in NaturalLanguage model.
Highlighted Details
Maintenance & Community
The project is maintained by Zach Nagengast. Contact information for feedback and feature requests is provided via Twitter and email. Future work includes performance improvements, HNSW/Annoy indexing, query filters, and Metal acceleration.
Licensing & Compatibility
The project appears to be licensed under the MIT License, allowing for commercial use and integration into closed-source applications.
Limitations & Caveats
The library is currently in early development (version 0.0.1). Features like disk-backed indexing for large datasets, query filters, sparse/dense hybrid search, and summarization models are planned for future releases. Metal acceleration for distance calculations is also a future consideration.