similarity-search-kit by ZachNagengast

Swift package for on-device text embeddings and semantic search

created 2 years ago
477 stars

Project Summary

This Swift package enables on-device text embeddings and semantic search for iOS and macOS applications, prioritizing speed, extensibility, and privacy. It allows developers to build applications like privacy-focused search engines, offline Q&A systems, and document clustering tools without relying on cloud services, keeping sensitive data local.

How It Works

The library uses a SimilarityIndex class that accepts pluggable embedding models and distance metrics. Developers can choose built-in models such as Apple's NaturalLanguage framework embeddings or Hugging Face models (MiniLMAll, Distilbert, MiniLMMultiQA), or bring their own by conforming to EmbeddingsProtocol. Similarity is computed with metrics such as cosine similarity or Euclidean distance. The architecture is designed for extensibility, so text splitting, tokenization, and vector storage can also be replaced with custom implementations.
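To make the pluggable design concrete, here is a minimal sketch in plain Swift with no dependency on the package. All type names (`EmbeddingModel`, `SimilarityMetric`, `CosineMetric`, `EuclideanMetric`, `LetterCountModel`, `ToyIndex`) are hypothetical and are not SimilaritySearchKit's API; the letter-count "embedding" is a toy stand-in for a real model such as MiniLM, used only so the example runs anywhere. Cosine similarity is dot(a, b) / (|a| * |b|); Euclidean distance is negated so that, for both metrics, a higher score means a closer match.

```swift
// Hypothetical names for illustration only; SimilaritySearchKit's own types
// (SimilarityIndex, EmbeddingsProtocol, ...) differ in shape and naming.

/// Turns text into a fixed-length vector.
protocol EmbeddingModel {
    func encode(_ text: String) -> [Float]
}

/// Scores two vectors; a larger score always means "more similar".
protocol SimilarityMetric {
    func score(_ a: [Float], _ b: [Float]) -> Float
}

/// Cosine similarity: dot(a, b) / (|a| * |b|), in [-1, 1].
struct CosineMetric: SimilarityMetric {
    func score(_ a: [Float], _ b: [Float]) -> Float {
        var dot: Float = 0, normA: Float = 0, normB: Float = 0
        for (x, y) in zip(a, b) {
            dot += x * y
            normA += x * x
            normB += y * y
        }
        let denominator = normA.squareRoot() * normB.squareRoot()
        return denominator == 0 ? 0 : dot / denominator
    }
}

/// Euclidean distance, negated so that closer vectors score higher.
struct EuclideanMetric: SimilarityMetric {
    func score(_ a: [Float], _ b: [Float]) -> Float {
        var sum: Float = 0
        for (x, y) in zip(a, b) {
            sum += (x - y) * (x - y)
        }
        return -sum.squareRoot()
    }
}

/// Toy stand-in for a real embedding model: counts of the letters a-z.
struct LetterCountModel: EmbeddingModel {
    func encode(_ text: String) -> [Float] {
        var counts = [Float](repeating: 0, count: 26)
        for scalar in text.lowercased().unicodeScalars {
            let offset = Int(scalar.value) - 97 // 97 is the code point of "a"
            if (0..<26).contains(offset) {
                counts[offset] += 1
            }
        }
        return counts
    }
}

/// Brute-force index: embeds each item once at insert time, then scores
/// every stored vector against the query with the plugged-in metric.
struct ToyIndex {
    let model: any EmbeddingModel
    let metric: any SimilarityMetric
    private var items: [(id: String, vector: [Float])] = []

    init(model: any EmbeddingModel, metric: any SimilarityMetric) {
        self.model = model
        self.metric = metric
    }

    mutating func add(id: String, text: String) {
        items.append((id: id, vector: model.encode(text)))
    }

    /// Returns the `k` best-scoring items, highest score first.
    func search(_ query: String, top k: Int = 3) -> [(id: String, score: Float)] {
        let queryVector = model.encode(query)
        let scored = items.map { item in
            (id: item.id, score: metric.score(queryVector, item.vector))
        }
        return Array(scored.sorted { $0.score > $1.score }.prefix(k))
    }
}
```

Because the index depends only on the two protocols, swapping `CosineMetric()` for `EuclideanMetric()` (or the toy model for a real one) changes the ranking without touching the search code. The search is an exhaustive linear scan with one metric evaluation per stored item per query; approximate indexes such as the HNSW/Annoy work on the project's roadmap exist to avoid that cost for large collections.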

Quick Start & Requirements

  • Install via Swift Package Manager in Xcode (File → Add Packages...) or by adding the URL https://github.com/ZachNagengast/similarity-search-kit.git to your Package.swift.
  • If you are not using the built-in NaturalLanguage model, also add the matching model package (e.g., SimilaritySearchKitDistilbert).
  • The example projects require iOS 16.0+ or macOS 13.0+.
  • Official documentation and examples are available in the repository.
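For the Package.swift route, a manifest might look like the sketch below. The package and target name `MySearchFeature` is hypothetical, the version requirement simply reuses the 0.0.1 version mentioned under Limitations, and the product names are inferred from the package names above; check the repository README for the current release tag and product list. The platforms line mirrors the example projects' iOS 16 / macOS 13 requirement and may be stricter than the library itself needs.

```swift
// swift-tools-version:5.7
import PackageDescription

let package = Package(
    name: "MySearchFeature", // hypothetical name for your own package
    platforms: [.iOS(.v16), .macOS(.v13)],
    dependencies: [
        // Version requirement is an assumption; pin to the latest release tag.
        .package(url: "https://github.com/ZachNagengast/similarity-search-kit.git", from: "0.0.1")
    ],
    targets: [
        .target(
            name: "MySearchFeature",
            dependencies: [
                .product(name: "SimilaritySearchKit", package: "similarity-search-kit"),
                // Only needed when not using the built-in NaturalLanguage model.
                .product(name: "SimilaritySearchKitDistilbert", package: "similarity-search-kit")
            ]
        )
    ]
)
```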

Highlighted Details

  • Supports on-device text embeddings and semantic search for Apple platforms.
  • Offers a variety of pre-trained NLP models from Hugging Face and Apple.
  • Allows custom embedding models, distance metrics, text splitters, tokenizers, and vector stores.
  • Examples include PDF search and indexing all text files on a computer.
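Custom text splitters matter for use cases like the PDF and text-file examples, where long documents are typically broken into overlapping chunks and each chunk is embedded separately, so a query can match a specific passage rather than a whole file. The sketch below shows one common strategy, a fixed-size word window with overlap; `TextSplitter` and `WordWindowSplitter` are hypothetical names, not the package's own splitter types.

```swift
/// Hypothetical splitter protocol, for illustration only.
protocol TextSplitter {
    func split(_ text: String) -> [String]
}

/// Splits text into windows of `chunkSize` words, where consecutive
/// windows share `overlap` words.
struct WordWindowSplitter: TextSplitter {
    let chunkSize: Int
    let overlap: Int

    func split(_ text: String) -> [String] {
        // overlap < chunkSize guarantees the window always advances.
        precondition(chunkSize > 0 && overlap >= 0 && overlap < chunkSize,
                     "overlap must be non-negative and smaller than chunkSize")
        let words = text.split(whereSeparator: { $0.isWhitespace })
        guard !words.isEmpty else { return [] }

        let step = chunkSize - overlap
        var chunks: [String] = []
        var start = 0
        while start < words.count {
            let end = min(start + chunkSize, words.count)
            chunks.append(words[start..<end].joined(separator: " "))
            if end == words.count { break } // last window reached the end
            start += step
        }
        return chunks
    }
}
```

With `chunkSize: 3, overlap: 1`, each chunk repeats the last word of the previous one, so text near a boundary appears in two chunks. Larger overlaps reduce the chance of cutting a relevant sentence in half, at the cost of more embeddings to compute and store.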

Maintenance & Community

The project is maintained by Zach Nagengast, who accepts feedback and feature requests via Twitter and email. Planned work includes performance improvements, HNSW/Annoy indexing, query filters, and Metal acceleration.

Licensing & Compatibility

The project appears to be licensed under the MIT License, allowing for commercial use and integration into closed-source applications.

Limitations & Caveats

The library is currently in early development (version 0.0.1). Features like disk-backed indexing for large datasets, query filters, sparse/dense hybrid search, and summarization models are planned for future releases. Metal acceleration for distance calculations is also a future consideration.

Health Check
Last commit: 1 year ago
Responsiveness: 1 day
Pull Requests (30d): 0
Issues (30d): 0
Star History: 30 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems) and Simon Willison (Author of Django).

semantra by freedmand

0.0%
3k
CLI tool for semantic document search
created 2 years ago
updated 11 months ago
Starred by John Resig (Author of jQuery; Chief Software Architect at Khan Academy), Mike McNeil (Author of Sails.js; Cofounder of Fleet), and 10 more.

meilisearch by meilisearch

0.2%
53k
Search engine API for integrating AI-powered hybrid search
created 7 years ago
updated 1 day ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Didier Lopes (Founder of OpenBB), and 11 more.

sentence-transformers by UKPLab

0.2%
17k
Framework for text embeddings, retrieval, and reranking
created 6 years ago
updated 3 days ago