CLIP-ImageSearch-NCNN by EdVince

Image search demo using natural language queries

Created 3 years ago
265 stars

Top 96.6% on SourcePulse

View on GitHub
Project Summary

This project provides a natural-language image search engine built on CLIP and NCNN, targeting mobile (Android) and desktop (x86) platforms. It lets users find images in a gallery by describing them in text, similar to the search built into phone gallery apps.

How It Works

The system uses CLIP's encode_image to extract a feature vector from each gallery image, building a feature database. For a given text query, CLIP's encode_text produces a text feature vector. Cosine similarity between the text vector and each image vector then ranks the gallery, enabling text-to-image matching. The project currently displays only the single highest-probability match for simplicity, but it can be extended to return multiple relevant images.
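
To make the matching step concrete, here is a minimal sketch in Python/NumPy, assuming the gallery image features and the query text feature have already been extracted; the array names, shapes, and the softmax temperature are illustrative assumptions, not the project's code.

```python
import numpy as np

def top_match(image_feats: np.ndarray, text_feat: np.ndarray) -> tuple[int, float]:
    """Return the index and probability of the gallery image closest to the text query.

    image_feats: (N, D) array, one CLIP image feature per gallery photo.
    text_feat:   (D,)   array, CLIP feature of the query text.
    """
    # L2-normalize so the dot product equals cosine similarity.
    image_feats = image_feats / np.linalg.norm(image_feats, axis=1, keepdims=True)
    text_feat = text_feat / np.linalg.norm(text_feat)

    sims = image_feats @ text_feat              # (N,) cosine similarities
    logits = 100.0 * sims                       # CLIP-style temperature scaling (assumed)
    probs = np.exp(logits - logits.max())       # softmax over the whole gallery
    probs /= probs.sum()

    best = int(np.argmax(probs))
    return best, float(probs[best])

# Example: 1,000 gallery images with 1024-dim RN50 features (random stand-ins).
gallery = np.random.randn(1000, 1024).astype(np.float32)
query = np.random.randn(1024).astype(np.float32)
idx, prob = top_match(gallery, query)
print(f"best match: image {idx} (probability {prob:.3f})")
```

Returning the top-k indices of `probs` instead of the argmax is all it would take to extend the demo from a single best match to a ranked result list.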

Quick Start & Requirements

  • Android: Download the provided APK. Scan the gallery, extract features (about 0.5 s per image on a Kirin 970), enter an English text query, and search (about 1.5 s per query on a Kirin 970).
  • x86: Download the provided EXE. Select a gallery folder, extract features, enter an English text query, and search.
  • Models: Pre-compiled .bin files for NCNN are available for download from the project's releases page and must be placed in the assert folder of the respective demo projects.
  • Dependencies: NCNN, CLIP (the ResNet50 variant "RN50" is used for feature extraction; see the sketch below).
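
For context on the feature-extraction step, below is a minimal sketch using the original Python CLIP package with the RN50 weights. The NCNN demos do the equivalent with the converted .bin models, so this illustrates the reference pipeline rather than the project's own code; the file path and query string are placeholders.

```python
import clip
import torch
from PIL import Image

# Load the ResNet50 CLIP variant ("RN50") with the reference PyTorch weights.
device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("RN50", device=device)

# Encode one gallery image into a 1024-dim feature vector.
image = preprocess(Image.open("photo_001.jpg")).unsqueeze(0).to(device)  # placeholder path
with torch.no_grad():
    image_feat = model.encode_image(image)        # shape: (1, 1024)

# Encode an English text query into the same embedding space.
tokens = clip.tokenize(["a dog playing on the beach"]).to(device)
with torch.no_grad():
    text_feat = model.encode_text(tokens)         # shape: (1, 1024)

# Cosine similarity between the two (higher means a better match).
image_feat = image_feat / image_feat.norm(dim=-1, keepdim=True)
text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)
similarity = (image_feat @ text_feat.T).item()
print(f"similarity: {similarity:.3f}")
```

In the demos, the image half of this computation is done once per photo and cached, which is why the feature-extraction pass dominates the runtime.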

Highlighted Details

  • Enables natural language image search ("search by text").
  • Because text and image features share one embedding space, matching works in any direction (text-to-image, image-to-text, image-to-image).
  • Offers pre-compiled executables for Android and x86 platforms.
  • Feature extraction is the most time-consuming step.

Maintenance & Community

The author notes that updates are infrequent due to other commitments and asks users to star the repo. The README provides no community links or further contributor information.

Licensing & Compatibility

The README does not explicitly state a license. The project utilizes NCNN and CLIP, which have their own licenses. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

Updates are slow, and the author's stated focus is on collecting stars. The demo currently returns only the top match, not a ranked list of results, and the README does not specify the exact CLIP model version or licensing details.

Health Check

  • Last Commit: 2 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 6 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (author of "AI Engineering", "Designing Machine Learning Systems") and Omar Sanseviero (DevRel at Google DeepMind).

gill by kohjingyu
463 stars
Multimodal LLM for generating/retrieving images and generating text
Created 2 years ago, updated 1 year ago

Starred by Chip Huyen (author of "AI Engineering", "Designing Machine Learning Systems"), Simon Willison (coauthor of Django), and 10 more.

LAVIS by salesforce
11k stars
Library for language-vision AI research
Created 3 years ago, updated 10 months ago