clip-retrieval by rom1504

CLIP retrieval system for semantic search

Created 4 years ago

2,714 stars

Top 17.3% on SourcePulse

Project Summary

This project provides a comprehensive toolkit for building scalable CLIP-based retrieval systems, enabling users to compute embeddings, create efficient indices, and serve them via a web API. It's designed for researchers and developers working with large-scale multimodal datasets who need to implement semantic search capabilities.

How It Works

The system leverages CLIP models to generate embeddings for text and images. It then uses autofaiss for efficient Approximate Nearest Neighbor (ANN) indexing, allowing for fast retrieval over millions or billions of items. A Flask-based backend (clip-back) serves these indices, offering a REST API for querying, with optional features like HDF5/Arrow caching for metadata and memory-mapped indices to reduce RAM usage.
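
Conceptually, retrieval over normalized CLIP embeddings is a maximum-inner-product search: once every vector has unit length, the dot product equals cosine similarity. The sketch below performs that search exactly, by brute force, in pure Python. An ANN index such as those autofaiss builds approximates the same ranking without scanning every vector. The toy 3-dimensional vectors and the `normalize`/`top_k` helpers are hypothetical stand-ins for real CLIP embeddings (typically 512 or 768 dimensions) and are not part of clip-retrieval.

```python
import heapq
import math


def normalize(vec):
    """Scale a vector to unit L2 norm so that dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def top_k(query, index, k):
    """Exact (brute-force) k-nearest-neighbor search by inner product.

    `index` maps item ids to unit-normalized embeddings. An ANN index
    returns approximately this result while visiting far fewer vectors.
    """
    q = normalize(query)
    scored = (
        (sum(a * b for a, b in zip(q, emb)), item_id)
        for item_id, emb in index.items()
    )
    return heapq.nlargest(k, scored)


# Hypothetical toy "image embeddings"; real ones come from the CLIP image encoder.
index = {
    "cat.jpg": normalize([0.9, 0.1, 0.0]),
    "dog.jpg": normalize([0.7, 0.6, 0.1]),
    "car.jpg": normalize([0.0, 0.2, 0.9]),
}

# Hypothetical "text embedding" standing in for a query like "a photo of a cat".
for score, item_id in top_k([1.0, 0.2, 0.0], index, k=2):
    print(f"{item_id}: {score:.3f}")
```

Replacing the brute-force scan with an ANN index changes the cost of each query, not its shape: a normalized query vector goes in, and the k highest-scoring ids come out.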

Quick Start & Requirements

Install via pip: pip install clip-retrieval
Requires Python 3.7+ and PyTorch. A CUDA-capable GPU is recommended for performance.
Official documentation and examples are available on the GitHub repository.
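
clip-back is configured with a JSON file that maps each index name to its serving options. This file is where features such as Arrow/HDF5 metadata caching and memory-mapped indices are switched on. The sketch below writes such a file from Python. The key names follow the example in the project README as we recall it and should be treated as assumptions to check against the current documentation. The index name `my_index`, the folder path, and the output filename are hypothetical.

```python
import json
from pathlib import Path

# Hypothetical index name and folder; the folder would hold the output of the
# indexing step (the FAISS index plus its metadata).
# Key names are assumed from the clip-retrieval README example; verify them.
config = {
    "my_index": {
        "indice_folder": "/data/my_index",
        "provide_safety_model": False,
        "enable_faiss_memory_mapping": True,  # memory-map the index to reduce RAM usage
        "use_arrow": True,                    # read metadata from Arrow files
        "enable_hdf5": False,                 # alternative HDF5 metadata cache
        "reorder_metadata_by_ivf_index": False,
        "columns_to_return": ["url", "caption"],
        "clip_model": "ViT-B/32",
        "enable_mclip_option": False,
    }
}

# Write the config next to the script so clip-back can be pointed at it.
path = Path("indices_paths.json")
path.write_text(json.dumps(config, indent=2))
print(path.read_text())
```

The resulting file is passed to clip-back at startup; check the README for the exact command-line flag.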

Highlighted Details

Computes 100M text/image embeddings in 20 hours on a single RTX 3080 GPU.
Runs CLIP inference at 1500 samples/sec on an RTX 3080.
Supports various CLIP models, including OpenCLIP and Hugging Face variants.
Offers optional DeepSparse backend for CPU-accelerated inference.
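
The two throughput figures roughly agree, as a quick calculation shows. 100M samples in 20 hours works out to about 1,389 samples per second, and at a steady 1,500 samples per second the same 100M samples would take about 18.5 hours. The small gap is what one would expect if the 20-hour figure includes end-to-end overhead, though that reading is our assumption rather than a breakdown given by the project.

```python
SAMPLES = 100_000_000  # 100M samples, from the figure above
HOURS = 20             # reported wall-clock time on one RTX 3080
RATE = 1_500           # reported CLIP inference throughput (samples/sec)

implied_rate = SAMPLES / (HOURS * 3600)  # average samples per second over the run
hours_at_rate = SAMPLES / RATE / 3600    # hours needed at the reported rate

print(f"implied throughput: {implied_rate:.0f} samples/sec")  # implied throughput: 1389 samples/sec
print(f"time at {RATE} samples/sec: {hours_at_rate:.1f} h")   # time at 1500 samples/sec: 18.5 h
```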

Maintenance & Community

The project is maintained by Romain Beaumont (rom1504).
Related projects include img2dataset, open_clip, and CLIP_benchmark.
Community discussion can be found via the DataToML chat.

Licensing & Compatibility

The project is released under the MIT License.
Permissive licensing allows for commercial use and integration into closed-source projects.

Limitations & Caveats

The project's performance heavily relies on hardware, particularly GPUs for inference and sufficient RAM for indexing. While it scales to billions of samples, managing such large datasets requires careful configuration and potentially distributed computing setups. The clip-front UI is basic and may require customization for production use.
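
A back-of-envelope estimate shows why RAM becomes the bottleneck at this scale. The sketch compares raw embeddings with a compressed index. It assumes 768-dimensional float32 embeddings (the width of a CLIP ViT-L/14 model) and, for the compressed case, product-quantization codes of 64 bytes per vector. Both numbers are illustrative assumptions: autofaiss chooses the actual index type and code size from the memory budget it is given. The estimate also ignores metadata and index overhead.

```python
DIM = 768              # assumed embedding width (e.g. CLIP ViT-L/14)
BYTES_PER_FLOAT = 4    # float32
PQ_CODE_BYTES = 64     # assumed bytes per vector after product quantization


def gib(num_bytes):
    """Convert a byte count to GiB."""
    return num_bytes / 2**30


for n in (100_000_000, 1_000_000_000):
    raw = n * DIM * BYTES_PER_FLOAT  # uncompressed float32 vectors
    compressed = n * PQ_CODE_BYTES   # quantized codes only
    print(f"{n:>13,} vectors: raw {gib(raw):8,.0f} GiB, PQ codes {gib(compressed):6,.0f} GiB")
```

Memory-mapping the index, as clip-back can do, moves much of this footprint from RAM to disk, typically at the cost of slower lookups.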

Health Check

Last Commit

4 months ago

Responsiveness

Inactive

Star History

12 stars in the last 30 days