Calculate CLIP text-image similarity scores
Top 94.6% on SourcePulse
This repository provides efficient batch-wise calculation of CLIP scores, measuring text-image similarity using pre-trained CLIP models. It's designed for researchers and developers working with generative models or evaluating image-text alignment, offering a convenient way to quantify how well an image matches a textual description.
How It Works
The project leverages Hugging Face's transformers library to load CLIP models, defaulting to openai/clip-vit-base-patch32. It calculates the cosine similarity between image embeddings and text embeddings. A recent update removed the previous 100x scaling factor, so the output is the raw cosine similarity, bounded above by 1. This simplifies interpretation and integrates seamlessly with the Hugging Face ecosystem.
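As an illustration, a minimal sketch of this computation with the transformers API might look like the following; the image path and prompt are placeholders, and this is not the repository's own implementation:

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model_name = "openai/clip-vit-base-patch32"  # the default checkpoint mentioned above
model = CLIPModel.from_pretrained(model_name).eval()
processor = CLIPProcessor.from_pretrained(model_name)

image = Image.open("example.jpg")   # placeholder image path
text = "a photo of a cat"           # placeholder prompt

inputs = processor(text=[text], images=image, return_tensors="pt", padding=True)

with torch.no_grad():
    image_embeds = model.get_image_features(pixel_values=inputs["pixel_values"])
    text_embeds = model.get_text_features(
        input_ids=inputs["input_ids"], attention_mask=inputs["attention_mask"]
    )

# Normalize and take the dot product: the raw cosine similarity, with no 100x scaling.
image_embeds = image_embeds / image_embeds.norm(dim=-1, keepdim=True)
text_embeds = text_embeds / text_embeds.norm(dim=-1, keepdim=True)
clip_score = (image_embeds * text_embeds).sum(dim=-1).item()
print(f"CLIP score: {clip_score:.4f}")
```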
Quick Start & Requirements
Install with pip install clip-score. Select a compute device with --device cuda:N or --device cpu. Run python -m clip_score path/to/images path/to/text, or python -m clip_score path/to/images "your prompt here".
Highlighted Details
Input handling can be configured with the --real_flag and --fake_flag options.
Maintenance & Community
The project was last updated in April 2025, most recently with the scaling-factor change described above. The primary contributor is Taited (SUN Zhengwentai). The README does not document community channels or a roadmap.
Licensing & Compatibility
Licensed under the Apache License 2.0. This permissive license allows for commercial use and integration into closed-source projects.
Limitations & Caveats
Support for additional vision-language models such as DINO and BLIP is planned but not yet implemented. Input data must follow a strict structure: image and text files are paired by matching filenames across the two directories.
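As a rough illustration of that requirement, the sketch below checks that every image has a text file with the same filename stem; the stem-matching rule and the file extensions used here are assumptions, not taken from the project's documentation.

```python
from pathlib import Path

def check_pairing(img_dir: str, txt_dir: str) -> None:
    """Verify that image and text files pair up by filename stem (assumed convention)."""
    img_stems = {p.stem for p in Path(img_dir).iterdir()
                 if p.suffix.lower() in {".png", ".jpg", ".jpeg"}}
    txt_stems = {p.stem for p in Path(txt_dir).iterdir()
                 if p.suffix.lower() == ".txt"}
    missing_txt = sorted(img_stems - txt_stems)
    missing_img = sorted(txt_stems - img_stems)
    if missing_txt or missing_img:
        raise ValueError(
            f"Unpaired files: images without text {missing_txt}, "
            f"text without images {missing_img}"
        )

check_pairing("path/to/images", "path/to/text")
```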