CLIP_benchmark by LAION-AI

CLI tool for evaluating CLIP-like models across diverse tasks and datasets

Created 3 years ago
767 stars

Top 45.5% on SourcePulse

Project Summary

This repository provides a standardized framework for evaluating CLIP-like models across various tasks, including zero-shot classification, retrieval, captioning, and linear probing. It is designed for researchers and developers working with multimodal vision-language models, offering a comprehensive suite of datasets and evaluation metrics to compare model performance.

How It Works

The benchmark uses a command-line interface (CLI) to orchestrate evaluations: users specify models, datasets, and tasks, and the framework handles data loading, model inference, and metric calculation. It supports diverse data sources, including torchvision, TensorFlow Datasets, and custom WebDataset formats, enabling flexible benchmarking. The architecture is modular, so new models and datasets can be integrated by defining the appropriate loading functions.
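
A representative invocation, adapted from the project's README, evaluates an OpenCLIP ViT-B/32 model on CIFAR-10 zero-shot classification (verify flag names against clip_benchmark eval --help for your installed version):

    clip_benchmark eval --dataset=cifar10 --task=zeroshot_classification \
        --pretrained=laion400m_e32 --model=ViT-B-32-quickgelu \
        --output=result.json --batch_size=64

Each run writes its metrics to a JSON file; the README also describes a clip_benchmark build subcommand for aggregating multiple result files into a single CSV.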

Quick Start & Requirements

  • Install via pip: pip install clip-benchmark
  • Requires Python; the README does not pin a specific version.
  • For certain datasets (e.g., COCO captions), additional libraries such as pycocotools and pycocoevalcap may be needed (see the install sketch after this list).
  • Official documentation and Hugging Face datasets are linked from the README.
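
A minimal install sketch; per the README, the extra packages below are needed only for COCO captioning metrics:

    pip install clip-benchmark
    # optional, for COCO captioning evaluation only
    pip install pycocotools pycocoevalcap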

Highlighted Details

  • Supports zero-shot classification, retrieval, captioning, and linear probing (see the task sketch after this list).
  • Integrates with OpenCLIP, Japanese CLIP, and NLLB CLIP models.
  • Evaluates on datasets from torchvision, TensorFlow Datasets, and VTAB.
  • Includes support for multilingual datasets and compositionality tasks.
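
To illustrate switching tasks, the sketch below mirrors the README's retrieval and linear-probe examples; treat the exact flag set (--fewshot_lr, --fewshot_epochs, --train_split, --test_split) as an assumption to check against the CLI help:

    # zero-shot text/image retrieval on MS-COCO captions
    clip_benchmark eval --dataset=mscoco_captions --task=zeroshot_retrieval \
        --pretrained=laion400m_e32 --model=ViT-B-32-quickgelu --batch_size=64

    # linear probe on CIFAR-10: fits a classifier on frozen CLIP features
    clip_benchmark eval --dataset=cifar10 --task=linear_probe \
        --pretrained=laion400m_e32 --model=ViT-B-32-quickgelu \
        --fewshot_lr 0.1 --fewshot_epochs 20 \
        --train_split train --test_split test --batch_size 512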

Maintenance & Community

The project acknowledges contributions from various individuals and projects, including OpenCLIP, SLIP, Wise-ft, LiT, SugarCrepe, Babel-ImageNet, and others. Links to community resources such as Discord or Slack are not provided in the README.

Licensing & Compatibility

The repository's license is not explicitly stated in the provided README. Compatibility for commercial use or closed-source linking would require clarification of the licensing terms.

Limitations & Caveats

The README does not specify hardware requirements (e.g., GPU, CUDA versions) or provide explicit setup time estimates. The exact Python version compatibility is also not detailed.

Health Check

  • Last commit: 1 month ago
  • Responsiveness: 1 week
  • Pull requests (30d): 0
  • Issues (30d): 1
Star History

15 stars in the last 30 days

Starred by Jiayi Pan (author of SWE-Gym; MTS at xAI), Shizhe Diao (author of LMFlow; Research Scientist at NVIDIA), and 1 more.

Explore Similar Projects

METER by zdou0830

373 stars
Multimodal framework for vision-and-language transformer research
Created 3 years ago · Updated 2 years ago
Starred by Chip Huyen (author of "AI Engineering", "Designing Machine Learning Systems"), Yaowei Zheng (author of LLaMA-Factory), and 1 more.

CLIP_prefix_caption by rmokady

1k stars
Image captioning model using CLIP embeddings as a prefix
Created 4 years ago · Updated 1 year ago
Starred by Chip Huyen (author of "AI Engineering", "Designing Machine Learning Systems"), Simon Willison (coauthor of Django), and 10 more.

LAVIS by salesforce

11k stars
Library for language-vision AI research
Created 3 years ago · Updated 10 months ago