CLIP_benchmark by LAION-AI

CLI tool for evaluating CLIP-like models across diverse tasks and datasets

Created 3 years ago
767 stars

Top 45.5% on SourcePulse

Project Summary

This repository provides a standardized framework for evaluating CLIP-like models across various tasks, including zero-shot classification, retrieval, captioning, and linear probing. It is designed for researchers and developers working with multimodal vision-language models, offering a comprehensive suite of datasets and evaluation metrics to compare model performance.

How It Works

The benchmark uses a command-line interface (CLI) to orchestrate evaluations: users specify models, datasets, and tasks, and the framework handles data loading, model inference, and metric calculation. It supports diverse data sources, including torchvision, TensorFlow Datasets, and custom WebDataset formats, enabling flexible benchmarking. The architecture is modular, so new models and datasets can be integrated by defining the appropriate loading functions.
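
A representative invocation, adapted from the project's README, evaluates an OpenCLIP ViT-B/32 model on CIFAR-10 zero-shot classification (verify flag names against clip_benchmark eval --help for your installed version):

    clip_benchmark eval --dataset=cifar10 --task=zeroshot_classification \
        --pretrained=laion400m_e32 --model=ViT-B-32-quickgelu \
        --output=result.json --batch_size=64

Each run writes its metrics to a JSON file; the README also describes a clip_benchmark build subcommand for aggregating multiple result files into a single CSV.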

Quick Start & Requirements

  • Install via pip: pip install clip-benchmark
  • Requires Python; the README does not pin a specific version.
  • For certain datasets (e.g., COCO captions), additional libraries such as pycocotools and pycocoevalcap may be needed (see the install sketch after this list).
  • Official documentation and Hugging Face datasets are linked from the README.
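
A minimal install sketch; per the README, the extra packages below are needed only for COCO captioning metrics:

    pip install clip-benchmark
    # optional, for COCO captioning evaluation only
    pip install pycocotools pycocoevalcap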

Highlighted Details

  • Supports zero-shot classification, retrieval, captioning, and linear probing (see the task sketch after this list).
  • Integrates with OpenCLIP, Japanese CLIP, and NLLB CLIP models.
  • Evaluates on datasets from torchvision, TensorFlow Datasets, and VTAB.
  • Includes support for multilingual datasets and compositionality tasks.
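
To illustrate switching tasks, the sketch below mirrors the README's retrieval and linear-probe examples; treat the exact flag set (--fewshot_lr, --fewshot_epochs, --train_split, --test_split) as an assumption to check against the CLI help:

    # zero-shot text/image retrieval on MS-COCO captions
    clip_benchmark eval --dataset=mscoco_captions --task=zeroshot_retrieval \
        --pretrained=laion400m_e32 --model=ViT-B-32-quickgelu --batch_size=64

    # linear probe on CIFAR-10: fits a classifier on frozen CLIP features
    clip_benchmark eval --dataset=cifar10 --task=linear_probe \
        --pretrained=laion400m_e32 --model=ViT-B-32-quickgelu \
        --fewshot_lr 0.1 --fewshot_epochs 20 \
        --train_split train --test_split test --batch_size 512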

Maintenance & Community

The project acknowledges contributions from various individuals and projects, including OpenCLIP, SLIP, Wise-ft, LiT, SugarCrepe, Babel-ImageNet, and others. Links to community resources such as Discord or Slack are not provided in the README.

Licensing & Compatibility

The repository's license is not explicitly stated in the provided README. Compatibility for commercial use or closed-source linking would require clarification of the licensing terms.

Limitations & Caveats

The README does not specify hardware requirements (e.g., GPU, CUDA versions) or provide explicit setup time estimates. The exact Python version compatibility is also not detailed.

Health Check

  • Last commit: 1 month ago
  • Responsiveness: 1 week
  • Pull requests (30d): 0
  • Issues (30d): 1
Star History

15 stars in the last 30 days

Starred by Jiayi Pan (author of SWE-Gym; MTS at xAI), Shizhe Diao (author of LMFlow; Research Scientist at NVIDIA), and 1 more.

Explore Similar Projects

METER by zdou0830

373 stars
Multimodal framework for vision-and-language transformer research
Created 3 years ago · Updated 2 years ago
Starred by Chip Huyen (author of "AI Engineering", "Designing Machine Learning Systems"), Yaowei Zheng (author of LLaMA-Factory), and 1 more.

CLIP_prefix_caption by rmokady

1k stars
Image captioning model using CLIP embeddings as a prefix
Created 4 years ago · Updated 1 year ago
Starred by Chip Huyen (author of "AI Engineering", "Designing Machine Learning Systems"), Simon Willison (coauthor of Django), and 10 more.

LAVIS by salesforce

11k stars
Library for language-vision AI research
Created 3 years ago · Updated 10 months ago