ml-mobileclip  by apple

Fast image-text models from a CVPR 2024 research paper

Created 1 year ago
1,218 stars

Top 32.2% on SourcePulse

View on GitHub
Project Summary

This repository provides the official implementation for MobileCLIP, a research project focused on developing fast and efficient image-text models for mobile devices. It targets researchers and developers seeking high-performance multimodal models with reduced latency and size, offering significant speedups and smaller footprints compared to established models like OpenAI's ViT-B/16 and SigLIP.

How It Works

MobileCLIP employs a multi-modal reinforced training approach, optimizing for both zero-shot performance and inference speed on resource-constrained devices. The models are trained on DataCompDR datasets, and the implementation leverages efficient architectures, including MobileOne, to achieve its performance gains.
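To make the training idea concrete, the sketch below shows one way a reinforced objective can combine the standard CLIP contrastive loss with a distillation term against teacher embeddings stored alongside the data (as in DataCompDR). The function name, the lam and teacher_temp hyperparameters, and the exact loss mixture are illustrative assumptions, not the paper's formulation.

    import torch
    import torch.nn.functional as F

    def reinforced_clip_loss(student_img, student_txt, teacher_img, teacher_txt,
                             logit_scale, lam=0.7, teacher_temp=100.0):
        """Sketch of a multi-modal reinforced objective: CLIP contrastive loss
        mixed with distillation toward a teacher's image-text similarity matrix.
        All embeddings are (batch, dim) and assumed L2-normalized; lam and
        teacher_temp are illustrative values, not the paper's."""
        # Student similarity logits (image->text; text->image is the transpose).
        logits = logit_scale * student_img @ student_txt.t()
        targets = torch.arange(logits.size(0), device=logits.device)

        # Standard symmetric CLIP contrastive loss.
        clip_loss = 0.5 * (F.cross_entropy(logits, targets) +
                           F.cross_entropy(logits.t(), targets))

        # Distillation: KL between teacher and student similarity distributions,
        # using teacher embeddings stored in the reinforced dataset.
        teacher_logits = teacher_temp * teacher_img @ teacher_txt.t()
        dist_loss = 0.5 * (
            F.kl_div(F.log_softmax(logits, dim=-1),
                     F.softmax(teacher_logits, dim=-1), reduction="batchmean") +
            F.kl_div(F.log_softmax(logits.t(), dim=-1),
                     F.softmax(teacher_logits.t(), dim=-1), reduction="batchmean"))

        return (1 - lam) * clip_loss + lam * dist_loss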

Quick Start & Requirements

  • Install: conda create -n clipenv python=3.10, conda activate clipenv, pip install -e .
  • Pretrained Models: Download by running source get_pretrained_models.sh (a usage sketch follows this list).
  • Dependencies: Python 3.10, PyTorch, Hugging Face pytorch-image-models. CUDA is recommended for GPU acceleration.
  • Resources: Pretrained checkpoints are available on HuggingFace. An iOS app is also provided for demonstration.
  • Docs: https://github.com/apple/ml-mobileclip
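
For reference, here is a minimal zero-shot classification sketch following the usage pattern shown in the repository README; the checkpoint path and image filename are placeholders, and the mobileclip.create_model_and_transforms / mobileclip.get_tokenizer calls should be checked against the current README.

    import torch
    from PIL import Image
    import mobileclip

    # Create the model and preprocessing transforms; the checkpoint path points
    # to a file downloaded by get_pretrained_models.sh (placeholder path here).
    model, _, preprocess = mobileclip.create_model_and_transforms(
        'mobileclip_s0', pretrained='checkpoints/mobileclip_s0.pt')
    tokenizer = mobileclip.get_tokenizer('mobileclip_s0')

    image = preprocess(Image.open("example.png").convert('RGB')).unsqueeze(0)
    text = tokenizer(["a diagram", "a dog", "a cat"])

    with torch.no_grad():
        image_features = model.encode_image(image)
        text_features = model.encode_text(text)
        # Normalize and turn image-text similarity into zero-shot class probabilities.
        image_features /= image_features.norm(dim=-1, keepdim=True)
        text_features /= text_features.norm(dim=-1, keepdim=True)
        text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

    print("Label probs:", text_probs)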

Highlighted Details

  • MobileCLIP-S0 is 4.8x faster and 2.8x smaller than OpenAI's ViT-B/16 while delivering similar zero-shot performance.
  • MobileCLIP-S2 outperforms SigLIP's ViT-B/16 while being 2.3x faster, 2.1x smaller, and trained on 3x fewer seen samples.
  • MobileCLIP-B (LT) reaches 77.2% ImageNet zero-shot accuracy, surpassing DFN and SigLIP.
  • The models are natively supported in the OpenCLIP framework (a loading sketch follows this list).
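
Because the models are registered with OpenCLIP, they can also be loaded through that framework. The model name and pretrained tag below are assumptions; confirm the exact strings with open_clip.list_pretrained() before use.

    import open_clip

    # Assumed model/pretrained tag names; verify with open_clip.list_pretrained().
    model, _, preprocess = open_clip.create_model_and_transforms(
        'MobileCLIP-S2', pretrained='datacompdr')
    tokenizer = open_clip.get_tokenizer('MobileCLIP-S2')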

Maintenance & Community

The project is from Apple and was presented at CVPR 2024. Further details on community or ongoing maintenance are not explicitly stated in the README.

Licensing & Compatibility

The repository is described as released under a permissive license allowing commercial use and integration with closed-source projects, but the specific license is not named here; consult the LICENSE file in the repository for the exact terms.

Limitations & Caveats

The README focuses on performance highlights and does not detail known limitations, unsupported features, or potential breaking changes. The project is presented as a research implementation, and stability for production use may require further evaluation.

Health Check

  • Last Commit: 2 days ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 2
  • Issues (30d): 0
  • Star History: 192 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), Simon Willison (co-creator of Django), and 10 more.

LAVIS by salesforce: Library for language-vision AI research
Top 0.2% · 11k stars · Created 3 years ago · Updated 10 months ago