ml-mobileclip  by apple

Fast image-text models from a CVPR 2024 research paper

Created 1 year ago
1,218 stars

Top 32.2% on SourcePulse

View on GitHub
Project Summary

This repository provides the official implementation for MobileCLIP, a research project focused on developing fast and efficient image-text models for mobile devices. It targets researchers and developers seeking high-performance multimodal models with reduced latency and size, offering significant speedups and smaller footprints compared to established models like OpenAI's ViT-B/16 and SigLIP.

How It Works

MobileCLIP employs a multi-modal reinforced training approach, optimizing for both zero-shot performance and inference speed on resource-constrained devices. The models are trained on DataCompDR datasets, and the implementation leverages efficient architectures, including MobileOne, to achieve its performance gains.
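To make the training idea concrete, the sketch below shows one way a reinforced objective can combine the standard CLIP contrastive loss with a distillation term against teacher embeddings stored alongside the data (as in DataCompDR). The function name, the lam and teacher_temp hyperparameters, and the exact loss mixture are illustrative assumptions, not the paper's formulation.

    import torch
    import torch.nn.functional as F

    def reinforced_clip_loss(student_img, student_txt, teacher_img, teacher_txt,
                             logit_scale, lam=0.7, teacher_temp=100.0):
        """Sketch of a multi-modal reinforced objective: CLIP contrastive loss
        mixed with distillation toward a teacher's image-text similarity matrix.
        All embeddings are (batch, dim) and assumed L2-normalized; lam and
        teacher_temp are illustrative values, not the paper's."""
        # Student similarity logits (image->text; text->image is the transpose).
        logits = logit_scale * student_img @ student_txt.t()
        targets = torch.arange(logits.size(0), device=logits.device)

        # Standard symmetric CLIP contrastive loss.
        clip_loss = 0.5 * (F.cross_entropy(logits, targets) +
                           F.cross_entropy(logits.t(), targets))

        # Distillation: KL between teacher and student similarity distributions,
        # using teacher embeddings stored in the reinforced dataset.
        teacher_logits = teacher_temp * teacher_img @ teacher_txt.t()
        dist_loss = 0.5 * (
            F.kl_div(F.log_softmax(logits, dim=-1),
                     F.softmax(teacher_logits, dim=-1), reduction="batchmean") +
            F.kl_div(F.log_softmax(logits.t(), dim=-1),
                     F.softmax(teacher_logits.t(), dim=-1), reduction="batchmean"))

        return (1 - lam) * clip_loss + lam * dist_loss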

Quick Start & Requirements

  • Install: conda create -n clipenv python=3.10, conda activate clipenv, pip install -e .
  • Pretrained Models: Download by running source get_pretrained_models.sh (a usage sketch follows this list).
  • Dependencies: Python 3.10, PyTorch, Hugging Face pytorch-image-models. CUDA is recommended for GPU acceleration.
  • Resources: Pretrained checkpoints are available on HuggingFace. An iOS app is also provided for demonstration.
  • Docs: https://github.com/apple/ml-mobileclip
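
For reference, here is a minimal zero-shot classification sketch following the usage pattern shown in the repository README; the checkpoint path and image filename are placeholders, and the mobileclip.create_model_and_transforms / mobileclip.get_tokenizer calls should be checked against the current README.

    import torch
    from PIL import Image
    import mobileclip

    # Create the model and preprocessing transforms; the checkpoint path points
    # to a file downloaded by get_pretrained_models.sh (placeholder path here).
    model, _, preprocess = mobileclip.create_model_and_transforms(
        'mobileclip_s0', pretrained='checkpoints/mobileclip_s0.pt')
    tokenizer = mobileclip.get_tokenizer('mobileclip_s0')

    image = preprocess(Image.open("example.png").convert('RGB')).unsqueeze(0)
    text = tokenizer(["a diagram", "a dog", "a cat"])

    with torch.no_grad():
        image_features = model.encode_image(image)
        text_features = model.encode_text(text)
        # Normalize and turn image-text similarity into zero-shot class probabilities.
        image_features /= image_features.norm(dim=-1, keepdim=True)
        text_features /= text_features.norm(dim=-1, keepdim=True)
        text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

    print("Label probs:", text_probs)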

Highlighted Details

  • MobileCLIP-S0 is 4.8x faster and 2.8x smaller than OpenAI's ViT-B/16 while delivering similar zero-shot performance.
  • MobileCLIP-S2 outperforms SigLIP's ViT-B/16 while being 2.3x faster, 2.1x smaller, and trained on 3x fewer seen samples.
  • MobileCLIP-B (LT) reaches 77.2% ImageNet zero-shot accuracy, surpassing DFN and SigLIP.
  • The models are natively supported in the OpenCLIP framework (a loading sketch follows this list).
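
Because the models are registered with OpenCLIP, they can also be loaded through that framework. The model name and pretrained tag below are assumptions; confirm the exact strings with open_clip.list_pretrained() before use.

    import open_clip

    # Assumed model/pretrained tag names; verify with open_clip.list_pretrained().
    model, _, preprocess = open_clip.create_model_and_transforms(
        'MobileCLIP-S2', pretrained='datacompdr')
    tokenizer = open_clip.get_tokenizer('MobileCLIP-S2')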

Maintenance & Community

The project is from Apple and was presented at CVPR 2024. Further details on community or ongoing maintenance are not explicitly stated in the README.

Licensing & Compatibility

The repository is described as released under a permissive license allowing commercial use and integration with closed-source projects, but the specific license is not named here; consult the LICENSE file in the repository for the exact terms.

Limitations & Caveats

The README focuses on performance highlights and does not detail known limitations, unsupported features, or potential breaking changes. The project is presented as a research implementation, and stability for production use may require further evaluation.

Health Check

  • Last Commit: 2 days ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 2
  • Issues (30d): 0
  • Star History: 192 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), Simon Willison (co-creator of Django), and 10 more.

LAVIS by salesforce: Library for language-vision AI research
Top 0.2% · 11k stars · Created 3 years ago · Updated 10 months ago