ml-mobileclip by apple

Image-text models research paper, CVPR 2024

created 1 year ago
1,009 stars

Top 37.7% on sourcepulse

Project Summary

This repository provides the official implementation of MobileCLIP, a research project on fast, efficient image-text models for mobile devices. It targets researchers and developers who need multimodal models with low latency and small size: the reported models are 2.3x to 4.8x faster and 2.1x to 2.8x smaller than baselines such as OpenAI's ViT-B/16 and SigLIP's ViT-B/16.

How It Works

MobileCLIP employs a multi-modal reinforced training approach, optimizing for both zero-shot performance and inference speed on resource-constrained devices. The models are trained on DataCompDR datasets, and the implementation leverages efficient architectures, including MobileOne, to achieve its performance gains.
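
To make "zero-shot performance" concrete, the following is a minimal sketch of how an image-text model typically scores candidate captions against an image: both embeddings are L2-normalized, compared by cosine similarity, and passed through a softmax. The embeddings here are made-up toy vectors rather than MobileCLIP outputs, and the function names and the scale factor of 100 are assumptions chosen for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_probs(image_emb, text_embs, scale=100.0):
    """Return one probability per candidate caption for a single image embedding."""
    img = l2_normalize(image_emb)
    sims = []
    for emb in text_embs:
        txt = l2_normalize(emb)
        # Scaled cosine similarity between the image and this caption.
        sims.append(scale * sum(a * b for a, b in zip(img, txt)))
    return softmax(sims)


# Toy 3-dimensional embeddings (invented for illustration, not model outputs).
labels = ["a dog", "a cat", "a diagram"]
image_embedding = [0.9, 0.1, 0.0]
text_embeddings = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

probs = zero_shot_probs(image_embedding, text_embeddings)
best_label = labels[probs.index(max(probs))]
print(best_label, probs)
```

No labeled training examples for the candidate classes are involved: the captions themselves act as the classifier, which is why this setting is called zero-shot.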

Quick Start & Requirements

  • Install: run conda create -n clipenv python=3.10, then conda activate clipenv, then pip install -e . (the trailing dot is part of the command).
  • Pretrained Models: Download by running source get_pretrained_models.sh.
  • Dependencies: Python 3.10, PyTorch, Hugging Face pytorch-image-models. CUDA is recommended for GPU acceleration.
  • Resources: Pretrained checkpoints are available on Hugging Face. An iOS app is also provided for demonstration.
  • Docs: https://github.com/apple/ml-mobileclip
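
Once the package is installed and a checkpoint is downloaded, inference might look like the sketch below. It assumes the mobileclip package exposes create_model_and_transforms and get_tokenizer, as recalled from the upstream README. Neither call is confirmed by this summary, so verify both against the repository. The model name, image path, and checkpoint path are placeholders, and classify_image is a hypothetical helper name.

```python
def classify_image(image_path, captions, checkpoint_path, model_name="mobileclip_s0"):
    """Score candidate captions against one image with a MobileCLIP checkpoint.

    Returns a dict mapping each caption to its softmax probability.
    """
    # Imported lazily so this file can be loaded without the heavy dependencies.
    import torch
    from PIL import Image
    import mobileclip  # assumed to be provided by `pip install -e .`

    # Assumed API: returns (model, train_transform, eval_transform).
    model, _, preprocess = mobileclip.create_model_and_transforms(
        model_name, pretrained=checkpoint_path
    )
    tokenizer = mobileclip.get_tokenizer(model_name)
    model.eval()

    # Add a batch dimension to the preprocessed image tensor.
    image = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
    text = tokenizer(captions)

    with torch.no_grad():
        image_features = model.encode_image(image)
        text_features = model.encode_text(text)
        # Normalize so the matrix product below yields cosine similarities.
        image_features = image_features / image_features.norm(dim=-1, keepdim=True)
        text_features = text_features / text_features.norm(dim=-1, keepdim=True)
        probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

    return dict(zip(captions, probs[0].tolist()))
```

A call such as classify_image("photo.jpg", ["a dog", "a cat"], "checkpoints/mobileclip_s0.pt") would return one probability per caption; both file paths in that call are hypothetical.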

Highlighted Details

  • MobileCLIP-S0 is 4.8x faster and 2.8x smaller than OpenAI's ViT-B/16 with similar zero-shot performance.
  • MobileCLIP-S2 outperforms SigLIP's ViT-B/16 while being 2.3x faster and 2.1x smaller, and it was trained with 3x fewer samples.
  • MobileCLIP-B (LT) reaches 77.2% ImageNet zero-shot accuracy, surpassing DFN and SigLIP.
  • Native support and integration with the OpenCLIP framework are available.
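
The "Nx faster" and "Nx smaller" figures above can be restated as fractions of the baseline. The sketch below does that arithmetic, assuming "Nx faster" means the latency is 1/N of the baseline and "Nx smaller" means the size is 1/N of the baseline. The summary does not spell out that interpretation, and fraction_of_baseline is a hypothetical helper name.

```python
def fraction_of_baseline(factor):
    """A model that is `factor`x faster (or smaller) needs 1/factor of the baseline."""
    return 1.0 / factor


# Factors copied from the highlights above.
claims = {
    "MobileCLIP-S0 latency vs OpenAI ViT-B/16": 4.8,
    "MobileCLIP-S0 size vs OpenAI ViT-B/16": 2.8,
    "MobileCLIP-S2 latency vs SigLIP ViT-B/16": 2.3,
    "MobileCLIP-S2 size vs SigLIP ViT-B/16": 2.1,
}

for name, factor in claims.items():
    remaining = fraction_of_baseline(factor)
    print(f"{name}: {remaining:.1%} of baseline ({1 - remaining:.1%} reduction)")
```

Under that reading, a 4.8x speedup corresponds to roughly 20.8% of the baseline latency, and a 2.8x size reduction to roughly 35.7% of the baseline size.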

Maintenance & Community

The project is from Apple and was presented at CVPR 2024. Further details on community or ongoing maintenance are not explicitly stated in the README.

Licensing & Compatibility

The repository is described as permissively licensed, which would allow commercial use and integration with closed-source projects. The specific license is not named in the README, so confirm the exact terms in the repository before relying on them.

Limitations & Caveats

The README focuses on performance highlights and does not detail known limitations, unsupported features, or potential breaking changes. The project is presented as a research implementation, and stability for production use may require further evaluation.

Health Check

  • Last commit: 8 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 100 stars in the last 90 days
