MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training (CVPR 2024)
This repository provides the official implementation for MobileCLIP, a research project focused on developing fast and efficient image-text models for mobile devices. It targets researchers and developers seeking high-performance multimodal models with reduced latency and size, offering significant speedups and smaller footprints compared to established models like OpenAI's ViT-B/16 and SigLIP.
How It Works
MobileCLIP employs a multi-modal reinforced training approach, optimizing for both zero-shot performance and inference speed on resource-constrained devices. The models are trained on DataCompDR datasets, and the implementation leverages efficient architectures, including MobileOne, to achieve its performance gains.
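Conceptually, the reinforced objective can be pictured as a standard CLIP contrastive loss blended with a distillation term that pushes the student's image-text similarity distribution toward one precomputed by strong teacher models and stored alongside DataCompDR. The sketch below is illustrative only: the function name, the fixed mixing weight lam, and the single-teacher setup are assumptions made for clarity, not the repo's actual implementation (which also uses ensembled teachers and synthetic captions).

import torch
import torch.nn.functional as F

def reinforced_clip_loss(img_emb, txt_emb, teacher_img_emb, teacher_txt_emb,
                         temperature=0.07, lam=0.7):
    # Hypothetical sketch of a reinforced (distilled) CLIP objective;
    # teacher embeddings are assumed precomputed and loaded from the dataset.
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    teacher_img_emb = F.normalize(teacher_img_emb, dim=-1)
    teacher_txt_emb = F.normalize(teacher_txt_emb, dim=-1)

    # Image-to-text similarity logits for student and (frozen) teacher.
    s_logits = img_emb @ txt_emb.T / temperature
    t_logits = teacher_img_emb @ teacher_txt_emb.T / temperature

    # Standard symmetric CLIP contrastive loss on ground-truth pairs.
    targets = torch.arange(img_emb.size(0), device=img_emb.device)
    contrastive = (F.cross_entropy(s_logits, targets)
                   + F.cross_entropy(s_logits.T, targets)) / 2

    # Distillation: match the teacher's similarity distributions in both directions.
    distill = (F.kl_div(F.log_softmax(s_logits, dim=-1),
                        F.softmax(t_logits, dim=-1), reduction='batchmean')
               + F.kl_div(F.log_softmax(s_logits.T, dim=-1),
                          F.softmax(t_logits.T, dim=-1), reduction='batchmean')) / 2

    return (1 - lam) * contrastive + lam * distill

Because the teacher outputs are stored in the dataset, the expensive teacher forward passes happen once, offline; training the small student model then costs little more than a standard CLIP run.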
Quick Start & Requirements
conda create -n clipenv python=3.10
conda activate clipenv
pip install -e .
source get_pretrained_models.sh
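With the checkpoints downloaded, zero-shot classification follows the usual CLIP pattern. The snippet below mirrors the repo's documented usage of its mobileclip package (create_model_and_transforms, get_tokenizer); the checkpoint path, image path, and labels are placeholders.

import torch
from PIL import Image
import mobileclip  # installed by `pip install -e .` above

# Load a MobileCLIP-S0 checkpoint fetched by get_pretrained_models.sh.
model, _, preprocess = mobileclip.create_model_and_transforms(
    'mobileclip_s0', pretrained='checkpoints/mobileclip_s0.pt')
tokenizer = mobileclip.get_tokenizer('mobileclip_s0')

image = preprocess(Image.open("image.png").convert('RGB')).unsqueeze(0)
text = tokenizer(["a diagram", "a dog", "a cat"])

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print("Label probs:", text_probs)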
Dependencies include pytorch-image-models (timm); CUDA is recommended for GPU acceleration.
Highlighted Details
- MobileCLIP-S0 achieves zero-shot performance similar to OpenAI's ViT-B/16 CLIP while being 4.8x faster and 2.8x smaller.
- MobileCLIP-S2 achieves better average zero-shot performance than SigLIP's ViT-B/16 while being 2.3x faster, 2.1x smaller, and trained with 3x fewer seen samples.
Maintenance & Community
The project is from Apple and was presented at CVPR 2024. The README does not describe community channels or an ongoing maintenance policy, and at the time of writing the repository appears inactive, with its last commit roughly 8 months ago.
Licensing & Compatibility
The README does not state the license terms directly; consult the LICENSE file in the repository to confirm whether commercial use and integration with closed-source projects are permitted.
Limitations & Caveats
The README focuses on performance highlights and does not detail known limitations, unsupported features, or potential breaking changes. The project is presented as a research implementation, and stability for production use may require further evaluation.