MobileCLIP (apple): image-text models research paper, CVPR 2024
This repository provides the official implementation of MobileCLIP, a research project on fast, efficient image-text models for mobile devices. It targets researchers and developers who need high-performance multimodal models with low latency and small size, offering significant speedups and smaller footprints than established models such as OpenAI's CLIP ViT-B/16 and SigLIP.
How It Works
MobileCLIP employs a multi-modal reinforced training approach, optimizing for both zero-shot performance and inference speed on resource-constrained devices. The models are trained on DataCompDR datasets, and the implementation leverages efficient architectures, including MobileOne, to achieve its performance gains.
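To make the training objective concrete, below is a conceptual sketch of a reinforced (distillation-augmented) CLIP loss in the spirit the summary describes: a standard contrastive loss combined with a distillation term that matches the student's image-text similarity distribution to one computed from stored teacher embeddings. The function name `reinforced_clip_loss`, the mixing weight `lam`, and the exact loss composition are illustrative assumptions, not the repository's implementation.

```python
import torch
import torch.nn.functional as F

def reinforced_clip_loss(img_emb, txt_emb, teacher_img_emb, teacher_txt_emb,
                         logit_scale, lam=0.7):
    # Standard CLIP loss: symmetric cross-entropy over in-batch pairs,
    # where the i-th image matches the i-th text.
    logits = logit_scale * img_emb @ txt_emb.t()
    targets = torch.arange(logits.size(0), device=logits.device)
    clip_loss = 0.5 * (F.cross_entropy(logits, targets) +
                       F.cross_entropy(logits.t(), targets))

    # Distillation term: align the student's image-text similarity
    # distributions with those of a teacher whose embeddings were
    # precomputed and stored alongside the dataset.
    t_logits = logit_scale * teacher_img_emb @ teacher_txt_emb.t()
    kd_loss = 0.5 * (
        F.kl_div(F.log_softmax(logits, dim=-1),
                 F.softmax(t_logits, dim=-1), reduction="batchmean") +
        F.kl_div(F.log_softmax(logits.t(), dim=-1),
                 F.softmax(t_logits.t(), dim=-1), reduction="batchmean")
    )

    # Blend the contrastive and distillation objectives.
    return (1 - lam) * clip_loss + lam * kd_loss
```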
Quick Start & Requirements
```bash
# Create and activate an environment, then install the package.
conda create -n clipenv python=3.10
conda activate clipenv
pip install -e .

# Download the pretrained checkpoints.
source get_pretrained_models.sh
```

Dependencies include pytorch-image-models (timm); CUDA is recommended for GPU acceleration.
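For reference, a minimal zero-shot classification sketch using the package's `mobileclip` API, adapted from the repo's usage example; the model name `mobileclip_s0`, the checkpoint path, and the image filename are placeholders:

```python
import torch
from PIL import Image
import mobileclip

# Load a pretrained model and its preprocessing transform; the checkpoint
# path points at a file fetched by get_pretrained_models.sh.
model, _, preprocess = mobileclip.create_model_and_transforms(
    "mobileclip_s0", pretrained="checkpoints/mobileclip_s0.pt"
)
tokenizer = mobileclip.get_tokenizer("mobileclip_s0")

image = preprocess(Image.open("example.png").convert("RGB")).unsqueeze(0)
text = tokenizer(["a diagram", "a dog", "a cat"])

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    # Cosine-similarity logits over the candidate captions.
    text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print("Label probs:", text_probs)
```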
Maintenance & Community
The project is from Apple and was presented at CVPR 2024. Further details on community or ongoing maintenance are not explicitly stated in the README.
Licensing & Compatibility
The repository is described as permissively licensed, allowing commercial use and integration with closed-source projects. However, the specific license is not named in the README summary; consult the LICENSE file in the repository before relying on these terms.
Limitations & Caveats
The README focuses on performance highlights and does not detail known limitations, unsupported features, or potential breaking changes. The project is presented as a research implementation, and stability for production use may require further evaluation.