train-CLIP by Zasder3

PyTorch Lightning module for CLIP model training and fine-tuning

Created 4 years ago · 706 stars · Top 49.4% on sourcepulse

Project Summary

This repository provides a PyTorch Lightning implementation for training OpenAI's CLIP model from scratch or fine-tuning it. It's designed for researchers and practitioners looking to replicate or adapt CLIP's capabilities for visual-language understanding tasks. The solution aims for ease of use and fidelity to the original CLIP paper.

How It Works

The project leverages PyTorch Lightning for a structured training pipeline. It supports training CLIP from scratch with a chosen model architecture (e.g., RN50, ViT-B/32) and a dataset directory. For data-efficient fine-tuning, it provides a CustomCLIPWrapper that wraps a pre-trained image encoder and a Hugging Face text encoder, enabling faster adaptation with less data. The data loader expects image-caption pairs whose files share a filename stem, with multiple captions per file separated by newlines.

Quick Start & Requirements

  • Install/Run: python train.py --model_name <model_name> --folder <data_dir> --batchsize <batch_size>
  • Prerequisites: PyTorch, PyTorch Lightning, Hugging Face Transformers. Specific model architectures may require corresponding pre-trained weights.
  • Setup: Assumes a data directory with image-caption pairs.

Highlighted Details

  • Supports training from scratch with various OpenAI CLIP model architectures (RN50, RN50x4, RN101, ViT-B/32).
  • Enables data-efficient fine-tuning by wrapping pre-trained image and text encoders.
  • Flexible data loading with support for custom DataLoaders and automatic pairing of images and text files.
  • Aims for fidelity to the original CLIP paper's implementation.
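The training objective the project aims to reproduce is CLIP's symmetric contrastive loss: cross-entropy over an image-text similarity matrix, averaged over both directions. A minimal NumPy sketch of that objective (illustrative only, not the repository's implementation; the temperature default is an assumption taken from the CLIP paper):

```python
import numpy as np

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    image_emb, text_emb: (N, D) arrays; row i of each is a matched pair.
    Sketch of the CLIP objective, not the repo's exact code.
    """
    # L2-normalize so dot products are cosine similarities
    image_emb = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)

    logits = image_emb @ text_emb.T / temperature  # (N, N) similarity matrix
    labels = np.arange(len(logits))                # matched pairs lie on the diagonal

    def cross_entropy(l, y):
        l = l - l.max(axis=1, keepdims=True)       # shift for numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(len(y)), y].mean()

    # average of image-to-text and text-to-image cross-entropies
    return (cross_entropy(logits, labels) + cross_entropy(logits.T, labels)) / 2
```

With perfectly aligned pairs (identical embeddings on the diagonal) the loss approaches zero; random, uncorrelated embeddings yield a higher loss.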

Maintenance & Community

  • The repository is maintained by Zasder3.
  • No specific community channels (Discord/Slack) or roadmap are mentioned in the README.

Licensing & Compatibility

  • The repository does not explicitly state a license in the provided README.
  • Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project is a direct implementation of the CLIP training procedure and may lack newer optimizations; gradient checkpointing and half-precision Adam statistics are listed as future work. The absence of an explicit license could also pose a barrier to commercial adoption.

Health Check

  • Last commit: 3 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star history: 17 stars in the last 90 days
