PyTorch Lightning module for CLIP model training and fine-tuning
This repository provides a PyTorch Lightning implementation for training OpenAI's CLIP model from scratch or fine-tuning it. It is designed for researchers and practitioners looking to replicate or adapt CLIP's capabilities for vision-language understanding tasks. The implementation aims for ease of use and fidelity to the original CLIP paper.
How It Works
The project leverages PyTorch Lightning for a structured training pipeline. It supports training CLIP from scratch with a specified model architecture (e.g., ResNet50 or ViT-B/32) and a dataset directory. For data-efficient fine-tuning, it provides a CustomCLIPWrapper that combines a pre-trained image encoder with a Hugging Face text encoder, enabling faster adaptation with less data (see the sketch below). The data loader expects image-caption pairs whose filenames share a stem, with one or more captions per text file separated by newlines.
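As a rough illustration, a fine-tuning setup with CustomCLIPWrapper might look like the following minimal sketch. The import path, constructor keywords (e.g. minibatch_size), projection dimension, and the text-encoder checkpoint name are assumptions for illustration and may differ from the repository's actual API:

```python
# Minimal sketch of data-efficient fine-tuning with CustomCLIPWrapper.
# Constructor arguments and import path are assumptions; check the
# repository's models module for the actual signature.
import torch
import pytorch_lightning as pl
from torchvision.models import resnet50
from transformers import AutoTokenizer, AutoModel

from models import CustomCLIPWrapper  # assumed import path within this repo

# Pre-trained image encoder, re-headed to emit embeddings for the
# contrastive objective (the 768-d projection is an assumption).
img_encoder = resnet50(pretrained=True)
img_encoder.fc = torch.nn.Linear(2048, 768)

# Any Hugging Face text encoder; the checkpoint name is only an example.
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
txt_encoder = AutoModel.from_pretrained("distilbert-base-uncased")

# Wrap both encoders so Lightning handles the contrastive training loop.
model = CustomCLIPWrapper(img_encoder, txt_encoder, minibatch_size=64)

trainer = pl.Trainer(max_epochs=5)
# `train_loader` should yield image/caption batches built from the
# matching-stem pairs described above, tokenized with `tokenizer`.
# trainer.fit(model, train_loader)
```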
Quick Start & Requirements
python train.py --model_name <model_name> --folder <data_dir> --batchsize <batch_size>
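For example, training a ResNet-50 variant from scratch might look like the block below. The model-name string, batch size, and directory contents are illustrative placeholders rather than documented values; the layout simply reflects the matching-stem convention described above:

```
# Example invocation (argument values are placeholders):
python train.py --model_name RN50 --folder data_dir --batchsize 512

# data_dir/ holds image-caption pairs with matching filename stems, e.g.:
#   data_dir/
#   ├── dog_001.jpg
#   ├── dog_001.txt    # one or more captions, one per line
#   ├── cat_042.png
#   └── cat_042.txt
```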
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The project is a direct implementation of the CLIP training script and may not include all the latest optimizations or advanced features like gradient checkpointing or half-precision Adam statistics, which are listed as future work. The lack of an explicit license could pose a barrier for commercial adoption.
Last commit: 3 years ago; the repository appears inactive.