train-CLIP by Zasder3

PyTorch Lightning module for CLIP model training and fine-tuning

Created 4 years ago
711 stars

Top 48.1% on SourcePulse

View on GitHub
1 Expert Loves This Project
Project Summary

This repository provides a PyTorch Lightning implementation for training OpenAI's CLIP model from scratch or fine-tuning an existing one. It is designed for researchers and practitioners who want to replicate or adapt CLIP's visual-language capabilities, and it aims for ease of use and fidelity to the original CLIP paper.

How It Works

The project leverages PyTorch Lightning for a structured training pipeline. It supports training CLIP from scratch with the original architectures (e.g., RN50, ViT-B/32) on a user-provided dataset directory. For data-efficient fine-tuning, it offers a CustomCLIPWrapper that wraps a pre-trained image encoder and a Hugging Face text encoder, enabling faster adaptation with less data. The data loader pairs each image with the text file that shares its filename stem; multiple captions in one text file are separated by newlines.
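A minimal fine-tuning sketch of the wrapper approach. The exact CustomCLIPWrapper signature and the models import path are assumptions based on the description above (verify against the repository), and the encoder choices are illustrative:

    import torch
    from torchvision.models import resnet50
    from transformers import AutoModel
    from models import CustomCLIPWrapper  # assumed to be provided by this repo

    # Pre-trained image encoder; project its output to the text encoder's width.
    img_encoder = resnet50(pretrained=True)
    img_encoder.fc = torch.nn.Linear(2048, 768)

    # Any Hugging Face text encoder with a 768-dim hidden size.
    txt_encoder = AutoModel.from_pretrained("distilbert-base-uncased")

    # Wrap both encoders for CLIP-style contrastive training.
    model = CustomCLIPWrapper(img_encoder, txt_encoder, minibatch_size=64)

Since the project is built on PyTorch Lightning, the wrapped model should be trainable with a standard pl.Trainer.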

Quick Start & Requirements

  • Install/Run: python train.py --model_name <model_name> --folder <data_dir> --batchsize <batch_size>
  • Prerequisites: PyTorch, PyTorch Lightning, Hugging Face Transformers. Specific model architectures may require corresponding pre-trained weights.
  • Setup: Assumes a data directory with image-caption pairs (see the example layout below).
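A concrete from-scratch invocation plus the on-disk layout implied by the stem-matching rule (the model name, paths, and batch size here are illustrative values):

    python train.py --model_name RN50 --folder data_dir --batchsize 512

    data_dir/
        dog_001.jpg
        dog_001.txt    # captions for dog_001.jpg, one per line
        cat_042.png
        cat_042.txt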

Highlighted Details

  • Supports training from scratch with various OpenAI CLIP model architectures (RN50, RN50x4, RN101, ViT-B/32).
  • Enables data-efficient fine-tuning by wrapping pre-trained image and text encoders.
  • Flexible data loading, with support for custom DataLoaders and automatic pairing of images and text files (a compatible dataset sketch follows this list).
  • Aims for fidelity to the original CLIP paper's implementation.
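For readers who want to bring their own DataLoader, here is a minimal sketch of the pairing convention as a plain PyTorch Dataset. The class name and transform handling are illustrative, not the repo's own loader:

    import random
    from pathlib import Path

    from PIL import Image
    from torch.utils.data import Dataset

    IMG_EXTS = {".jpg", ".jpeg", ".png"}

    class ImageCaptionDataset(Dataset):
        """Pairs each image with the caption file sharing its filename stem."""

        def __init__(self, folder, transform=None):
            root = Path(folder)
            self.images = sorted(
                p for p in root.iterdir() if p.suffix.lower() in IMG_EXTS
            )
            self.transform = transform

        def __len__(self):
            return len(self.images)

        def __getitem__(self, idx):
            img_path = self.images[idx]
            image = Image.open(img_path).convert("RGB")
            if self.transform is not None:
                image = self.transform(image)
            # Captions live in <stem>.txt, one per line; sample one per access.
            captions = img_path.with_suffix(".txt").read_text().splitlines()
            return image, random.choice(captions)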

Maintenance & Community

  • The repository is maintained by Zasder3.
  • No specific community channels (Discord/Slack) or roadmap are mentioned in the README.

Licensing & Compatibility

  • The repository does not explicitly state a license in the provided README.
  • Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project is a direct implementation of the CLIP training recipe and omits some newer optimizations: the README lists gradient checkpointing and half-precision Adam statistics as future work. The absence of an explicit license may also be a barrier to commercial adoption.

Health Check

  • Last Commit: 3 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

  • 5 stars in the last 30 days

Explore Similar Projects

Starred by Jiayi Pan (Author of SWE-Gym; MTS at xAI), Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), and 1 more.

METER by zdou0830

0% · 373 stars
Multimodal framework for vision-and-language transformer research
Created 3 years ago · Updated 2 years ago
Starred by Jiaming Song (Chief Scientist at Luma AI), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 6 more.

Otter by EvolvingLMMs-Lab

0.0% · 3k stars
Multimodal model for improved instruction following and in-context learning
Created 2 years ago · Updated 1 year ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Wing Lian (Founder of Axolotl AI), and 10 more.

open_flamingo by mlfoundations

0.1% · 4k stars
Open-source framework for training large multimodal models
Created 2 years ago · Updated 1 year ago