CLIP-fine-tune by zer0int

Fine-tune CLIP models for improved performance and resilience

Created 1 year ago
253 stars

Top 99.4% on SourcePulse

Project Summary

This repository provides code for fine-tuning CLIP models, with a particular focus on improving resilience to typographic attacks and addressing the "modality gap." It's targeted at researchers and developers working with generative AI, particularly those using CLIP as a text encoder for diffusion models like Stable Diffusion. The primary benefit is enhanced robustness and potentially better generalization in text-image contrastive learning tasks.

How It Works

The project explores various fine-tuning techniques, including Geometric Parameterization (GmP), which decomposes each weight vector into radial and angular components so that optimization operates in hyperspherical coordinates, potentially improving stability and convergence. It also incorporates custom loss functions, such as a contrastive loss with an entropy penalty to mitigate overfitting and label smoothing for better generalization, especially on smaller or noisy datasets. The code allows for flexible model saving, including conversion to formats compatible with Hugging Face Transformers, Stable Diffusion, and ComfyUI.
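As a concrete illustration of the GmP idea, the sketch below reparameterizes a linear layer into a per-row radius and direction. It is a minimal, hypothetical sketch in PyTorch; names such as `GmPLinear` and `convert_linear_to_gmp` are illustrative, not necessarily the repository's own classes.

```python
# Hedged sketch of Geometric Parameterization (GmP) for a linear layer.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GmPLinear(nn.Module):
    """Linear layer whose weight is stored as radius * unit-direction.

    Each output row w_i of a conventional weight matrix is re-expressed as
    w_i = r_i * theta_i / ||theta_i||, so the optimizer updates the radial
    (magnitude) and angular (direction) components separately.
    """

    def __init__(self, in_features: int, out_features: int, bias: bool = True):
        super().__init__()
        init = torch.empty(out_features, in_features)
        nn.init.kaiming_uniform_(init, a=5 ** 0.5)
        # Radial component: one magnitude per output row.
        self.r = nn.Parameter(init.norm(dim=1, keepdim=True))
        # Angular component: the (unnormalized) direction of each row.
        self.theta = nn.Parameter(init)
        if bias:
            self.bias = nn.Parameter(torch.zeros(out_features))
        else:
            self.register_parameter("bias", None)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Reconstruct the effective weight in hyperspherical form.
        weight = self.r * F.normalize(self.theta, dim=1)
        return F.linear(x, weight, self.bias)


@torch.no_grad()
def convert_linear_to_gmp(linear: nn.Linear) -> GmPLinear:
    """Copy a pretrained nn.Linear into the GmP parameterization."""
    gmp = GmPLinear(linear.in_features, linear.out_features, linear.bias is not None)
    gmp.r.copy_(linear.weight.norm(dim=1, keepdim=True))
    gmp.theta.copy_(linear.weight)
    if linear.bias is not None:
        gmp.bias.copy_(linear.bias)
    return gmp
```

Because radius and direction are trained as separate parameters, magnitude and orientation updates are decoupled, which is the stated motivation for the improved stability.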

Quick Start & Requirements
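The repository's own requirements file and training scripts define the actual entry points; as a hedged sketch, the typical base model (OpenAI CLIP ViT-L/14, the text encoder used by Stable Diffusion) can be loaded with PyTorch and the OpenAI `clip` package (`pip install git+https://github.com/openai/CLIP.git`).

```python
# Hedged quick-start sketch, assuming PyTorch and the OpenAI "clip" package;
# the repository's own requirements and training scripts take precedence.
import clip
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# ViT-L/14 is the text encoder used by Stable Diffusion
# (and one of SDXL's two text encoders).
# `preprocess` is the image transform for the vision tower.
model, preprocess = clip.load("ViT-L/14", device=device, jit=False)
model = model.float()   # clip.load returns fp16 weights on CUDA; fine-tune in fp32/AMP
model.train()           # unfreeze for fine-tuning

# Sanity check: encode a couple of captions.
tokens = clip.tokenize(["a photo of a cat", "a photo of a dog"]).to(device)
with torch.no_grad():
    text_features = model.encode_text(tokens)
print(text_features.shape)  # torch.Size([2, 768]) for ViT-L/14
```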

Highlighted Details

  • CLIP-KO variants offer improved typographic attack resilience, with cleaner attention heatmaps.
  • GmP fine-tuning has been shown to achieve higher ImageNet/ObjectNet accuracy (~0.90) than the original CLIP (~0.85).
  • Includes scripts for converting fine-tuned models for use with Stable Diffusion (SDXL) and ComfyUI.
  • Experimental features like neuron manipulation and custom loss functions are available for advanced users.
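The custom loss functions mentioned in the last bullet combine the standard symmetric CLIP contrastive objective with label smoothing and an entropy term, as described under How It Works. The sketch below is a generic, hedged formulation (a confidence penalty that subtracts mean prediction entropy); the repository's exact loss may differ in form and coefficients.

```python
# Hedged sketch of a CLIP contrastive loss with label smoothing and an
# entropy (confidence) penalty; not the repository's exact implementation.
import torch
import torch.nn.functional as F

def clip_loss_with_entropy_penalty(
    image_features: torch.Tensor,   # (N, D), already projected
    text_features: torch.Tensor,    # (N, D)
    logit_scale: torch.Tensor,      # scalar, exp() of the learnable temperature
    label_smoothing: float = 0.1,
    entropy_weight: float = 0.1,    # hypothetical coefficient
) -> torch.Tensor:
    # Cosine-similarity logits in both directions.
    image_features = F.normalize(image_features, dim=-1)
    text_features = F.normalize(text_features, dim=-1)
    logits_per_image = logit_scale * image_features @ text_features.t()
    logits_per_text = logits_per_image.t()

    # Symmetric cross-entropy against the diagonal, with label smoothing.
    targets = torch.arange(image_features.size(0), device=image_features.device)
    ce = 0.5 * (
        F.cross_entropy(logits_per_image, targets, label_smoothing=label_smoothing)
        + F.cross_entropy(logits_per_text, targets, label_smoothing=label_smoothing)
    )

    # Confidence penalty: discourage over-confident (near one-hot) similarity
    # rows by subtracting the mean prediction entropy from the loss.
    log_probs = F.log_softmax(logits_per_image, dim=-1)
    entropy = -(log_probs.exp() * log_probs).sum(dim=-1).mean()
    return ce - entropy_weight * entropy
```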

Maintenance & Community

The repository is maintained by "zer0int." While specific community links (Discord/Slack) are not explicitly mentioned, the author's Hugging Face profile serves as a central point for model releases and updates.

Licensing & Compatibility

The repository's license is not explicitly stated in the provided text. However, the focus on compatibility with Stable Diffusion and ComfyUI suggests an intent for integration within broader generative AI workflows. Users should verify licensing for specific models and code components.

Limitations & Caveats

Some experimental features, like GmP, are presented "as-is" with the disclaimer that the author cannot perform extensive benchmarking or ablation studies. The effectiveness of certain techniques may be dataset-dependent, and users might encounter large or 'inf' gradients during initial training epochs, especially with GmP.
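The repository does not prescribe a specific workaround here; a common, generic mitigation is to clip the global gradient norm and skip optimizer steps whose gradients are non-finite, sketched below (the helper name `clipped_step` is illustrative).

```python
# Not from the repository: a generic mitigation sketch for the large/'inf'
# gradients mentioned above, using clipping and a skip-on-non-finite check.
import math
import torch

def clipped_step(model: torch.nn.Module, optimizer: torch.optim.Optimizer,
                 loss: torch.Tensor, max_norm: float = 1.0) -> bool:
    """Backpropagate, clip the global grad norm, and skip non-finite steps.

    Returns True if the optimizer step was applied, False if it was skipped.
    """
    optimizer.zero_grad(set_to_none=True)
    loss.backward()
    total_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
    if not math.isfinite(total_norm.item()):
        # Early GmP epochs can occasionally produce inf gradients; dropping
        # the step keeps the radial/angular parameters from being corrupted.
        return False
    optimizer.step()
    return True
```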

Health Check

  • Last Commit: 2 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 7 stars in the last 30 days
