Fine-tune CLIP models for improved performance and resilience
This repository provides code for fine-tuning CLIP models, with a particular focus on improving resilience to typographic attacks and addressing the "modality gap" between image and text embeddings. It is aimed at researchers and developers working with generative AI, particularly those using CLIP as a text encoder for diffusion models like Stable Diffusion. The primary benefit is enhanced robustness and potentially better generalization in text-image contrastive learning tasks.
How It Works
The project explores several fine-tuning techniques. Geometric Parameterization (GmP) decomposes each weight vector into a radial magnitude and an angular direction, operating in hyperspherical coordinates, which can improve stability and convergence. It also incorporates custom loss functions, such as one combining an entropy penalty to mitigate overfitting with label smoothing for better generalization, especially on smaller or noisy datasets (both ideas are sketched below). The code allows flexible model saving, including conversion to formats compatible with Hugging Face Transformers and Stable Diffusion/ComfyUI.
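To make the GmP idea concrete, here is a minimal sketch of a linear layer reparameterized into radial and angular components. The class name GeometricLinear and its initialization details are illustrative assumptions, not the repository's exact implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GeometricLinear(nn.Module):
    """Linear layer split into a radial magnitude r (per output row) and an
    angular direction theta, i.e. weights in hyperspherical coordinates."""

    def __init__(self, in_features: int, out_features: int, bias: bool = True):
        super().__init__()
        w = torch.empty(out_features, in_features)
        nn.init.kaiming_uniform_(w, a=5 ** 0.5)  # same default init as nn.Linear
        norms = w.norm(dim=1, keepdim=True)
        self.r = nn.Parameter(norms)           # radial component
        self.theta = nn.Parameter(w / norms)   # angular component (unit rows)
        self.bias = nn.Parameter(torch.zeros(out_features)) if bias else None

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Recompose the effective weight: w = r * theta / ||theta||
        w = self.r * F.normalize(self.theta, dim=1)
        return F.linear(x, w, self.bias)
```

Similarly, a hedged sketch of a CLIP-style contrastive loss that combines label smoothing with an entropy-based confidence penalty. The function name, sign convention, and weights (label_smoothing=0.1, entropy_weight=0.05) are assumptions for illustration; the repository's loss may differ:

```python
import torch
import torch.nn.functional as F

def clip_loss_with_penalties(image_feats, text_feats, logit_scale,
                             label_smoothing=0.1, entropy_weight=0.05):
    """Symmetric CLIP contrastive loss with label smoothing, plus a
    confidence penalty that rewards higher-entropy similarity rows."""
    image_feats = F.normalize(image_feats, dim=-1)
    text_feats = F.normalize(text_feats, dim=-1)
    logits = logit_scale * image_feats @ text_feats.t()
    targets = torch.arange(logits.size(0), device=logits.device)
    ce = 0.5 * (F.cross_entropy(logits, targets, label_smoothing=label_smoothing)
                + F.cross_entropy(logits.t(), targets, label_smoothing=label_smoothing))
    probs = logits.softmax(dim=-1)
    entropy = -(probs * probs.clamp_min(1e-9).log()).sum(dim=-1).mean()
    # Subtracting mean entropy discourages over-confident predictions,
    # one common way to curb overfitting on small or noisy datasets.
    return ce - entropy_weight * entropy
```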
Quick Start & Requirements
Dependencies are listed in requirements-finetune.txt and can be installed with pip install -r requirements-finetune.txt before running the fine-tuning code.
Highlighted Details
Maintenance & Community
The repository is maintained by zer0int. No dedicated community channels (Discord/Slack) are mentioned; the author's Hugging Face profile serves as the central point for model releases and updates.
Licensing & Compatibility
The repository's license is not explicitly stated. The emphasis on compatibility with Stable Diffusion and ComfyUI suggests an intent for integration within broader generative AI workflows, but users should verify licensing for specific models and code components.
Limitations & Caveats
Some experimental features, such as GmP, are provided as-is, with the disclaimer that the author cannot perform extensive benchmarking or ablation studies. The effectiveness of certain techniques may be dataset-dependent, and users may encounter large or 'inf' gradients during initial training epochs, especially with GmP; a standard mitigation is sketched below.
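A common countermeasure for such gradient spikes is gradient-norm clipping. This is a minimal, self-contained sketch using standard PyTorch utilities; the model, optimizer, and hyperparameters are hypothetical stand-ins, and the repository may handle this differently:

```python
import torch

# Hypothetical minimal training step: the model, optimizer, and data below
# are placeholders for illustration only.
model = torch.nn.Linear(512, 512)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

x = torch.randn(8, 512)
loss = model(x).pow(2).mean()
loss.backward()
# Clip gradient norms to bound the large or 'inf' values that can appear
# in early GmP training epochs.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
optimizer.zero_grad()
```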