wise-ft by mlfoundations

Research paper code for robust fine-tuning of zero-shot models

created 3 years ago
725 stars

Top 48.6% on sourcepulse

View on GitHub
Project Summary

This repository provides WiSE-FT, a method for robustly fine-tuning large zero-shot models like CLIP. It addresses the common issue where standard fine-tuning degrades out-of-distribution (OOD) accuracy. WiSE-FT is designed for researchers and practitioners working with vision-language models who need to adapt them to specific tasks while maintaining or improving robustness across different data distributions.

How It Works

WiSE-FT achieves robustness by ensembling the weights of the original zero-shot model and a standard fine-tuned model. This interpolation is performed using a mixing coefficient alpha, effectively creating a convex combination of the two weight sets. This approach preserves the generalization capabilities of the zero-shot model while incorporating task-specific knowledge from fine-tuning, leading to improved OOD performance without additional computational cost during inference or fine-tuning.
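Concretely, given zero-shot weights theta_zs and fine-tuned weights theta_ft, WiSE-FT uses theta = (1 - alpha) * theta_zs + alpha * theta_ft for alpha in [0, 1], applied independently to every parameter tensor. A minimal PyTorch sketch of the idea (the function name and toy tensors are illustrative, not the repository's actual API):

```python
import torch

def interpolate_state_dicts(zeroshot_sd, finetuned_sd, alpha):
    """Per-parameter convex combination of two state dicts:
    theta = (1 - alpha) * theta_zeroshot + alpha * theta_finetuned
    """
    assert zeroshot_sd.keys() == finetuned_sd.keys(), "checkpoints must share parameter names"
    return {
        k: (1 - alpha) * zeroshot_sd[k] + alpha * finetuned_sd[k]
        for k in zeroshot_sd
    }

# Toy usage with small tensors standing in for real CLIP weights:
zs = {"w": torch.zeros(3, 3)}
ft = {"w": torch.ones(3, 3)}
mixed = interpolate_state_dicts(zs, ft, alpha=0.5)  # every entry becomes 0.5
```

Setting alpha = 0 recovers the zero-shot model and alpha = 1 the standard fine-tuned one; intermediate values trade off between the two.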

Quick Start & Requirements

  • Install: conda env create -f environment.yml, then conda activate wiseft. Add the repository directory to PYTHONPATH: export PYTHONPATH="$PYTHONPATH:$PWD".
  • Prerequisites: Python, Conda, PyTorch. Specific dataset downloads are detailed in datasets.md.
  • Running WiSE-FT (a sketch of the underlying alpha sweep follows this list):
    • From existing checkpoints: python src/wise_ft.py --load=models/zeroshot.pt,models/finetuned.pt --eval-datasets=... --alpha 0 0.1 ...
    • From scratch (e.g., ViT-B/32): python src/wise_ft.py --train-dataset=ImageNet --model=ViT-B/32 --eval-datasets=... --alpha 0 0.1 ...
  • Plotting: python src/scatter_plot.py --results-db=results.jsonl --save plots
  • Paper: https://arxiv.org/abs/2109.01903
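For intuition, passing several values to --alpha produces one interpolated model per coefficient, each of which is then evaluated. A minimal sketch of that sweep, assuming the checkpoints load as plain state dicts (the evaluation step is left as a comment, since the repository's internal harness is not reproduced here):

```python
import torch

# Checkpoint paths mirror the --load argument above. Treating the files as
# plain state dicts is an assumption; adapt if they store full model objects.
zeroshot_sd = torch.load("models/zeroshot.pt", map_location="cpu")
finetuned_sd = torch.load("models/finetuned.pt", map_location="cpu")

for alpha in [0.0, 0.1, 0.2, 0.5, 1.0]:
    mixed = {
        k: (1 - alpha) * zeroshot_sd[k] + alpha * finetuned_sd[k]
        for k in zeroshot_sd
    }
    # Evaluate `mixed` on in-distribution and shifted test sets here;
    # wise_ft.py records such results in a results database (results.jsonl).
```

Each alpha contributes one point to the in-distribution versus out-of-distribution accuracy trade-off that src/scatter_plot.py visualizes from results.jsonl.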

Highlighted Details

  • Improves OOD accuracy on ImageNet distribution shifts by 4-6 percentage points (pp) over prior work.
  • Delivers robustness gains of 2-23 pp across a diverse set of distribution shifts.
  • Preserves or improves in-distribution accuracy.
  • No additional computational cost during fine-tuning or inference.

Maintenance & Community

The project accompanies the paper "Robust fine-tuning of zero-shot models," whose authors span several institutions, including the Allen Institute for AI. No community channels (Discord/Slack) or roadmap are mentioned in the README.

Licensing & Compatibility

The repository does not explicitly state a license. The code is provided for research purposes; commercial use would require reviewing the licenses and terms of use of the underlying models and datasets.

Limitations & Caveats

The README does not list limitations or known bugs. WiSE-FT's effectiveness depends on the quality and compatibility of the zero-shot and fine-tuned checkpoints being interpolated; both must share the same architecture so that their parameters align. The setup also requires downloading specific datasets, several of which are large.

Health Check

  • Last commit: 3 years ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 32 stars in the last 90 days

Explore Similar Projects

Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla and OpenAI; author of CS 231n), Travis Fischer (Founder of Agentic), and 1 more.

fine-tune-mistral by abacaj

Fine-tuning script for Mistral-7B

Top 0.3% on sourcepulse; 716 stars; created 1 year ago; updated 1 year ago.