CaFo by OpenGVLab

Research code for few-shot learning with cascaded foundation models

Created 2 years ago
376 stars

Top 75.5% on SourcePulse

Project Summary

CaFo addresses few-shot image classification by cascading multiple pre-trained foundation models. It targets researchers and practitioners who need strong classification performance from limited labeled data, combining diverse pre-training paradigms to reach state-of-the-art results.

How It Works

CaFo follows a "Prompt, Generate, then Cache" pipeline. GPT-3 generates rich textual prompts for CLIP's text encoder (Prompt); DALL-E synthesizes additional training images for data augmentation (Generate); and a learnable cache model adaptively fuses the predictions of CLIP and DINO (Cache). Together, these steps unify complementary pre-training knowledge from vision-language contrastive learning (CLIP), self-supervised vision learning (DINO), generative modeling (DALL-E), and large language models (GPT-3) for stronger few-shot learning.
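The cache step can be sketched as follows. This is an illustrative, Tip-Adapter-style computation in plain Python; the hyperparameter names alpha and beta are the conventional ones from that line of work, and this is not the repository's actual implementation (which operates on PyTorch tensors):

```python
# Illustrative sketch of cache-model blending (Tip-Adapter style), as used
# conceptually by CaFo. Not the repository's actual code.
import math

def cache_logits(query, cache_keys, cache_values, beta=5.5):
    """Cache-model logits for one L2-normalized query feature.

    query: feature vector for a test image.
    cache_keys: one stored feature row per few-shot training image.
    cache_values: one-hot label row per few-shot training image.
    """
    logits = [0.0] * len(cache_values[0])
    for key, value in zip(cache_keys, cache_values):
        affinity = sum(q * k for q, k in zip(query, key))  # cosine similarity
        weight = math.exp(-beta * (1.0 - affinity))        # sharpened affinity
        for c, v in enumerate(value):
            logits[c] += weight * v
    return logits

def blend(zero_shot_logits, cached, alpha=1.0):
    """Final prediction: zero-shot CLIP logits plus weighted cache logits."""
    return [z + alpha * c for z, c in zip(zero_shot_logits, cached)]
```

In CaFo, the blending weights themselves are learned so the model can decide adaptively how much to trust each source; the fixed alpha here is a simplification.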

Quick Start & Requirements

  • Install: Clone the repository, create a conda environment (conda create -n cafo python=3.7), activate it (conda activate cafo), and install dependencies (pip install -r requirements.txt). PyTorch and torchvision with matching CUDA versions are required (conda install pytorch torchvision cudatoolkit).
  • Data: Download datasets and pre-trained weights for CLIP (automatic), DINO (provided link), and DALL-E/Stable Diffusion generated images (provided links). Organize data as specified in DATASET.md.
  • Configs: Modify .yaml files in configs/ for dataset-specific settings and hyperparameters.
  • Run: CUDA_VISIBLE_DEVICES=0 python main_imagenet.py --config configs/imagenet/16shot.yaml (or main.py for other datasets).
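The steps above, collected into one shell session (the repository URL is assumed from the project and organization names; adjust the cudatoolkit build to match your driver):

```shell
# Clone and set up the environment (Python 3.7, as recommended upstream)
git clone https://github.com/OpenGVLab/CaFo.git
cd CaFo
conda create -n cafo python=3.7
conda activate cafo
pip install -r requirements.txt
# Install PyTorch/torchvision with a CUDA build matching your driver
conda install pytorch torchvision cudatoolkit

# Train/evaluate on ImageNet 16-shot (use main.py for the other datasets)
CUDA_VISIBLE_DEVICES=0 python main_imagenet.py --config configs/imagenet/16shot.yaml
```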

Highlighted Details

  • Achieves state-of-the-art few-shot classification performance across 11 datasets.
  • Integrates CLIP, DINO, DALL-E, and GPT-3 for comprehensive knowledge utilization.
  • Employs a learnable cache mechanism for adaptive prediction blending.
  • Supports data augmentation via synthetic image generation.

Maintenance & Community

This is a research-paper companion repository: created two years ago, with 376 stars, and no longer actively maintained (see Health Check below).

Licensing & Compatibility

  • The README does not state a license. The project builds on Tip-Adapter, CLIP, DINO, DALL-E, and CuPL, each of which carries its own license; users should verify compatibility before commercial or closed-source use.

Limitations & Caveats

  • Requires specific versions of PyTorch and CUDA.
  • Downloading and organizing pre-trained models and datasets is a manual step.
  • The README does not explicitly state the project's license, which may impact commercial adoption.
Health Check

  • Last commit: 2 years ago
  • Responsiveness: inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 3 stars in the last 30 days
