visual_token_matching by GitGyun

Official code for a research paper on universal few-shot learning of dense prediction tasks

Created 3 years ago

254 stars

Top 99.1% on SourcePulse

Project Summary

This repository provides the official code for "Universal Few-shot Learning of Dense Prediction Tasks with Visual Token Matching", a paper that received an Outstanding Paper Award at ICLR 2023. The method learns a new dense prediction task from only a few labeled examples by matching visual tokens, which makes it relevant to computer vision researchers and practitioners who work with limited annotations.

How It Works

The core approach, Visual Token Matching, is a form of meta-learning: the model is trained across a variety of dense prediction tasks so that it generalizes to new, unseen tasks from only a few labeled examples. It builds on a pre-trained transformer backbone (the prerequisites list BEiT checkpoints), which is then meta-trained on multiple tasks to learn transferable representations. This summary does not describe the architecture in more detail.
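To make the idea of token matching concrete, here is a minimal sketch in plain Python. It is not the repository's implementation: it assumes each image has already been encoded into a list of patch-level token vectors, and it predicts a label token for each query patch as a similarity-weighted average of the label tokens of the support (few-shot) examples. The function name `match_tokens`, the `temperature` parameter, and the toy vectors are all hypothetical, and the sketch leaves out the encoders, decoder, and training loop.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def match_tokens(query_img_tokens, support_img_tokens, support_label_tokens,
                 temperature=1.0):
    """Predict one label token per query token (hypothetical helper).

    Each query image token is compared with every support image token;
    the resulting weights are used to average the support label tokens.
    """
    label_dim = len(support_label_tokens[0])
    predictions = []
    for q in query_img_tokens:
        scores = [dot(q, k) / temperature for k in support_img_tokens]
        weights = softmax(scores)
        pred = [
            sum(w * lab[d] for w, lab in zip(weights, support_label_tokens))
            for d in range(label_dim)
        ]
        predictions.append(pred)
    return predictions


# Toy example: two support patches with 1-D label tokens, one query patch
# that resembles the first support patch.
support_images = [[1.0, 0.0], [0.0, 1.0]]
support_labels = [[1.0], [0.0]]
query_images = [[5.0, 0.0]]
print(match_tokens(query_images, support_images, support_labels))
```

Because the query patch is much more similar to the first support patch, its predicted label token lands close to that patch's label. The matching step itself has no task-specific weights in this sketch; the task is defined entirely by the support labels supplied at inference time.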

Quick Start & Requirements

Install: pip install -r requirements.txt
Prerequisites:
- Taskonomy Dataset (tiny split)
- BEiT pre-trained checkpoints (e.g., beit_base_patch16_224_pt22k)
- Python environment
Setup: Requires downloading datasets and pre-trained models, then configuring data_paths.yaml.
Links: Taskonomy Dataset, BEiT
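Since setup depends on downloaded data and checkpoints being where the configuration says they are, a small pre-flight check can catch mistakes before a long run. The sketch below is an illustration only: it assumes `data_paths.yaml` is a flat list of `key: path` lines, which may not match the repository's actual layout, and the helper names `read_flat_paths` and `missing_paths` are hypothetical. It uses only the standard library and does not parse nested YAML.

```python
import os


def read_flat_paths(config_file):
    """Parse 'key: value' lines from a flat YAML-like file (assumed layout)."""
    entries = {}
    with open(config_file, encoding="utf-8") as fh:
        for raw in fh:
            line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
            if not line or ":" not in line:
                continue
            key, value = line.split(":", 1)
            entries[key.strip()] = value.strip().strip("'\"")
    return entries


def missing_paths(entries):
    """Return the entries whose path does not exist on disk."""
    return {k: v for k, v in entries.items() if not os.path.exists(v)}


if __name__ == "__main__":
    config = "data_paths.yaml"
    if os.path.exists(config):
        for key, path in missing_paths(read_flat_paths(config)).items():
            print(f"missing: {key} -> {path}")
    else:
        print(f"{config} not found in the current directory")
```

If the real configuration file is nested or uses different conventions, a full YAML parser would be the better choice.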

Highlighted Details

The paper received an Outstanding Paper Award at ICLR 2023.
Supports multiple dense prediction tasks including semantic segmentation, depth estimation, and edge detection.
Leverages pre-trained models like BEiT for strong feature extraction.

Maintenance & Community

The project is associated with author Donggyun Kim.
Code is official and released following the ICLR 2023 publication.

Licensing & Compatibility

The repository does not explicitly state a license. It references and builds upon other projects, so the licenses of those dependencies may also apply.

Limitations & Caveats

Setup requires substantial data preparation and downloading large pre-trained models. Because the README does not state a license for the code, its suitability for commercial use is unclear.

