visual_token_matching  by GitGyun

Official code for a research paper on universal few-shot learning of dense prediction tasks

created 2 years ago
253 stars

Top 99.5% on sourcepulse

Project Summary

This repository provides the official code for "Universal Few-shot Learning of Dense Prediction Tasks with Visual Token Matching", a paper that received an Outstanding Paper Award at ICLR 2023. The method learns a variety of dense prediction tasks from only a few labeled examples by matching visual tokens, which makes it useful to computer vision researchers and practitioners.

How It Works

The core approach is "Visual Token Matching", trained with meta-learning: the model learns to generalize across diverse dense prediction tasks from a handful of labeled examples. It builds on a transformer backbone initialized from pre-trained BEiT checkpoints and is then meta-trained on a variety of tasks to learn transferable representations. This allows rapid adaptation to new, unseen tasks with minimal data.
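
To make the matching idea concrete, here is a minimal sketch of one way token matching can work: each query token is compared with every support token, and its label is predicted as a similarity-weighted average of the support label tokens. This illustrates the general technique (softmax attention over support tokens) and is not the repository's implementation. The function name `match_tokens`, its parameters, and the use of plain dot-product similarity are all assumptions made for the example.

```python
import math


def match_tokens(query_tokens, support_tokens, support_label_tokens, temperature=1.0):
    """Predict one label token per query token.

    Hypothetical helper for illustration only: each prediction is a
    softmax-weighted average of the support label tokens, where the weights
    come from dot-product similarity between query and support tokens.
    """
    label_dim = len(support_label_tokens[0])
    predictions = []
    for q in query_tokens:
        # Similarity between this query token and every support token.
        scores = [
            sum(a * b for a, b in zip(q, s)) / temperature
            for s in support_tokens
        ]
        # Numerically stable softmax over the support tokens.
        top = max(scores)
        exps = [math.exp(x - top) for x in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted combination of the support label tokens.
        predictions.append([
            sum(w * label[d] for w, label in zip(weights, support_label_tokens))
            for d in range(label_dim)
        ])
    return predictions
```

In a few-shot setting, `support_tokens` and `support_label_tokens` would come from the handful of labeled examples for the new task, and `query_tokens` from the image to be labeled. A real model would compute these tokens with learned encoders and decode the predicted label tokens back into a dense output map.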

Quick Start & Requirements

  • Install: pip install -r requirements.txt
  • Prerequisites:
    • Taskonomy Dataset (tiny split)
    • BEiT pre-trained checkpoints (e.g., beit_base_patch16_224_pt22k)
    • Python environment
  • Setup: Requires downloading datasets and pre-trained models, then configuring data_paths.yaml.
  • Links: Taskonomy Dataset, BEiT
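
Because setup depends on several large downloads, it can help to confirm that every configured location exists before starting a run. The sketch below is a hypothetical pre-flight check, not part of the repository. The helper `missing_paths` and the example entries are assumptions, and the real keys and layout of data_paths.yaml are defined by the repository itself.

```python
from pathlib import Path


def missing_paths(paths):
    """Return the names whose configured path does not exist on disk.

    `paths` maps a descriptive name to a filesystem location. Both the
    function and the names used below are hypothetical.
    """
    return [name for name, location in paths.items() if not Path(location).exists()]


if __name__ == "__main__":
    # Placeholder entries; substitute the locations you set in data_paths.yaml.
    configured = {
        "taskonomy_tiny": "/path/to/taskonomy",
        "beit_checkpoint": "/path/to/beit_base_patch16_224_pt22k",
    }
    for name in missing_paths(configured):
        print(f"missing: {name}")
```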

Highlighted Details

  • The accompanying paper received an Outstanding Paper Award at ICLR 2023.
  • Supports multiple dense prediction tasks including semantic segmentation, depth estimation, and edge detection.
  • Leverages pre-trained models like BEiT for strong feature extraction.

Maintenance & Community

  • The project is associated with author Donggyun Kim.
  • Code is official and released following the ICLR 2023 publication.

Licensing & Compatibility

  • The repository does not explicitly state a license. However, it references and builds upon other projects, implying potential licensing considerations from those dependencies.

Limitations & Caveats

Setup requires significant data preparation and the download of large pre-trained models. The README does not clearly define a license for the code itself, which leaves the terms for commercial use uncertain.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star history: 1 star in the last 90 days
