visual_token_matching by GitGyun

Official code for a research paper on universal few-shot learning of dense prediction tasks

Created 2 years ago
253 stars

Top 99.4% on SourcePulse

Project Summary

This repository provides the official code for "Universal Few-shot Learning of Dense Prediction Tasks with Visual Token Matching", a paper that received an Outstanding Paper Award at ICLR 2023. The method matches visual tokens between a query image and a few labeled examples, which enables few-shot learning across a range of dense prediction tasks. It is aimed at computer vision researchers and practitioners.

How It Works

The core approach is "Visual Token Matching" (VTM), a meta-learning method for few-shot dense prediction. VTM embeds images and labels into patch-level tokens and predicts the label tokens of a query image by non-parametric matching against the tokens of a few labeled support examples. The model is a hierarchical encoder-decoder built on a ViT backbone (initialized from BEiT pre-trained checkpoints), with matching performed at multiple feature levels. It is meta-trained on a variety of tasks, and a small number of task-specific parameters let it adapt to a new, unseen task from a handful of labeled examples.
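The matching idea can be illustrated with a minimal sketch. This is not the repository's implementation: it assumes tokens are plain lists of floats, uses a single dot-product similarity with a softmax, and omits the learned encoders, multi-head projections, and multi-level hierarchy. The function name `match_tokens` and the `temperature` parameter are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def match_tokens(query_tokens, support_image_tokens, support_label_tokens,
                 temperature=1.0):
    """Predict one label token per query token.

    Each prediction is a similarity-weighted average of the support label
    tokens, where similarity is measured between the query image token and
    the support image tokens. (Illustrative assumption, not the paper's
    exact formulation.)
    """
    dim = len(support_label_tokens[0])
    predictions = []
    for q in query_tokens:
        # Similarity of this query token to every support image token.
        scores = [dot(q, k) / temperature for k in support_image_tokens]
        weights = softmax(scores)
        # Weighted average of the support label tokens.
        pred = [
            sum(w * v[d] for w, v in zip(weights, support_label_tokens))
            for d in range(dim)
        ]
        predictions.append(pred)
    return predictions


if __name__ == "__main__":
    support_images = [[1.0, 0.0], [0.0, 1.0]]
    support_labels = [[10.0], [20.0]]
    # The query resembles the first support token, so the prediction
    # lands close to the first support label.
    print(match_tokens([[5.0, 0.0]], support_images, support_labels))
```

In this toy setup, a query token that is equally similar to both support tokens receives the average of the two support labels, which shows why the quality of the token embeddings matters for the matching step.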

Quick Start & Requirements

  • Install: pip install -r requirements.txt
  • Prerequisites:
    • Taskonomy Dataset (tiny split)
    • BEiT pre-trained checkpoints (e.g., beit_base_patch16_224_pt22k)
    • Python environment
  • Setup: Download the dataset and the pre-trained checkpoints, then configure data_paths.yaml.
  • Links: Taskonomy Dataset, BEiT
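Because the setup depends on several separately downloaded items, a small pre-flight check can catch missing files before a long run starts. The sketch below is only an illustration: the helper `find_missing` is hypothetical, and the dataset and checkpoint locations are placeholder assumptions, not paths defined by the repository.

```python
from pathlib import Path


def find_missing(required):
    """Return the names of entries whose paths do not exist on disk."""
    return [name for name, path in required.items() if not Path(path).exists()]


if __name__ == "__main__":
    # Placeholder locations (assumptions); replace them with your own paths.
    required = {
        "requirements file": "requirements.txt",
        "data path config": "data_paths.yaml",
        "Taskonomy tiny split": "datasets/taskonomy_tiny",
        "BEiT checkpoint": "checkpoints/beit_base_patch16_224_pt22k.pth",
    }
    missing = find_missing(required)
    if missing:
        print("Missing prerequisites:", ", ".join(missing))
    else:
        print("All prerequisites found.")
```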

Highlighted Details

  • Received an Outstanding Paper Award at ICLR 2023.
  • Supports multiple dense prediction tasks including semantic segmentation, depth estimation, and edge detection.
  • Leverages pre-trained models like BEiT for strong feature extraction.

Maintenance & Community

  • The project is associated with author Donggyun Kim.
  • Code is official and released following the ICLR 2023 publication.

Licensing & Compatibility

  • The repository does not state a license. It references and builds on other projects, so the licenses of those dependencies may also apply.

Limitations & Caveats

Setup requires substantial data preparation and downloading large pre-trained models. The README does not define a license for the code itself, which leaves the terms for commercial use unclear.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 0 stars in the last 30 days
