mona by LeiyiHU

Adapter tuning research for visual recognition tasks

created 1 year ago
344 stars

Top 81.6% on sourcepulse

View on GitHub
Project Summary

Mona is a novel adapter-based tuning method designed to surpass the performance of full fine-tuning in visual recognition tasks, particularly for challenging areas like instance and semantic segmentation. It targets researchers and practitioners in computer vision seeking efficient and high-performing alternatives to traditional fine-tuning.

How It Works

Mona (Multi-cognitive Visual Adapter) is a parameter-efficient tuning method: the pre-trained backbone stays frozen while lightweight adapter modules inserted into it are trained. Unlike plain bottleneck adapters, Mona equips the adapter with multiple vision-friendly filters so it can process visual signals at several scales, aiming to improve transfer efficiency and break the performance ceiling of earlier delta-tuning methods relative to full fine-tuning.
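
To make the mechanism concrete, below is a minimal PyTorch sketch of a multi-branch bottleneck adapter in the spirit of Mona. The module name, bottleneck width, and kernel sizes are illustrative assumptions, not the repository's exact implementation:

```python
import torch
import torch.nn as nn

class MultiCognitiveAdapter(nn.Module):
    """Sketch of a Mona-style adapter: normalize, down-project, run
    several depth-wise convolutions over the token grid at different
    receptive fields, aggregate, and up-project, with a residual
    connection. Hyper-parameters here are illustrative."""

    def __init__(self, dim, bottleneck=64, kernel_sizes=(3, 5, 7)):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.down = nn.Linear(dim, bottleneck)
        # One depth-wise conv branch per kernel size (the "multi-cognitive" filters).
        self.branches = nn.ModuleList([
            nn.Conv2d(bottleneck, bottleneck, k, padding=k // 2, groups=bottleneck)
            for k in kernel_sizes
        ])
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x, hw):
        # x: (B, N, C) tokens from a frozen backbone block; hw: (H, W) with N == H * W.
        h, w = hw
        residual = x
        z = self.down(self.norm(x))                         # (B, N, bottleneck)
        z2d = z.transpose(1, 2).reshape(z.size(0), -1, h, w)
        z2d = sum(b(z2d) for b in self.branches) / len(self.branches)
        z = z2d.flatten(2).transpose(1, 2)                  # back to (B, N, bottleneck)
        return residual + self.up(self.act(z))

# Example: adapt a 7x7 grid of 49 tokens with channel width 768.
adapter = MultiCognitiveAdapter(dim=768)
out = adapter(torch.randn(2, 49, 768), hw=(7, 7))
```

In this style of tuning, only the adapter parameters (a few percent of the full model) receive gradients, which is what makes the method parameter-efficient.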

Quick Start & Requirements

  • Installation: Set up the environments and prepare the datasets following the Swin-Transformer-Object-Detection, Swin-Transformer-Semantic-Segmentation, and Swin-Transformer-Classification repositories.
  • Prerequisites: PyTorch, CUDA (implied for GPU training), and datasets in the expected formats (COCO, ADE20K, VOC, Oxford Flower, Oxford Pet). ImageNet-22K-supervised (IM22K) Swin-Base/Large pre-trained weights are required.
  • Training: Run the dist_train.sh scripts with configuration files modified so that data_root and load_from point to your dataset and pre-trained weights (see the config sketch after this list).
  • Links: CVPR Homepage, arXiv
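
As referenced in the training step above, configuration overrides in these Swin/MMDetection-style repositories are Python files. A hypothetical sketch, where the file names and paths are placeholders rather than the repo's actual configs:

```python
# Hypothetical config override in the MMDetection-style Python config format.
# The base config name and all paths below are placeholders.
_base_ = ['./mask_rcnn_swin_base_coco.py']  # assumed base config in the repo

data_root = 'data/coco/'                    # root of your prepared COCO dataset
load_from = 'pretrained/swin_base_22k.pth'  # IM22K-supervised Swin-Base weights

# Distributed training is then launched with the repo's script, e.g.:
#   bash tools/dist_train.sh <this_config>.py <num_gpus>
```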

Highlighted Details

  • Outperforms full fine-tuning on COCO object detection and instance segmentation (53.4 box AP and 46.0 mask AP with Swin-Base) and on ADE20K semantic segmentation (51.36 mIoU with Swin-Large).
  • Achieves faster convergence compared to other delta-tuning methods.
  • Demonstrates the potential for adapter-tuning to replace full fine-tuning across various visual tasks.

Maintenance & Community

The project is associated with the CVPR 2025 conference. Key contributors include Dongshuo Yin, Leiyi Hu, and Bin Li. It acknowledges contributions from Swin-Transformer, mmclassification, NOAH, LoRA, and AdaptFormer.

Licensing & Compatibility

The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project is research code accompanying a CVPR 2025 paper and may still be evolving. Specific limitations and unsupported platforms are not detailed in the README.

Health Check

  • Last commit: 1 month ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 2
  • Star History: 105 stars in the last 90 days
