mvits_for_class_agnostic_od by mmaaz60

Official code for an ECCV 2022 research paper on class-agnostic object detection

Created 4 years ago

315 stars

Top 85.8% on SourcePulse

Project Summary

This repository provides the official implementation of "Class-agnostic Object Detection with Multi-modal Transformer" (ECCV 2022). Conventional object detectors scale poorly to new domains and novel objects; this work addresses that by using multi-modal Vision Transformers (MViTs) trained on aligned image-text pairs. It is aimed at computer vision researchers and practitioners, particularly those working on open-world object detection, salient object detection, and self-supervised detection. The authors report state-of-the-art performance in localizing generic objects, including objects unseen during training, along with enhanced interactability through language queries.

How It Works

The project utilizes Multi-modal Vision Transformers (MViTs), specifically proposing a novel architecture called Multiscale Attention ViT with Late fusion (MAVL). This approach integrates multi-scale feature processing and late vision-language fusion, departing from standard MViTs that often lack multi-scale capabilities and require longer training. The MAVL architecture employs multi-scale deformable attention, enabling it to capture richer object representations. By aligning image-text pairs during training, the MViTs learn to bridge the gap between object and image-level representations, facilitating class-agnostic detection.
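As a rough illustration of what "late fusion" means here, the toy sketch below lets text tokens interact with object queries only after a vision-only stage has already produced those queries. It is not the MAVL implementation: the function names are hypothetical, the attention uses identity projections instead of learned weights, and the multi-scale deformable attention compiled from `models/ops` is not reproduced.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Single-head self-attention with identity Q/K/V projections (toy, no learned weights)."""
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)
    out = []
    for q in tokens:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[d] for w, v in zip(weights, tokens)) for d in range(dim)])
    return out

def late_fusion(object_queries, text_tokens):
    """Assumed reading of late fusion: text joins the object queries only at this final step."""
    fused = self_attention(object_queries + text_tokens)
    # Keep only the positions that correspond to object queries.
    return fused[:len(object_queries)]

# Hypothetical 2-d embeddings: two object queries and one text token.
queries = [[1.0, 0.0], [0.0, 1.0]]
text = [[1.0, 1.0]]
fused_queries = late_fusion(queries, text)
```

In this sketch the vision stage never sees the text, so the same object queries could in principle be reused across different language queries; whether the real model exploits that is not stated in this summary.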

Quick Start & Requirements

Installation:

Install PyTorch 1.8.0 with CUDA 11.1:
```
pip install torch==1.8.0+cu111 torchvision==0.9.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html
```

Install other dependencies:
```
pip install -r requirements.txt
```
Compile Deformable Attention modules:
```
cd models/ops
sh make.sh
```

Prerequisites: PyTorch 1.8.0, torchvision 0.9.0, CUDA 11.1.
Resources: Pre-trained models for MAVL, Def-DETR, MDETR, DETReg, Faster-RCNN, RetinaNet, ORE, and others are available. Instructions to reproduce results are provided.
Links: Paper, Training, Applications, Evaluation.
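Because the Deformable Attention ops are compiled against the installed PyTorch, it may help to confirm the pinned versions before running `make.sh`. The following is a minimal sketch using only the standard library; `PINNED`, `installed_version`, and `check_pins` are hypothetical names introduced for illustration and are not part of the repository.

```python
from importlib import metadata

# Pins taken from the install command above.
PINNED = {"torch": "1.8.0", "torchvision": "0.9.0"}

def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

def check_pins(pinned=PINNED, lookup=installed_version):
    """Map each package to (expected, found, ok); local tags such as +cu111 are ignored."""
    report = {}
    for package, expected in pinned.items():
        found = lookup(package)
        base = found.split("+")[0] if found else None
        report[package] = (expected, found, base == expected)
    return report

if __name__ == "__main__":
    for package, (expected, found, ok) in check_pins().items():
        status = "ok" if ok else "MISMATCH"
        print(f"{package}: expected {expected}, found {found} [{status}]")
```

The check compares version strings only; it does not verify that the CUDA toolkit on the machine matches the `cu111` build.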

Highlighted Details

Demonstrates state-of-the-art class-agnostic object detection performance across various datasets and out-of-domain scenarios.
Shows consistent generalization to new domains and rare/novel classes, even with limited or no prior exposure.
Offers enhanced interactability by adapting proposals based on specific language queries.
Explores the importance of language structure in object detection through experimental analysis.
Enables open-world object detection by using MAVL proposals for pseudo-labeling.
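The pseudo-labeling idea in the last item can be sketched as follows: class-agnostic proposals that score highly but do not overlap any annotated known-class box are treated as "unknown" objects. This is an illustrative sketch, not the repository's code; the function names, the `top_k` value, and the `max_iou` threshold are assumptions chosen for the example.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def pseudo_label_unknowns(proposals, known_boxes, top_k=5, max_iou=0.5):
    """proposals: list of (box, score). Returns up to top_k proposals labelled 'unknown'."""
    ranked = sorted(proposals, key=lambda p: p[1], reverse=True)
    unknowns = []
    for box, score in ranked:
        if len(unknowns) >= top_k:
            break
        # Keep a proposal only if it does not overlap any known-class annotation.
        if all(iou(box, gt) < max_iou for gt in known_boxes):
            unknowns.append({"box": box, "score": score, "label": "unknown"})
    return unknowns

# Hypothetical proposals and one known-class ground-truth box.
proposals = [((0, 0, 10, 10), 0.9), ((20, 20, 30, 30), 0.8), ((50, 50, 60, 60), 0.6)]
known = [(0, 0, 10, 10)]
pseudo = pseudo_label_unknowns(proposals, known, top_k=2)
```

The sketch does not deduplicate overlapping proposals; a real pipeline would likely apply non-maximum suppression first.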

Maintenance & Community

The project is associated with authors from MBZUAI. Contact emails are provided for inquiries. Related works are also linked.

Licensing & Compatibility

The README does not state a license. Confirm the terms with the authors before commercial use or closed-source linking.

Limitations & Caveats

Installation is pinned to older releases, PyTorch 1.8.0 and CUDA 11.1, which may be difficult to set up on newer systems. The missing license is a further obstacle to adoption.

Health Check

Last Commit

2 years ago

Responsiveness

Inactive

Star History

1 star in the last 30 days