Research paper for class-agnostic object detection
This repository provides the official implementation of "Class-agnostic Object Detection with Multi-modal Transformer" (ECCV 2022). Traditional object detectors scale poorly to new domains and novel objects; the paper addresses this with multi-modal Vision Transformers (MViTs) trained on aligned image-text pairs. It is aimed at computer vision researchers and practitioners, particularly those working on open-world object detection, salient object detection, and self-supervised detection. The main benefit reported is state-of-the-art localization of generic objects, including objects unseen during training, with the added ability to steer detection through language queries.
How It Works
The project builds on multi-modal Vision Transformers (MViTs) and proposes a new architecture, Multiscale Attention ViT with Late fusion (MAVL). MAVL combines multi-scale feature processing with late vision-language fusion, whereas standard MViTs often lack multi-scale capabilities and require longer training. Its multi-scale deformable attention lets it capture richer object representations. Training on aligned image-text pairs teaches the MViTs to bridge the gap between object-level and image-level representations, which is what makes class-agnostic detection possible.
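To make "class-agnostic" concrete, here is a minimal sketch of how box proposals from several language queries could be merged without using category labels. It is not the repository's code: the query strings, boxes, and scores are hypothetical, and the function names `iou` and `class_agnostic_nms` are invented for this illustration.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Detection = Tuple[Box, float]            # (box, confidence score)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def class_agnostic_nms(dets: List[Detection], iou_thr: float = 0.5) -> List[Detection]:
    """Greedy non-maximum suppression that never looks at a class label."""
    kept: List[Detection] = []
    # Visit boxes from highest to lowest score; keep a box only if it does
    # not overlap an already-kept box by iou_thr or more.
    for box, score in sorted(dets, key=lambda d: d[1], reverse=True):
        if all(iou(box, k[0]) < iou_thr for k in kept):
            kept.append((box, score))
    return kept


# Hypothetical detector outputs for two text queries on the same image.
per_query = {
    "all objects": [((10, 10, 50, 50), 0.9), ((60, 60, 100, 100), 0.6)],
    "all entities": [((12, 11, 51, 49), 0.8), ((200, 200, 240, 260), 0.7)],
}

# Pool every query's boxes, then deduplicate them as generic "objects".
merged = [d for dets in per_query.values() for d in dets]
proposals = class_agnostic_nms(merged)
```

In this toy input the two near-identical boxes collapse into the higher-scoring one, leaving three proposals. A real pipeline would take the boxes from the model's predictions rather than from a hand-written dictionary.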
Quick Start & Requirements
pip install torch==1.8.0+cu111 torchvision==0.9.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html
pip install -r requirements.txt
cd models/ops
sh make.sh
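Because the steps above pin exact builds, a quick check that the installed packages match the pins can save debugging time before compiling the ops. The following is a minimal sketch, not part of the repository: `split_build_tag`, `matches_pin`, and `check_environment` are hypothetical helpers, and the pinned strings are copied from the install command above.

```python
import importlib
from typing import Dict, Optional, Tuple

# Pins copied from the pip command above.
PINS = {"torch": "1.8.0+cu111", "torchvision": "0.9.0+cu111"}


def split_build_tag(version: str) -> Tuple[str, Optional[str]]:
    """Split '1.8.0+cu111' into ('1.8.0', 'cu111'); the tag is None if absent."""
    base, _, tag = version.partition("+")
    return base, tag or None


def matches_pin(installed: str, pinned: str) -> bool:
    """True when both the release number and the build tag agree."""
    return split_build_tag(installed) == split_build_tag(pinned)


def check_environment() -> Dict[str, str]:
    """Report, per pinned package, whether the installed version matches."""
    report: Dict[str, str] = {}
    for name, pinned in PINS.items():
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = "not installed"
            continue
        installed = str(getattr(module, "__version__", "unknown"))
        if matches_pin(installed, pinned):
            report[name] = "ok"
        else:
            report[name] = f"found {installed}, expected {pinned}"
    return report
```

Calling `check_environment()` after the two pip commands returns a small dictionary such as `{"torch": "ok", ...}`; anything other than "ok" suggests the environment differs from the one the instructions assume.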
Maintenance & Community
The authors are affiliated with MBZUAI. The README lists contact emails for inquiries and links to related work.
Licensing & Compatibility
The README does not state a license, so the licensing terms should be confirmed before commercial use or closed-source linking.
Limitations & Caveats
The installation pins older versions of PyTorch (1.8.0) and CUDA (11.1), which may be incompatible with newer systems. The missing license is a further obstacle to adoption.