LAE-DINO by jaychempan

Open-vocabulary object detection for Earth observation

Created 1 year ago

261 stars

Top 97.4% on SourcePulse

Project Summary

Summary

This project addresses the domain gap that hinders open-vocabulary object detection (OVOD) in remote sensing. It introduces LAE-DINO, an OVOD foundation model, and the LAE-1M dataset. Together they aim to detect novel object categories, including ones unseen during training, in Earth observation imagery for applications in Earth sciences and monitoring.

How It Works

LAE-DINO builds upon the DINO architecture and adds two modules: Dynamic Vocabulary Construction (DVC), which adapts the training vocabulary for each batch, and Visual-Guided Text Prompt Learning (VisGT), which enhances visual-text feature alignment. Both are trained on LAE-1M, presented as the first large-scale OVOD dataset for remote sensing, to overcome the limitations of models trained on natural images.
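To make the per-batch vocabulary idea concrete, here is a minimal sketch of one way a dynamic vocabulary could be assembled. It is not taken from the LAE-DINO code base. The function name `build_dynamic_vocabulary`, the `vocab_size` parameter, and the strategy of keeping every category present in the batch and padding with randomly sampled absent categories are all assumptions made for illustration.

```python
import random
from typing import List, Sequence, Set


def build_dynamic_vocabulary(
    batch_labels: Sequence[Sequence[str]],
    full_vocabulary: Sequence[str],
    vocab_size: int,
    rng: random.Random,
) -> List[str]:
    """Hypothetical helper: build the text vocabulary for one training batch.

    Assumption: positives are the categories annotated in the batch, and
    negatives are sampled at random from the rest of the full vocabulary.
    """
    # Positive categories: every label annotated on any image in the batch.
    positives: Set[str] = {label for labels in batch_labels for label in labels}
    # Negative candidates: categories that do not occur in this batch.
    negatives = [c for c in full_vocabulary if c not in positives]
    # Fill the remaining slots with negatives; positives are never dropped,
    # so the result can exceed vocab_size if the batch has many categories.
    n_neg = max(0, min(vocab_size - len(positives), len(negatives)))
    vocabulary = sorted(positives) + rng.sample(negatives, n_neg)
    # Shuffle so the model cannot rely on positives appearing first.
    rng.shuffle(vocabulary)
    return vocabulary


if __name__ == "__main__":
    full = ["airplane", "ship", "bridge", "harbor", "storage tank", "windmill"]
    batch = [["ship", "harbor"], ["ship"]]
    print(build_dynamic_vocabulary(batch, full, vocab_size=4, rng=random.Random(0)))
```

Under these assumptions, each batch sees a short prompt list that always contains its own categories plus a few distractors, rather than the full category list of a very large dataset.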

Quick Start & Requirements

Installation relies on mmdetection with Python 3.8, a specific PyTorch build (e.g., 1.10.0+cu113), mmcv, and the project dependencies listed in requirements/multimodal.txt. The example PyTorch build implies CUDA 11.3. Other prerequisites are BERT weights and the LAE-1M dataset (available via HuggingFace, Baidu, or OneDrive). Dataset creation uses the LAE-Label Engine, which is based on SAM and InternVL. The paper is available on arXiv (2408.09110).
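The version requirements above can be checked before installing the heavier dependencies. The following is a minimal sketch, not part of the project: `parse_version` and `check_environment` are hypothetical helpers, and the expected values (Python 3.8, PyTorch 1.10, the `cu113` build tag) are taken from the example versions listed above, which the summary itself gives only as an example.

```python
from typing import List, Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a string such as '1.10.0+cu113' into (1, 10, 0)."""
    # Drop the local build tag after '+', then keep the numeric parts.
    core = text.split("+", 1)[0]
    return tuple(int(part) for part in core.split(".") if part.isdigit())


def check_environment(
    python_version: Tuple[int, int],
    torch_version: Optional[str],
) -> List[str]:
    """Hypothetical check: list mismatches against the example setup."""
    problems: List[str] = []
    if tuple(python_version) != (3, 8):
        problems.append(f"expected Python 3.8, found {python_version}")
    if torch_version is None:
        problems.append("PyTorch is not installed")
        return problems
    if parse_version(torch_version)[:2] != (1, 10):
        problems.append(f"expected PyTorch 1.10.x, found {torch_version}")
    # Assumption: the CUDA 11.3 build carries the '+cu113' local tag.
    if "+cu113" not in torch_version:
        problems.append(f"expected a cu113 build, found {torch_version}")
    return problems
```

In a real environment, one would pass `sys.version_info[:2]` and `torch.__version__` (or `None` if importing `torch` fails). An empty list means the interpreter and PyTorch build match the example configuration.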

Explore Similar Projects

Remote-Sensing-in-CVPR2024 by rsdler

RS5M by om-ai-lab

awesome-vision-language-models-for-earth-observation by geoaigroup

awesome-described-object-detection by Charles-Xie

Awesome-Visual-Grounding by linhuixiao

mvits_for_class_agnostic_od by mmaaz60

OV-DINO by wanghao9610

Awesome-Open-Vocabulary-Semantic-Segmentation by Qinying-Liu

Awesome-Open-Vocabulary by jianzongwu

DINO-X-API by IDEA-Research

GeoChat by mbzuai-oryx

techniques by satellite-image-deep-learning