jaychempan
Open-vocabulary object detection for Earth observation
Top 97.4% on SourcePulse
Summary
This project addresses the domain gap that hinders open-vocabulary object detection (OVOD) in remote sensing. It introduces LAE-DINO, an OVOD foundation model, and the LAE-1M dataset, which together target the detection of novel concepts in Earth observation imagery for applications in Earth sciences and monitoring.
How It Works
LAE-DINO builds upon the DINO architecture and adds two modules: Dynamic Vocabulary Construction (DVC), which adapts the training vocabulary for each batch, and Visual-Guided Text Prompt Learning (VisGT), which improves the alignment between visual and text features. Both are trained on the large-scale LAE-1M dataset, described as the first comprehensive OVOD dataset for remote sensing, to overcome the limitations of models trained on natural-image domains.
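To make the idea of a per-batch vocabulary concrete, here is a minimal sketch of one plausible reading of DVC. It assumes that each batch keeps every category that actually appears in it (positives) and pads the vocabulary with randomly sampled absent categories (negatives) up to a fixed size. The function name `build_batch_vocabulary`, the `vocab_size` parameter, and the sample category names are hypothetical and are not taken from the LAE-DINO code base.

```python
import random
from typing import Iterable, List, Sequence, Set


def build_batch_vocabulary(
    batch_labels: Iterable[Iterable[str]],
    full_vocabulary: Sequence[str],
    vocab_size: int,
    rng: random.Random,
) -> List[str]:
    """Return a per-batch vocabulary: all positives plus sampled negatives.

    Illustrative assumption only, not the project's actual implementation.
    """
    # Positives: every category annotated in at least one image of the batch.
    positives: Set[str] = set()
    for labels in batch_labels:
        positives.update(labels)
    vocab = sorted(positives)

    # Negatives: categories from the full vocabulary that are absent from the batch.
    negatives = [c for c in full_vocabulary if c not in positives]

    # Pad up to vocab_size; if positives already exceed it, add no negatives.
    n_neg = max(0, min(vocab_size - len(vocab), len(negatives)))
    vocab.extend(rng.sample(negatives, n_neg))

    # Shuffle so positives do not always occupy the first prompt positions.
    rng.shuffle(vocab)
    return vocab


# Hypothetical category names and batch annotations, for illustration only.
full_vocab = ["airplane", "ship", "harbor", "bridge",
              "storage tank", "vehicle", "windmill", "stadium"]
batch = [["ship", "harbor"], ["airplane"]]
vocab = build_batch_vocabulary(batch, full_vocab, vocab_size=5,
                               rng=random.Random(0))
print(vocab)
```

Under this assumption the text prompt stays short and changes from batch to batch, even when the full label set is very large. The real module may select negatives differently.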
Quick Start & Requirements
Installation relies on mmdetection with Python 3.8, a matching PyTorch build (e.g., 1.10.0+cu113), mmcv, and the project dependencies listed in requirements/multimodal.txt. Dataset creation uses the LAE-Label Engine, which is based on SAM and InternVL. Prerequisites include CUDA 11.3 (implied by the cu113 PyTorch build), BERT weights, and the LAE-1M dataset, which is available via HuggingFace, Baidu, or OneDrive. The paper is available on arXiv (2408.09110).
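Because the setup depends on a specific Python and PyTorch pairing, a quick check before installing the remaining dependencies can save time. The sketch below compares version strings against the example versions quoted above. The helper names `split_torch_version` and `matches_expected` are hypothetical, and the expected values are only the examples from the text, not confirmed hard requirements.

```python
from typing import Tuple

# Example versions quoted in the text; assumed here, not confirmed as strict pins.
EXPECTED_PYTHON = (3, 8)
EXPECTED_TORCH = "1.10.0"
EXPECTED_CUDA_TAG = "cu113"  # the "cu113" suffix denotes a CUDA 11.3 build


def split_torch_version(version: str) -> Tuple[str, str]:
    """Split a string such as '1.10.0+cu113' into ('1.10.0', 'cu113')."""
    base, _, local = version.partition("+")
    return base, local


def matches_expected(python_version: Tuple[int, int], torch_version: str) -> bool:
    """Return True when both versions match the example setup."""
    base, cuda_tag = split_torch_version(torch_version)
    return (
        tuple(python_version) == EXPECTED_PYTHON
        and base == EXPECTED_TORCH
        and cuda_tag == EXPECTED_CUDA_TAG
    )


# Checks against literal strings so the sketch runs without PyTorch installed.
print(matches_expected((3, 8), "1.10.0+cu113"))
print(matches_expected((3, 11), "2.1.0+cu121"))
```

In a live environment, the same function can be called with `sys.version_info[:2]` and `torch.__version__`.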