LAE-DINO by jaychempan

Open-vocabulary object detection for Earth observation

Created 1 year ago
261 stars

Top 97.4% on SourcePulse

Project Summary

Summary

This project addresses the domain gap that hinders open-vocabulary object detection (OVOD) in remote sensing. It introduces LAE-DINO, an OVOD foundation model, and the LAE-1M dataset, with the aim of detecting novel object categories in Earth observation imagery for applications in Earth sciences and monitoring.

How It Works

LAE-DINO builds upon the DINO architecture and adds two modules: Dynamic Vocabulary Construction (DVC), which adapts the training vocabulary for each batch, and Visual-Guided Text Prompt Learning (VisGT), which improves the alignment between visual and text features. Both modules are trained on LAE-1M, a large-scale dataset presented as the first comprehensive OVOD dataset for remote sensing, to overcome the limitations of models trained on natural images.
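The per-batch vocabulary idea behind DVC can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the function name `build_batch_vocabulary`, the `vocab_size` parameter, and the policy of keeping every category present in the batch and padding with randomly sampled absent categories are all assumptions made for the example.

```python
import random
from typing import Iterable, List, Sequence, Set


def build_batch_vocabulary(
    batch_labels: Iterable[Iterable[str]],
    full_vocabulary: Sequence[str],
    vocab_size: int,
    rng: random.Random,
) -> List[str]:
    """Hypothetical helper: build a vocabulary for one training batch.

    Keeps every category annotated in the batch (positives) and pads with
    randomly sampled absent categories (negatives) up to vocab_size.
    """
    # Positives: categories annotated in at least one image of the batch.
    positives: Set[str] = set()
    for labels in batch_labels:
        positives.update(labels)

    # Negatives: categories in the full vocabulary that this batch lacks.
    negatives = [c for c in full_vocabulary if c not in positives]

    # All positives are kept; remaining slots are filled with negatives.
    n_neg = max(0, min(len(negatives), vocab_size - len(positives)))
    vocab = sorted(positives) + rng.sample(negatives, n_neg)
    rng.shuffle(vocab)  # avoid a fixed positives-first ordering
    return vocab


# Example with made-up categories and a fixed seed for repeatability.
full_vocab = ["airplane", "ship", "harbor", "bridge", "storage tank", "vehicle"]
batch = [["ship", "harbor"], ["ship"]]
vocab = build_batch_vocabulary(batch, full_vocab, vocab_size=4, rng=random.Random(0))
print(vocab)
```

In this sketch the text encoder would only see `vocab_size` category names per batch instead of the full vocabulary, which is one plausible reason to build the vocabulary dynamically when the full category list is large.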

Quick Start & Requirements

Installation relies on mmdetection with Python 3.8, a specific PyTorch build (e.g., 1.10.0+cu113), mmcv, and the project dependencies listed in requirements/multimodal.txt. Dataset creation uses the LAE-Label Engine, which is based on SAM and InternVL. Prerequisites include CUDA 11.3 (implied by the cu113 PyTorch build), BERT weights, and the LAE-1M dataset (available via HuggingFace, Baidu, or OneDrive). The paper is available on arXiv (2408.09110).
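Before installing the project dependencies, it can help to confirm what is already present in the environment. The sketch below uses only the Python standard library; the helper names and the distribution names `torch`, `mmcv`, and `mmdet` are assumptions for illustration (mmcv, for example, has also been published as `mmcv-full`), so adjust them to match the repository's own instructions.

```python
import sys
from importlib import metadata
from typing import Dict, Optional, Tuple

# Assumed distribution names for the packages mentioned above.
PACKAGES: Tuple[str, ...] = ("torch", "mmcv", "mmdet")


def installed_versions(packages: Tuple[str, ...] = PACKAGES) -> Dict[str, Optional[str]]:
    """Map each package name to its installed version, or None if absent."""
    report: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report


def python_matches(major: int = 3, minor: int = 8) -> bool:
    """Return True if the running interpreter is the given major.minor."""
    return sys.version_info[:2] == (major, minor)


# Print a short report; nothing is installed or modified.
print("Python 3.8:", python_matches())
for name, version in installed_versions().items():
    print(f"{name}: {version or 'not installed'}")
```

The check only reports versions and does not enforce them, since the summary gives 1.10.0+cu113 as an example PyTorch build rather than a strict requirement.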

Highlighted Details

  • LAE-1M Dataset: A novel, large-scale dataset specifically curated for remote sensing OVOD.

Health Check
Last Commit: 6 days ago
Responsiveness: Inactive
Pull Requests (30d): 1
Issues (30d): 1

Star History

23 stars in the last 30 days
