Vision-language dataset and model for remote sensing
This repository provides the RS5M dataset and GeoRSCLIP, a vision-language foundation model tailored for remote sensing (RS) applications. It addresses the challenge of adapting general vision-language models (VLMs) to the specialized domain of remote sensing, enabling improved performance on downstream tasks like zero-shot classification, cross-modal retrieval, and semantic localization. The target audience includes researchers and practitioners in remote sensing, computer vision, and natural language processing.
How It Works
The project introduces RS5M, a dataset of 5 million remote sensing image-text pairs, built by filtering existing image-text datasets and by captioning RS images with pretrained VLMs. GeoRSCLIP is a domain-adapted VLM, fine-tuned on RS5M using parameter-efficient fine-tuning (PEFT) methods. This approach bridges the gap between general VLMs and domain-specific tasks, improving transfer learning to downstream RS applications.
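To make the fine-tuning recipe concrete, the sketch below applies LoRA (one common PEFT method) to a CLIP backbone using Hugging Face's peft library. This is a minimal illustration under stated assumptions, not the repository's actual training code: the base checkpoint, target modules, and hyperparameters are placeholders.

import torch
from transformers import CLIPModel, CLIPProcessor
from peft import LoraConfig, get_peft_model

# Assumed base model; GeoRSCLIP's actual backbone and checkpoint may differ.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Inject low-rank adapters into the attention projections; all other weights stay frozen.
lora_cfg = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
                      target_modules=["q_proj", "v_proj"])
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()  # only the LoRA parameters are trainable

optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4)

# One illustrative contrastive step on an (images, captions) batch from RS5M:
# inputs = processor(text=captions, images=images, return_tensors="pt", padding=True)
# out = model(**inputs, return_loss=True)   # standard CLIP contrastive loss
# out.loss.backward(); optimizer.step(); optimizer.zero_grad()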
Quick Start & Requirements
Clone the model repository:
git clone https://huggingface.co/Zilun/GeoRSCLIP
Install PyTorch (tested with 2.0.1/CUDA 11.8 and 2.1.0/CUDA 12.1) and the remaining dependencies via pip, then run inference:
python codebase/inference.py --ckpt-path <path_to_model> --test-dataset-dir <path_to_data>
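As an alternative to the bundled inference script, the following sketch shows zero-shot classification directly in Python with the open_clip library. The backbone name ("ViT-B-32"), checkpoint path, prompt template, and class labels are illustrative assumptions; adjust them to the checkpoint you downloaded.

import torch
import open_clip
from PIL import Image

# Build the CLIP architecture, then load GeoRSCLIP weights from the cloned repo.
model, _, preprocess = open_clip.create_model_and_transforms("ViT-B-32")
ckpt = torch.load("GeoRSCLIP/RS5M_ViT-B-32.pt", map_location="cpu")  # hypothetical path
model.load_state_dict(ckpt, strict=False)  # key layout can vary between releases
model.eval()

tokenizer = open_clip.get_tokenizer("ViT-B-32")
labels = ["airport", "forest", "harbor", "residential area"]  # example classes
text = tokenizer([f"a satellite image of a {c}" for c in labels])
image = preprocess(Image.open("scene.png")).unsqueeze(0)  # any RS image

with torch.no_grad():
    img_f = model.encode_image(image)
    txt_f = model.encode_text(text)
    img_f = img_f / img_f.norm(dim=-1, keepdim=True)
    txt_f = txt_f / txt_f.norm(dim=-1, keepdim=True)
    probs = (100.0 * img_f @ txt_f.T).softmax(dim=-1)

print(dict(zip(labels, probs[0].tolist())))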
Model checkpoints and dataset links are available on Hugging Face.
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats