Survey of Remote Sensing Multimodal LLMs
This repository is a comprehensive survey and curated collection of resources for Multimodal Large Language Models (MLLMs) applied to remote sensing (RS-MLLMs). Aimed at researchers and practitioners in the field, it offers a centralized hub for the latest advancements, datasets, benchmarks, and intelligent agents, with the goal of accelerating development and understanding in this specialized domain.
How It Works
The project functions as an "awesome list"-style repository, gathering and categorizing papers, datasets, and benchmarks related to RS-MLLMs. Coverage spans vision-language pre-training models, intelligent agents for remote sensing tasks, and evaluation benchmarks for specific applications such as image captioning, visual question answering, and image-text retrieval. This organization provides a structured overview of the rapidly evolving RS-MLLM landscape.
Quick Start & Requirements
This repository is a curated list of research papers and resources, not a runnable software package, so no installation or execution commands apply. The only prerequisite is an interest in remote sensing and multimodal large language models.
Maintenance & Community
The project is maintained by ZhanYang-nwpu and is updated in real time to track the state of the art in RS-MLLMs. Contact is available via zhanyangnwpu@gmail.com.
Licensing & Compatibility
The repository itself is a collection of links and information; licensing depends on the individual linked research papers and their associated codebases, and is not specified here.
Limitations & Caveats
This repository is a survey and does not provide executable code or models. Per the "latest updates" section, the accompanying review manuscript was first submitted for review in May 2024, a reminder that the field is young and evolving rapidly.