GeoVista by ekonwang

Web-augmented agentic visual reasoning for geolocalization

Created 6 months ago

268 stars

Top 95.6% on SourcePulse

Project Summary

Summary

GeoVista addresses the challenge of precise geolocalization by integrating visual reasoning with web augmentation. Designed for researchers and developers in geospatial AI, it offers a robust framework for agentic models to determine locations more accurately by leveraging both image analysis and real-time web search.

How It Works

The system employs an agentic approach, where a model processes visual input and dynamically queries web search APIs (like Tavily) to gather contextual information. This web-augmented data is then used for reasoning. Training involves a two-stage pipeline: initial Cold-Start supervised fine-tuning (SFT) followed by Reinforcement Learning (RL), enabling the model to learn complex geolocalization strategies.

Quick Start & Requirements

Installation: Setup involves creating a Conda environment (python==3.10), activating it, and running bash setup.sh.
Prerequisites: Requires a Tavily API key (configured via .env), vllm for deployment (implying GPU/CUDA support), and Python 3.10.
Model Deployment: Download pre-trained models from HuggingFace (e.g., LibraTree/GeoVista-RL-6k-7B) and deploy using vllm via provided scripts.
Example: Run examples/infer_example.py with a sample image and question to test inference.
Resources: Setup involves environment configuration, API key integration, model download, and vllm deployment.

Highlighted Details

GeoVista-Bench (GeoBench) Dataset: A novel, high-resolution, multi-source, globally annotated benchmark for evaluating agentic geolocalization models.
Comprehensive Evaluation Metrics: GeoBench assesses models across Global Coverage (GC), Reasonable Localizability (RC), High Resolution (HR), Data Variety (DV), and Nuanced Evaluation (NE).
Pre-trained Models: Offers access to tuned models like GeoVista-RL-6k-7B and GeoVista-RL-12k-7B on HuggingFace.
Full Pipelines: Provides end-to-end scripts for inference and evaluation on GeoBench.

Maintenance & Community

The project acknowledges support from Tavily and Google Cloud for services. No explicit community channels (e.g., Discord, Slack) or detailed contributor information are provided in the README.

Licensing & Compatibility

This repository is explicitly stated to be "intended solely for research purposes." No specific open-source license (like MIT, Apache) is mentioned, and the research-only clause strongly restricts commercial use or integration into closed-source products.

Limitations & Caveats

The primary limitation is its explicit designation for research use only, precluding direct adoption for commercial or production applications. The setup requires specific API keys and a vllm deployment environment, which may pose integration challenges.

Health Check

Last Commit

4 months ago

Responsiveness

Inactive

Pull Requests (30d)

Issues (30d)

Star History

3 stars in the last 30 days