GeoVista  by ekonwang

Web-augmented agentic visual reasoning for geolocalization

Created 3 months ago
257 stars

Top 98.3% on SourcePulse

GitHubView on GitHub
Project Summary

Summary

GeoVista addresses the challenge of precise geolocalization by integrating visual reasoning with web augmentation. Designed for researchers and developers in geospatial AI, it offers a robust framework for agentic models to determine locations more accurately by leveraging both image analysis and real-time web search.

How It Works

The system employs an agentic approach, where a model processes visual input and dynamically queries web search APIs (like Tavily) to gather contextual information. This web-augmented data is then used for reasoning. Training involves a two-stage pipeline: initial Cold-Start supervised fine-tuning (SFT) followed by Reinforcement Learning (RL), enabling the model to learn complex geolocalization strategies.

Quick Start & Requirements

  • Installation: Setup involves creating a Conda environment (python==3.10), activating it, and running bash setup.sh.
  • Prerequisites: Requires a Tavily API key (configured via .env), vllm for deployment (implying GPU/CUDA support), and Python 3.10.
  • Model Deployment: Download pre-trained models from HuggingFace (e.g., LibraTree/GeoVista-RL-6k-7B) and deploy using vllm via provided scripts.
  • Example: Run examples/infer_example.py with a sample image and question to test inference.
  • Resources: Setup involves environment configuration, API key integration, model download, and vllm deployment.

Highlighted Details

  • GeoVista-Bench (GeoBench) Dataset: A novel, high-resolution, multi-source, globally annotated benchmark for evaluating agentic geolocalization models.
  • Comprehensive Evaluation Metrics: GeoBench assesses models across Global Coverage (GC), Reasonable Localizability (RC), High Resolution (HR), Data Variety (DV), and Nuanced Evaluation (NE).
  • Pre-trained Models: Offers access to tuned models like GeoVista-RL-6k-7B and GeoVista-RL-12k-7B on HuggingFace.
  • Full Pipelines: Provides end-to-end scripts for inference and evaluation on GeoBench.

Maintenance & Community

The project acknowledges support from Tavily and Google Cloud for services. No explicit community channels (e.g., Discord, Slack) or detailed contributor information are provided in the README.

Licensing & Compatibility

This repository is explicitly stated to be "intended solely for research purposes." No specific open-source license (like MIT, Apache) is mentioned, and the research-only clause strongly restricts commercial use or integration into closed-source products.

Limitations & Caveats

The primary limitation is its explicit designation for research use only, precluding direct adoption for commercial or production applications. The setup requires specific API keys and a vllm deployment environment, which may pose integration challenges.

Health Check
Last Commit

1 month ago

Responsiveness

Inactive

Pull Requests (30d)
0
Issues (30d)
1
Star History
10 stars in the last 30 days

Explore Similar Projects

Feedback? Help us improve.