ekonwang: Web-augmented agentic visual reasoning for geolocalization
Top 98.3% on SourcePulse
Summary
GeoVista addresses the challenge of precise geolocalization by integrating visual reasoning with web augmentation. Designed for researchers and developers in geospatial AI, it offers a robust framework for agentic models to determine locations more accurately by leveraging both image analysis and real-time web search.
How It Works
The system employs an agentic approach, where a model processes visual input and dynamically queries web search APIs (like Tavily) to gather contextual information. This web-augmented data is then used for reasoning. Training involves a two-stage pipeline: initial Cold-Start supervised fine-tuning (SFT) followed by Reinforcement Learning (RL), enabling the model to learn complex geolocalization strategies.
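The agentic loop described above can be sketched roughly as follows. This is a minimal illustration, not GeoVista's actual implementation: `Step`, `AgentState`, `run_agent`, `stub_policy`, and `stub_search` are hypothetical names, the policy stands in for the vision-language model, and the search function stands in for a web search API such as Tavily (no real API is called here).

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Step:
    kind: str  # "search" (issue a web query) or "answer" (final location guess)
    text: str  # the query string or the answer text


@dataclass
class AgentState:
    question: str
    image_path: str
    evidence: list[str] = field(default_factory=list)  # accumulated search results


def run_agent(
    state: AgentState,
    policy: Callable[[AgentState], Step],
    search: Callable[[str], list[str]],
    max_turns: int = 4,
) -> Optional[str]:
    """Alternate between model decisions and web searches until an answer is given."""
    for _ in range(max_turns):
        step = policy(state)
        if step.kind == "answer":
            return step.text
        # Any non-answer step is treated as a search; results feed the next turn.
        state.evidence.extend(search(step.text))
    return None  # turn budget exhausted without a final answer


def stub_policy(state: AgentState) -> Step:
    # Hypothetical stand-in for the model: search once, then answer.
    if not state.evidence:
        return Step("search", "red suspension bridge in fog")
    return Step("answer", f"guess based on {len(state.evidence)} search result(s)")


def stub_search(query: str) -> list[str]:
    # Hypothetical stand-in for a web search API call.
    return [f"stub result for: {query}"]
```

In a real deployment the policy would wrap a call to the served model and the search function would wrap the search provider's client; the turn budget (`max_turns`) is an assumption added here to keep the loop bounded.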
Quick Start & Requirements
- Setup: create an environment (python==3.10), activate it, and run bash setup.sh.
- Prerequisites: API keys (.env), vllm for deployment (implying GPU/CUDA support), and Python 3.10.
- Model: download a model (LibraTree/GeoVista-RL-6k-7B) and deploy it using vllm via the provided scripts.
- Inference: run examples/infer_example.py with a sample image and question to test inference.
- [...] vllm deployment.
Highlighted Details
Models: GeoVista-RL-6k-7B and GeoVista-RL-12k-7B on HuggingFace.
Maintenance & Community
The project acknowledges support from Tavily and Google Cloud for services. No explicit community channels (e.g., Discord, Slack) or detailed contributor information are provided in the README.
Licensing & Compatibility
This repository is explicitly stated to be "intended solely for research purposes." No specific open-source license (like MIT, Apache) is mentioned, and the research-only clause strongly restricts commercial use or integration into closed-source products.
Limitations & Caveats
The primary limitation is its explicit designation for research use only, precluding direct adoption for commercial or production applications. The setup requires specific API keys and a vllm deployment environment, which may pose integration challenges.
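Because setup depends on API keys and a running vllm server, a small preflight check can surface configuration problems early. The sketch below is an assumption-laden illustration: `missing_keys` and `endpoint_ok` are hypothetical helpers, and the variable name `TAVILY_API_KEY` is assumed (the repository's .env may use different names). The `/v1/models` route is the model-listing endpoint of vllm's OpenAI-compatible server.

```python
import http.client
import os
import urllib.error
import urllib.request
from typing import Mapping, Optional, Sequence


def missing_keys(
    required: Sequence[str], env: Optional[Mapping[str, str]] = None
) -> list[str]:
    """Return the names in `required` that are unset or empty in the environment."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


def endpoint_ok(base_url: str, timeout: float = 3.0) -> bool:
    """Return True if GET {base_url}/v1/models answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/v1/models", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Covers unreachable hosts, HTTP errors, timeouts, and malformed URLs.
        return False
```

For example, `missing_keys(["TAVILY_API_KEY"])` lists any key that still needs to be set, and `endpoint_ok("http://localhost:8000")` checks a locally served model, assuming the server listens on port 8000.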