Awesome-Visual-Grounding by linhuixiao

Visual grounding research survey and resource hub

Created 1 year ago
252 stars

Top 99.6% on SourcePulse

Project Summary

This repository is a comprehensive, curated collection of resources for the field of visual grounding, directly supporting a TPAMI survey paper. It systematically tracks and summarizes a decade of research in visual grounding, including referring expression comprehension and phrase grounding. The project benefits researchers and practitioners by providing an organized overview of methods, applications, datasets, and emerging trends, helping them stay current and identify future research directions.

How It Works

This repository functions as a living catalog that complements a detailed survey paper systematically reviewing the evolution of visual grounding techniques. It organizes research by methodology (CNN-based, Transformer-based, VLP-based, and LLM-based grounding), task setting (supervised, weakly supervised, zero-shot), applications, and datasets. The project actively encourages community contributions to keep coverage comprehensive and up to date, with a roadmap for future survey versions.

Quick Start & Requirements

This repository is a collection of research resources, not a standalone software project. To use a specific method, refer to its paper and the linked code repository within the survey; a rough sketch of what such a setup can look like follows below. Requirements vary significantly by method and may include specific Python versions, deep learning frameworks (PyTorch, TensorFlow), GPU hardware, and large datasets. The README encourages contributions via pull requests and issues.
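For orientation only, here is a minimal sketch of the kind of workflow a linked method might involve, using a zero-shot text-query localization model (OWL-ViT via Hugging Face Transformers). The model, checkpoint, image URL, and API calls are assumptions chosen for illustration; this is not a method prescribed or documented by the repository, whose linked projects each define their own environments.

```python
# Minimal illustrative sketch, assuming OWL-ViT from Hugging Face Transformers.
# Model choice, checkpoint, and example image are assumptions for demonstration.
# Assumed dependencies: pip install torch transformers pillow requests
import requests
import torch
from PIL import Image
from transformers import OwlViTProcessor, OwlViTForObjectDetection

processor = OwlViTProcessor.from_pretrained("google/owlvit-base-patch32")
model = OwlViTForObjectDetection.from_pretrained("google/owlvit-base-patch32")

# Any RGB image works; this COCO validation image is used only as an example.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
queries = [["a cat lying on a couch", "a remote control"]]  # free-form text queries

inputs = processor(text=queries, images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Rescale normalized box predictions back to the original (height, width).
target_sizes = torch.tensor([image.size[::-1]])
results = processor.post_process_object_detection(
    outputs, threshold=0.1, target_sizes=target_sizes
)[0]

for box, score, label in zip(results["boxes"], results["scores"], results["labels"]):
    print(queries[0][int(label)],
          [round(v, 1) for v in box.tolist()],
          round(score.item(), 3))
```

Methods cataloged in the survey will generally require cloning their own repositories and preparing benchmark datasets such as RefCOCO, so treat the snippet above only as a flavor of the typical Python/PyTorch tooling involved.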

Highlighted Details

  • Provides a comprehensive survey covering traditional and emerging areas like Grounding Multimodal LLMs and Generalized Visual Grounding.
  • Includes detailed results and comparisons for representative works, aiming for fairness and clarity.
  • Features links to papers and code implementations for numerous visual grounding methods.
  • Offers insights into future research directions and challenges within the field.

Maintenance & Community

The project is actively maintained, with a stated goal of updating the survey by June 1, 2025. It explicitly welcomes community contributions through pull requests and issues for missing papers, implementations, or suggestions. The primary contact is via email (xiaolinhui16@ucas.ac.cn). Several individuals are acknowledged for their contributions.

Licensing & Compatibility

The repository itself does not specify a license. The survey paper is currently under review for TPAMI. Licenses for individual code repositories linked within the survey will vary and must be checked separately. Compatibility for commercial use or closed-source linking depends entirely on the licenses of the individual projects referenced.

Limitations & Caveats

The README acknowledges that, given the vastness of the field, the survey cannot cover every single paper. The primary survey paper is still under review and may undergo revisions. As a curated list, the repository relies on ongoing community contributions and the authors' own efforts to remain comprehensive.

Health Check

  • Last Commit: 2 months ago
  • Responsiveness: 1+ week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 16 stars in the last 30 days
