Compositional-Visual-Reasoning-Survey by pokerme7777

Advancing compositional visual reasoning

Created 2 months ago
284 stars

Top 92.0% on SourcePulse

Project Summary

This repository hosts a survey paper, "Explain Before You Answer: A Survey on Compositional Visual Reasoning," which systematically reviews and categorizes research in compositional visual reasoning. It targets researchers and engineers seeking a structured understanding of this rapidly evolving AI subfield, offering a roadmap of key shifts and methodologies.

How It Works

The survey highlights a fundamental shift from monolithic visual reasoning to compositional approaches. It organizes existing research into five distinct stages, detailing the evolution from prompt-enhanced language-centric models to tool-enhanced LLMs, tool-enhanced VLMs, Chain-of-Thought VLMs, and finally, unified agentic vision-language models. This staged overview provides a clear progression of architectural and methodological advancements.

Quick Start & Requirements

This repository contains a survey paper and does not provide executable code or a software project. Therefore, standard quick start instructions, installation commands, or specific technical requirements are not applicable.

Highlighted Details

  • The survey maps the progression of compositional visual reasoning, emphasizing the move towards more integrated and sophisticated multimodal AI systems.
  • It meticulously categorizes over 100 research papers across five stages, each representing a distinct paradigm in visual reasoning.
  • Many listed papers include direct links to their code repositories, facilitating deeper exploration of specific methodologies.

Maintenance & Community

The project welcomes community contributions that expand and update the survey's catalog of research papers. To suggest a paper for inclusion, open an issue on the repository.

Licensing & Compatibility

The README does not state a license for the survey content or for the code repositories it links to. Users should verify the licensing of each cited work individually.

Limitations & Caveats

As a survey paper, this resource provides a curated overview rather than a deployable tool. Its content reflects the state of research up to its publication date (2025), and the rapidly advancing field of AI may have seen further developments since. It serves as a reference and roadmap, not a direct implementation.

Health Check

  • Last Commit: 1 week ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 144 stars in the last 30 days
