VIGA by Fugtemypt123

Vision-as-Inverse-Graphics agent for programmatic visual reconstruction

Created 8 months ago

816 stars

Top 43.3% on SourcePulse

Summary

VIGA (Vision-as-Inverse-Graphics Agent) is an agent for programmatic visual reconstruction: it generates and edits complex scenes by writing and refining the programs that render them. Aimed at researchers and power users, it follows an analysis-by-synthesis approach in which a self-correcting loop generates a scene, renders it, and verifies the result against the target, refining iteratively without any finetuning.

How It Works

VIGA is a self-reflective agent that alternates between two roles. The Generator writes and executes scene programs, using tools for planning, code execution, asset retrieval, and scene queries. The Verifier renders the resulting scene from multiple viewpoints, identifies discrepancies with the target, and feeds them back for revision. This write-run-compare-revise loop corrects itself without finetuning and carries an evolving contextual memory from one iteration to the next.
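To make the loop concrete, here is a toy Python sketch of analysis-by-synthesis. It is not VIGA's implementation: the "scene program" is a hypothetical dict of 2-D object positions, `render` is a stand-in for a real renderer (one view instead of VIGA's multiple viewpoints), the Verifier reports per-object offsets from the target, and the Generator moves each flagged object halfway toward it. The `Memory.history` list stands in for the agent's contextual memory. All names (`render`, `verify`, `generate`, `reconstruct`, `Memory`) are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical toy "scene program": object name -> (x, y) position.
Scene = dict[str, tuple[float, float]]
# Verifier feedback: object name -> (dx, dy) offset still needed to reach the target.
Feedback = dict[str, tuple[float, float]]


@dataclass
class Memory:
    """Stand-in for the agent's contextual memory: every (scene, feedback) pair so far."""
    history: list[tuple[Scene, Feedback]] = field(default_factory=list)


def render(scene: Scene) -> Scene:
    """Stand-in renderer; a real system would produce images (e.g. via Blender)."""
    return dict(scene)


def verify(rendered: Scene, target: Scene, tol: float = 1e-3) -> Feedback:
    """Verifier: report objects whose rendered position differs from the target."""
    feedback: Feedback = {}
    for name, (tx, ty) in target.items():
        rx, ry = rendered.get(name, (0.0, 0.0))  # missing objects are treated as at the origin
        dx, dy = tx - rx, ty - ry
        if abs(dx) > tol or abs(dy) > tol:
            feedback[name] = (dx, dy)
    return feedback


def generate(scene: Scene, feedback: Feedback, step: float = 0.5) -> Scene:
    """Generator: return a revised scene, moving each flagged object part-way to the target."""
    revised = dict(scene)
    for name, (dx, dy) in feedback.items():
        x, y = revised.get(name, (0.0, 0.0))
        revised[name] = (x + step * dx, y + step * dy)
    return revised


def reconstruct(target: Scene, max_iters: int = 50) -> tuple[Scene, Memory]:
    """Write-run-compare-revise loop; stops when the Verifier finds no discrepancies."""
    scene: Scene = {}
    memory = Memory()
    for _ in range(max_iters):
        feedback = verify(render(scene), target)
        memory.history.append((scene, feedback))
        if not feedback:
            break
        scene = generate(scene, feedback)
    return scene, memory


target = {"cube": (2.0, 1.0), "lamp": (-1.0, 3.0)}
scene, memory = reconstruct(target)
print(f"converged after {len(memory.history)} verification passes: {scene}")
```

In VIGA both roles are driven by a language model with tools, so each revision is a rewritten program rather than a numeric update; the toy keeps only the loop structure, with its own assumed stopping rule (halt once the Verifier reports nothing, or after `max_iters`).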

Quick Start & Requirements

Installation requires Conda; an NVIDIA GPU with CUDA is recommended for 3D modes.

Clone Repo: git clone https://github.com/Fugtemypt123/VIGA-release.git && cd VIGA-release
Download SAM: wget -P utils/third_party/sam https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth
Conda Environments: Set up four separate environments, each with the requirements specified in the README: agent (Python 3.10), blender (Python 3.11), sam (Python 3.10), and sam3d (Python 3.11).
API Keys: Configure utils/_api_keys.py with OPENAI_API_KEY and MESHY_API_KEY.
Paths: Edit utils/_path.py to set CONDA_BASE.
Usage: Run conda activate agent, then python runners/dynamic_scene.py --task=artist --model=gpt-5
Docs: The README links to Architecture, Requirements, and Runners sections.
Paper: Available on arXiv.
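The environment setup and usage steps above can be scripted. The Python sketch below only builds the shell commands: it creates each environment with the listed Python version via `conda create -n NAME python=X.Y -y`, but installs no packages, because the per-environment requirements are specified in the README rather than here. It also uses `conda run -n agent` as an alternative to activating the environment before the usage command. The helper names `create_commands` and `run_command` are hypothetical.

```python
import shlex

# Environment names and Python versions as listed in the Quick Start.
CONDA_ENVS = {
    "agent": "3.10",
    "blender": "3.11",
    "sam": "3.10",
    "sam3d": "3.11",
}


def create_commands(envs: dict[str, str]) -> list[str]:
    """Build one `conda create` command per environment (requirements are installed separately)."""
    return [
        shlex.join(["conda", "create", "-n", name, f"python={version}", "-y"])
        for name, version in envs.items()
    ]


def run_command(task: str = "artist", model: str = "gpt-5") -> str:
    """Build the Quick Start usage command, executed inside the `agent` environment."""
    return shlex.join([
        "conda", "run", "-n", "agent",
        "python", "runners/dynamic_scene.py", f"--task={task}", f"--model={model}",
    ])


if __name__ == "__main__":
    for cmd in create_commands(CONDA_ENVS):
        print(cmd)
    print(run_command())
```

Printing rather than executing keeps the sketch side-effect free; check the printed commands against the README before running them in a shell.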

Highlighted Details

Supports diverse domains: BlenderBench (3D editing), BlenderGym (single-step 3D editing), SlideBench (2D layout synthesis), and custom static/dynamic scene reconstruction.
Employs an "analysis-by-synthesis" paradigm for visual reconstruction.
Core iterative loop is self-correcting and requires no finetuning.
Integrates tools for planning, code execution, asset retrieval, and scene queries.

Maintenance & Community

The README provides no details on contributors, sponsorships, community channels, or a public roadmap.

Licensing & Compatibility

The README omits license information, preventing an assessment of compatibility for commercial use or closed-source linking.

Limitations & Caveats

Setup is complex, requiring multiple Conda environments and specific dependencies. Users must provide API keys for OpenAI and Meshy. An NVIDIA GPU with CUDA is recommended for 3D tasks. The absence of a specified license is a significant adoption caveat.

Health Check

Last Commit

1 day ago

Responsiveness

Inactive

Star History

269 stars in the last 30 days