visprog by allenai

Neuro-symbolic system for compositional visual reasoning using natural language

Created 2 years ago
746 stars

Top 46.5% on SourcePulse

Project Summary

This repository provides the official code for VisProg, a neuro-symbolic system designed for compositional visual reasoning based on natural language instructions. It targets researchers and developers working on complex visual question answering and image manipulation tasks, offering an interpretable and extensible framework.

How It Works

VisProg uses GPT-3's in-context learning to generate Python programs whose steps call off-the-shelf computer vision models and image processing routines. Executing a generated program yields both the solution and an interpretable, step-by-step rationale, so compositional reasoning requires no task-specific training. The system is modular, so new functionality and tasks can be added as modules.
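
The generate-then-execute idea can be illustrated with a toy interpreter. This is a minimal sketch under stated assumptions, not VisProg's actual code: the `OUT=MODULE(arg=value)` step syntax, the `DETECT` and `COUNT` modules, and the helper names `run_program` and `parse_value` are all hypothetical, and an "image" is reduced to a list of object labels so the example runs without any vision model.

```python
import ast
import re
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical step syntax: OUTPUT=MODULE(key=value, ...)
STEP = re.compile(r"^(\w+)\s*=\s*(\w+)\((.*)\)$")


def detect(image: List[str], label: str) -> List[str]:
    """Toy detector: the 'image' is just a list of object labels."""
    return [obj for obj in image if obj == label]


def count(boxes: List[str]) -> int:
    """Count the detections produced by an earlier step."""
    return len(boxes)


# Illustrative registry mapping module names to callables (assumed names).
MODULES: Dict[str, Callable[..., Any]] = {"DETECT": detect, "COUNT": count}


def parse_value(token: str, state: Dict[str, Any]) -> Any:
    """Resolve a token to a previously computed variable or a Python literal."""
    token = token.strip()
    if token in state:
        return state[token]
    return ast.literal_eval(token)


def run_program(program: str, state: Dict[str, Any]) -> Tuple[Dict[str, Any], List[str]]:
    """Execute each step in order and record a readable trace of the results.

    Arguments are split on commas, so literals containing commas are not
    supported in this sketch.
    """
    state = dict(state)
    trace: List[str] = []
    for line in program.strip().splitlines():
        match = STEP.match(line.strip())
        if match is None:
            raise ValueError(f"cannot parse step: {line!r}")
        out_name, module_name, arg_text = match.groups()
        kwargs: Dict[str, Any] = {}
        for pair in filter(None, (p.strip() for p in arg_text.split(","))):
            key, _, value = pair.partition("=")
            kwargs[key.strip()] = parse_value(value, state)
        state[out_name] = MODULES[module_name](**kwargs)
        trace.append(f"{out_name} = {module_name}(...) -> {state[out_name]!r}")
    return state, trace


# A program of the kind a language model might be prompted to produce.
PROGRAM = """
BOXES=DETECT(image=IMAGE, label='dog')
N=COUNT(boxes=BOXES)
"""

final_state, trace = run_program(PROGRAM, {"IMAGE": ["dog", "cat", "dog"]})
print(final_state["N"])
print("\n".join(trace))
```

The trace plays the role of the rationale: every intermediate result is named and can be inspected after the run.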

Quick Start & Requirements

  • Create the conda environment with conda env create -f environment.yaml, then activate it with conda activate visprog.
  • Run provided Jupyter notebooks (e.g., notebooks/ok_det.ipynb, notebooks/image_editing.ipynb, notebooks/nlvr.ipynb, notebooks/gqa.ipynb).
  • Requires an OpenAI API key.
  • Official project page: https://visprog.github.io/
  • arXiv paper: https://arxiv.org/abs/2211.11559
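
Because the notebooks need an OpenAI API key, it can help to check for one before launching them. The sketch below assumes the key is supplied through an environment variable named `OPENAI_API_KEY`; that variable name and the helper `require_api_key` are assumptions for illustration, so check the notebooks for how they actually read the key.

```python
import os
from typing import Mapping


def require_api_key(env: Mapping[str, str] = os.environ,
                    name: str = "OPENAI_API_KEY") -> str:
    """Return the API key from the environment, or fail with a clear message.

    The variable name is an assumption, not something the project guarantees.
    """
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before launching the notebooks")
    return key


# Example with an explicit mapping so the check can be exercised without a real key.
print(require_api_key({"OPENAI_API_KEY": "sk-example"}))
```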

Highlighted Details

  • CVPR 2023 Best Paper award winner.
  • Neuro-symbolic approach for compositional visual reasoning.
  • Generates Python programs for execution, providing interpretable rationales.
  • Modular design allows easy addition of new modules and tasks.
  • Swappable vision modules (e.g., BLIP for VQA).
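
One way to picture swappable modules is a name-to-implementation registry in which a program refers to a module only by name. The following is a hedged sketch of that idea, not the repository's interface: `ModuleRegistry`, `placeholder_vqa`, and `keyword_vqa` are hypothetical, and the two "VQA" functions are trivial stand-ins for real models such as BLIP.

```python
from typing import Callable, Dict

# Assumed signature for a VQA-style module: (image_id, question) -> answer
Answerer = Callable[[str, str], str]


class ModuleRegistry:
    """Maps module names used in programs to interchangeable implementations."""

    def __init__(self) -> None:
        self._modules: Dict[str, Answerer] = {}

    def register(self, name: str, fn: Answerer) -> None:
        # Registering under an existing name replaces the old implementation.
        self._modules[name] = fn

    def call(self, name: str, image_id: str, question: str) -> str:
        if name not in self._modules:
            raise KeyError(f"no module registered under {name!r}")
        return self._modules[name](image_id, question)


def placeholder_vqa(image_id: str, question: str) -> str:
    """Stand-in for a first VQA backend."""
    return "unknown"


def keyword_vqa(image_id: str, question: str) -> str:
    """Stand-in for a replacement backend with different behaviour."""
    return "yes" if "dog" in question.lower() else "no"


registry = ModuleRegistry()
registry.register("VQA", placeholder_vqa)
first = registry.call("VQA", "img-1", "Is there a dog?")

# Swap the backend; any program that calls "VQA" is unchanged.
registry.register("VQA", keyword_vqa)
second = registry.call("VQA", "img-1", "Is there a dog?")

print(first, second)
```

Keeping the module name stable while replacing the callable is what lets generated programs stay valid when a model is upgraded.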

Maintenance & Community

  • The project is associated with the Allen Institute for AI (AI2).
  • The README mentions a successor project, CodeNav, which addresses VisProg's limitations.

Licensing & Compatibility

  • The README does not explicitly state a license.

Limitations & Caveats

  • Performance depends on GPT-3's program generation, which may fail on instructions that differ significantly from the in-context examples.
  • Tasks that the current set of modules cannot solve require new modules to be added manually.
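
The dependence on in-context examples is easier to see when the prompt is written out. This sketch assembles a few-shot prompt from (instruction, program) pairs; the `Instruction:` / `Program:` layout, the example program, and the function `build_prompt` are assumptions for illustration, not the prompt format the project actually uses.

```python
from typing import List, Tuple


def build_prompt(examples: List[Tuple[str, str]], instruction: str) -> str:
    """Concatenate worked examples, then leave the final program for the model to complete."""
    blocks = [f"Instruction: {inst}\nProgram:\n{prog}" for inst, prog in examples]
    blocks.append(f"Instruction: {instruction}\nProgram:")
    return "\n\n".join(blocks)


# Hypothetical in-context example; module names are illustrative.
EXAMPLES = [
    (
        "Count the dogs",
        "BOXES=DETECT(image=IMAGE, label='dog')\nN=COUNT(boxes=BOXES)",
    ),
]

prompt = build_prompt(EXAMPLES, "Count the cats")
print(prompt)
```

The model only sees the patterns present in `EXAMPLES`, which is why an instruction unlike any of them, or one needing a module that never appears in them, is less likely to yield a usable program.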

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 3 stars in the last 30 days
