viper by cvlab-columbia

ViperGPT: Visual inference via Python execution

created 2 years ago
1,705 stars

Top 25.5% on sourcepulse

View on GitHub: https://github.com/cvlab-columbia/viper
Project Summary

ViperGPT enables visual reasoning by using large language models to generate Python code that is then executed to answer queries about images. It is aimed at researchers and developers working with multimodal AI, offering a framework that bridges visual understanding with programmatic problem-solving.

How It Works

ViperGPT uses an LLM (GPT-3.5 Turbo or GPT-4) to translate a natural-language query about an image into Python code. The generated code calls various vision models (e.g., GLIP, BLIP-2) to perform subtasks such as object detection, segmentation, and captioning, and it can be executed directly or reviewed manually, providing a flexible and powerful approach to visual inference.
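For a concrete sense of the flow, a query such as "What color is the cup next to the laptop?" might produce a generated program along the lines below. This is a hedged sketch: the ImagePatch class and its find, simple_query, and horizontal_center members paraphrase the API that ViperGPT exposes to the LLM, the exact signatures may differ, and the code only runs inside the project's execution environment, which supplies that class.

```python
# Sketch of the kind of program the LLM might generate for
# "What color is the cup next to the laptop?" -- illustrative, not copied from the repo.
def execute_command(image) -> str:
    image_patch = ImagePatch(image)              # image wrapper supplied by ViperGPT's runtime
    laptop_patches = image_patch.find("laptop")  # object detection (e.g., GLIP)
    cup_patches = image_patch.find("cup")
    if not laptop_patches or not cup_patches:
        # Fall back to a direct visual question (e.g., BLIP-2).
        return image_patch.simple_query("What color is the cup?")
    laptop = laptop_patches[0]
    # Choose the cup whose center lies closest to the laptop.
    nearest_cup = min(
        cup_patches,
        key=lambda cup: abs(cup.horizontal_center - laptop.horizontal_center),
    )
    return nearest_cup.simple_query("What color is this cup?")
```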

Quick Start & Requirements

  • Install by cloning with git clone --recurse-submodules and running bash setup.sh inside the cloned directory.
  • Requires CUDA, Python, and a Conda environment (created via setup_env.sh).
  • Two pretrained models must be downloaded manually; the rest download automatically.
  • An OpenAI API key is required and must be placed in a file named api.key (a minimal pre-flight check is sketched after this list).
  • Official docs: https://github.com/cvlab-columbia/viper
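After setup, a pre-flight check along these lines can confirm the basics are in place. It is not part of the repository and assumes the Conda environment created by setup_env.sh (including PyTorch) is active.

```python
# Illustrative pre-flight check before running ViperGPT (not part of the repo).
from pathlib import Path

import torch  # assumed to be installed by the Conda environment from setup_env.sh

# The vision models expect a CUDA-capable GPU.
assert torch.cuda.is_available(), "No CUDA device found."

# The OpenAI key must be placed in a file named api.key at the repository root.
key_file = Path("api.key")
assert key_file.is_file() and key_file.read_text().strip(), "Missing or empty api.key."
```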

Highlighted Details

  • Supports GPT-3.5 Turbo and GPT-4, with notes on potential differences from the discontinued Codex API.
  • Offers a multiprocessing architecture that runs models and samples in parallel.
  • Includes a flexible configuration system driven by YAML files (a loading sketch follows this list).
  • Provides main_simple.ipynb for interactive exploration and main_batch.py for dataset processing.
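To illustrate the YAML-driven configuration mentioned above, the snippet below shows one common way such configs are layered. The file names are hypothetical and the project's actual loader may differ; consult the repository's config files for the real options.

```python
# Hypothetical sketch of layered YAML configuration; file names are illustrative
# and ViperGPT's actual config loader may work differently.
import yaml

def load_config(base_path: str, override_path: str | None = None) -> dict:
    """Load a base YAML config and shallow-merge an optional override on top."""
    with open(base_path) as f:
        config = yaml.safe_load(f) or {}
    if override_path:
        with open(override_path) as f:
            config.update(yaml.safe_load(f) or {})
    return config

# e.g., config = load_config("configs/base.yaml", "configs/my_experiment.yaml")
```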

Maintenance & Community

  • Developed by cvlab-columbia.
  • Citation details provided for academic use.

Licensing & Compatibility

  • No explicit license is mentioned in the README.
  • Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

ViperGPT executes LLM-generated code, which poses potential security risks; users are advised to run it in a sandboxed environment (one minimal mitigation is sketched below). The project notes that GPT-3.5 Turbo and GPT-4 are chat models, so their behavior may differ from that of completion models such as the discontinued Codex. The pretrained vision models may also contain biases.
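As one minimal mitigation (an illustration, not something the project ships), generated code can be written to a temporary file and executed in a separate process with a timeout; a container or virtual machine provides much stronger isolation.

```python
# Illustrative only: run model-generated code in a child process with a time limit.
# This curbs runaway programs but is NOT a real sandbox; prefer a container or VM.
import subprocess
import sys
import tempfile

def run_generated_code(code: str, timeout_s: int = 60) -> str:
    """Write generated code to a temp file and execute it in a separate Python process."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        script_path = f.name
    result = subprocess.run(
        [sys.executable, script_path],
        capture_output=True,
        text=True,
        timeout=timeout_s,  # kill long-running or hung code
    )
    return result.stdout
```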

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 20 stars in the last 90 days

Explore Similar Projects

Starred by Jiayi Pan (author of SWE-Gym; AI researcher at UC Berkeley), Nathan Lambert (AI researcher at AI2), and 1 more.

unified-io-2 by allenai

  • Top 0.3% · 619 stars
  • Unified-IO 2 code for training, inference, and demo
  • Created 1 year ago; updated 1 year ago