viper  by cvlab-columbia

ViperGPT: Visual inference via Python execution

Created 2 years ago
1,708 stars

Top 24.9% on SourcePulse

View on GitHub
Project Summary

ViperGPT enables visual reasoning by generating and executing Python code using large language models. It's designed for researchers and developers working with multimodal AI, offering a framework to bridge visual understanding with programmatic problem-solving.

How It Works

ViperGPT uses LLMs (GPT-3.5 Turbo, GPT-4) to translate a natural-language query about an image into Python code. The generated code calls vision models (e.g., GLIP, BLIP-2) to perform tasks like object detection, segmentation, and captioning, and can be executed directly or reviewed manually, providing a flexible and powerful approach to visual inference.
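The pattern can be sketched as follows. This is a minimal illustration with the vision backends mocked out, not ViperGPT's actual API; the `ImagePatch` class and method names here are simplified stand-ins for the abstraction the project exposes to the LLM.

```python
# Sketch of the ViperGPT pattern: an LLM writes a short Python program
# against a small vision API; executing the program answers the query.
# The vision calls are mocked here; in the real system they dispatch to
# models such as GLIP (detection) and BLIP-2 (VQA/captioning).

class ImagePatch:
    """Minimal stand-in for ViperGPT's image-patch abstraction."""
    def __init__(self, image, detections=None, answers=None):
        self.image = image
        self._detections = detections or {}  # object name -> list of patches
        self._answers = answers or {}        # question -> answer string

    def find(self, object_name):
        """Mocked object detection (GLIP in the real system)."""
        return self._detections.get(object_name, [])

    def simple_query(self, question):
        """Mocked visual question answering (BLIP-2 in the real system)."""
        return self._answers.get(question, "unknown")


def execute_generated_code(code, image_patch):
    """Run LLM-generated code in a namespace exposing the vision API.
    A real deployment should sandbox this step (see Limitations)."""
    namespace = {"ImagePatch": ImagePatch}
    exec(code, namespace)
    return namespace["execute_command"](image_patch)


# Code the LLM might generate for "How many mugs are in the image?"
generated = """
def execute_command(image):
    mugs = image.find("mug")
    return len(mugs)
"""

patch = ImagePatch("photo.jpg", detections={"mug": ["m1", "m2"]})
print(execute_generated_code(generated, patch))  # 2
```

The key design point is that the LLM never sees pixels: it only reasons over the query and the vision API, while perception is delegated to specialized models at execution time.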

Quick Start & Requirements

  • Install by cloning with git clone --recurse-submodules and running bash setup.sh in the cloned directory.
  • Requires CUDA, Python, and a Conda environment (setup_env.sh).
  • Manual download of two pretrained models is necessary; others download automatically.
  • An OpenAI API key is required, placed in api.key.
  • Official docs: https://github.com/cvlab-columbia/viper
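The steps above roughly correspond to the following commands; the exact order of the setup scripts is an assumption, so check the README before running:

```shell
# Clone with submodules (vision model dependencies live in submodules)
git clone --recurse-submodules https://github.com/cvlab-columbia/viper.git
cd viper

# Create the Conda environment, then run setup (assumed order; see README)
bash setup_env.sh
bash setup.sh   # most pretrained models download automatically;
                # two must be downloaded manually per the README

# Place your OpenAI API key in api.key (placeholder shown)
echo "sk-..." > api.key
```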

Highlighted Details

  • Supports GPT-3.5 Turbo and GPT-4, with notes on potential differences from the discontinued Codex API.
  • Offers a multiprocessing architecture for parallel model and sample execution.
  • Includes a flexible configuration system via YAML files.
  • Provides main_simple.ipynb for interactive exploration and main_batch.py for dataset processing.
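As an illustration of YAML-driven configuration, a run might be customized with an override file like the one below. The key names here are hypothetical, not the project's actual schema; consult the YAML files shipped with the repository for the real option names:

```yaml
# Hypothetical override file -- key names are illustrative only;
# see the repository's YAML configs for the actual schema.
llm:
  model: gpt-4        # or gpt-3.5-turbo
multiprocessing:
  enabled: true       # run models and samples in parallel
```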

Maintenance & Community

  • Developed by cvlab-columbia.
  • Citation details provided for academic use.

Licensing & Compatibility

  • No explicit license is mentioned in the README.
  • Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

ViperGPT executes LLM-generated code, posing potential security risks; users are advised to run in sandboxed environments. The project notes that GPT-3.5/GPT-4 are chat models and their behavior may differ from completion models. Pretrained models may contain biases.
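As one mitigation sketch (not part of ViperGPT), generated code can be run in a separate process with a hard timeout, so a runaway or crashing program cannot block the host process. This is not a full security boundary, since it imposes no filesystem or network restrictions; containers or VMs are needed for real isolation.

```python
# Minimal sandbox sketch (not part of ViperGPT): run untrusted generated
# code in a fresh Python subprocess with a timeout. Limits runtime and
# isolates crashes, but is NOT a complete security boundary.
import subprocess
import sys

def run_untrusted(code: str, timeout: float = 5.0) -> str:
    """Execute code in a child interpreter; return its stripped stdout."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True, text=True, timeout=timeout,
    )
    return proc.stdout.strip()

print(run_untrusted("print(1 + 1)"))  # 2
```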

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 4 stars in the last 30 days

Explore Similar Projects

Starred by Andrew Ng (Founder of DeepLearning.AI; Cofounder of Coursera; Professor at Stanford), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 2 more.

vision-agent by landing-ai

Visual AI agent for generating runnable vision code from image/video prompts

  • Top 0.1% on SourcePulse
  • 5k stars
  • Created 1 year ago
  • Updated 2 weeks ago