InstructCV by AlaaLab

PyTorch code for a vision generalist research paper

created 2 years ago
462 stars

Top 66.5% on sourcepulse

Project Summary

InstructCV provides an official PyTorch implementation for instruction-tuned text-to-image diffusion models, enabling them to act as generalist vision models. It addresses the limitations of specialized architectures in computer vision by framing tasks like segmentation, object detection, and depth estimation as text-to-image generation problems, allowing natural language instructions to guide task execution.

How It Works

The approach casts various computer vision tasks into a text-to-image generation framework. Instructions, paraphrased by a large language model, are paired with input images and task-specific outputs to create a multi-modal dataset. This dataset is then used to instruction-tune a diffusion model, similar to InstructPix2Pix, transforming it into a versatile, instruction-guided vision learner. This method offers a unified language interface, abstracting away task-specific design choices.
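The pipeline above can be sketched in a few lines. This is a hypothetical illustration (the templates, class name, and helper are not from the InstructCV codebase): each training example pairs an input image with a natural-language instruction and the task output rendered as an image, so that every task looks like instruction-guided image editing.

```python
# Hypothetical sketch of the multi-modal dataset construction described above.
# Field names and templates are illustrative, not the repository's actual API.
from dataclasses import dataclass

@dataclass
class InstructionExample:
    input_image: str   # path to the input image
    instruction: str   # natural-language task instruction (LLM-paraphrased in the paper)
    target_image: str  # task output rendered as an image, e.g. a segmentation mask

# One illustrative template per task; the paper paraphrases these with an LLM
# to diversify the language interface.
TASK_TEMPLATES = {
    "segmentation": "Segment the {category} in the image.",
    "detection": "Detect the {category} and draw a bounding box around it.",
    "depth": "Estimate the depth map of the image.",
    "classification": "Is there a {category} in the image?",
}

def make_example(task: str, image_path: str, target_path: str,
                 category: str = "") -> InstructionExample:
    """Build one (image, instruction, target-image) training triple."""
    instruction = TASK_TEMPLATES[task].format(category=category)
    return InstructionExample(image_path, instruction, target_path)
```

A diffusion model fine-tuned on such triples then needs no task-specific head: the instruction alone selects the behavior at inference time.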

Quick Start & Requirements

  • Install dependencies via conda env create -f environment.yaml and conda activate lvi.
  • Optional: Install TensorFlow, mmcv-full, and mmdetection following provided instructions.
  • See Preparing Datasets and Getting Started for detailed instructions.
  • Requires PyTorch 1.5+.
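The PyTorch 1.5+ requirement can be checked with a small helper before training; this is a hypothetical sketch (the function is not part of the repository) that tolerates local build tags such as `1.13.1+cu117`:

```python
def meets_requirement(version: str, minimum=(1, 5)) -> bool:
    """Return True if a PyTorch version string satisfies the minimum.

    Strips local build tags like '+cu117' before comparing the
    (major, minor) pair against the required minimum.
    """
    core = version.split("+")[0]
    major_minor = tuple(int(p) for p in core.split(".")[:2])
    return major_minor >= minimum

# In practice you would pass torch.__version__:
#   meets_requirement(torch.__version__)
```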

Highlighted Details

  • Achieves competitive performance on tasks including depth estimation, semantic segmentation, classification, and object detection.
  • Leverages instruction-tuning on a diffusion model, adapting it for multi-task visual recognition.
  • Integrates with Hugging Face Spaces for a web demo.
  • Based on the CompVis/stable-diffusion and InstructPix2Pix architectures.

Maintenance & Community

  • Official PyTorch implementation.
  • Codebase is largely based on CompVis/stable-diffusion and InstructPix2Pix.
  • Citation details provided for academic use.

Licensing & Compatibility

  • The pre-trained model for Stable Diffusion is subject to its original license terms.
  • Compatibility with commercial use or closed-source linking depends on the underlying Stable Diffusion license.

Limitations & Caveats

The project's setup involves several external dependencies and specific installation steps for baseline models, which may require significant configuration time. The licensing of the pre-trained Stable Diffusion model may impose restrictions on commercial use.

Health Check
Last commit

1 year ago

Responsiveness

1 week

Pull Requests (30d)
0
Issues (30d)
0
Star History
1 star in the last 90 days
