PandaGPT by yxuansu

Multimodal model for instruction following across six modalities

created 2 years ago
809 stars

Top 44.6% on sourcepulse

Project Summary

PandaGPT is a multimodal instruction-following foundation model designed for researchers and power users. It enables a single model to process and respond to instructions across six modalities, including vision and audio, facilitating complex reasoning and cross-modal understanding.

How It Works

PandaGPT combines the ImageBind model with the Vicuna language model to give a single model instruction-following ability. The PandaGPT checkpoints are released as delta weights that are used together with these base models. The resulting model can take visual and auditory inputs at the same time and compose their semantics, for tasks such as detailed description or story generation.
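
One way to picture the integration described above is the toy sketch below. It assumes, beyond what this summary states, that embeddings from different modalities share one vector space and can be summed, and that a single linear projection maps the result into the language model's input space. The names `compose` and `project`, the 4-dimensional vectors, and the weights are all hypothetical and chosen for illustration; this is not PandaGPT's actual code.

```python
# Toy sketch: compose two modality embeddings and project them into an
# LLM input space. Pure Python, tiny dimensions; all names are hypothetical.

def compose(*embeddings):
    """Element-wise sum of same-length vectors (assumed composition rule)."""
    length = len(embeddings[0])
    if any(len(e) != length for e in embeddings):
        raise ValueError("all embeddings must share one dimension")
    return [sum(values) for values in zip(*embeddings)]


def project(vector, weight, bias):
    """Linear map y = W x + b, with W given as a list of rows."""
    if any(len(row) != len(vector) for row in weight):
        raise ValueError("weight columns must match the vector length")
    return [
        sum(w * x for w, x in zip(row, vector)) + b
        for row, b in zip(weight, bias)
    ]


# Stand-ins for encoder outputs (4-d here purely for readability).
image_emb = [1.0, 0.0, 2.0, 0.0]
audio_emb = [0.0, 1.0, 0.0, 2.0]

# Hypothetical projection from the 4-d encoder space to a 2-d "LLM" space.
W = [
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
]
b = [0.5, -0.5]

joint = compose(image_emb, audio_emb)  # [1.0, 1.0, 2.0, 2.0]
prefix = project(joint, W, b)          # [3.5, 2.5]
print(prefix)
```

In a real system the projected vector would be placed in front of the text prompt's token embeddings; here it is only printed.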

Quick Start & Requirements

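  • Install dependencies with pip install -r requirements.txt, plus a CUDA-enabled PyTorch build (e.g., pip install torch==1.13.1+cu117; builds with a +cu suffix are served from the PyTorch wheel index rather than PyPI, so pip must be pointed at that index).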
  • Download checkpoints for ImageBind and Vicuna.
  • Download PandaGPT delta weights (e.g., openllmplayground/pandagpt_7b_max_len_1024).
  • Run demo: cd ./code/ && CUDA_VISIBLE_DEVICES=0 python web_demo.py.
  • If you run into sample_rate errors, install pytorchvideo from source.
  • Official Demo: http://pandagpt.baai.ac.cn/
  • Paper: https://arxiv.org/abs/2305.16355
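
Because the demo depends on three separate downloads, a small pre-flight check can save a failed launch. The sketch below is only an illustration: the directory paths in `REQUIRED` are hypothetical placeholders (use wherever you stored each download), and the helper names `missing_checkpoints` and `demo_env` are not part of the repository.

```python
# Pre-flight sketch: confirm the downloaded checkpoints are in place and
# build the environment for launching the demo on a single GPU.
import os
from pathlib import Path


def missing_checkpoints(required):
    """Return the names whose directory is absent or empty."""
    missing = []
    for name, path in required.items():
        p = Path(path)
        if not p.is_dir() or not any(p.iterdir()):
            missing.append(name)
    return missing


def demo_env(gpu_id=0, base_env=None):
    """Copy an environment mapping and pin the process to one GPU."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    return env


# Hypothetical locations; these are NOT taken from the repository.
REQUIRED = {
    "imagebind": "./checkpoints/imagebind",
    "vicuna": "./checkpoints/vicuna",
    "pandagpt_delta": "./checkpoints/pandagpt_delta",
}

if __name__ == "__main__":
    absent = missing_checkpoints(REQUIRED)
    if absent:
        print("Missing checkpoints:", ", ".join(absent))
    else:
        print("All checkpoints found; launch with:")
        print("  cd ./code/ && CUDA_VISIBLE_DEVICES=0 python web_demo.py")
```

The mapping returned by `demo_env` can be passed as the `env` argument of `subprocess.run` if you prefer to launch the demo from Python instead of the shell.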

Highlighted Details

  • Described by its authors as the first foundation model for instruction following across six modalities without explicit supervision.
  • Capable of complex multimodal understanding, reasoning, and knowledge-grounded generation.
  • Supports simultaneous processing of image and audio inputs for compositional semantics.
  • Offers pre-trained delta weights for Vicuna-7B and Vicuna-13B models.

Maintenance & Community

  • The repository lists its major contributors and marks primary contributors with asterisks.
  • The project acknowledges contributions from OpenAlpaca, ImageBind, LLaVA, and MiniGPT-4.

Licensing & Compatibility

  • Intended and licensed for research use only.
  • The training dataset and delta weights are licensed under CC BY-NC 4.0, which prohibits commercial use.

Limitations & Caveats

Commercial use is not permitted: the project's dataset and delta weights are released under CC BY-NC 4.0, which limits usage to non-commercial research purposes.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 27 stars in the last 90 days
