VPGTrans by VPGTrans

Research paper for transferring visual prompt generators across LLMs

Created 2 years ago
270 stars

Top 95.6% on SourcePulse

View on GitHub
Project Summary

This repository provides VPGTrans, a framework for efficiently transferring Visual Prompt Generators (VPGs) across Large Language Models (LLMs) to create Vision-Language LLMs (VL-LLMs). It targets researchers and developers who want to build VL-LLMs at a fraction of the usual computational and data cost, including equipping newly released LLMs with visual capabilities.

How It Works

VPGTrans employs a two-stage training process to transfer a VPG from a source VL-LLM (e.g., BLIP-2 OPT-6.7B) to a target LLM (e.g., LLaMA or Vicuna): first a projector warm-up stage, in which only the linear projector is trained (with a large learning rate) to adapt to the target LLM's embedding space, then a vanilla fine-tuning stage, in which the VPG and projector are trained jointly. This bypasses expensive end-to-end pre-training from scratch and offers a more feasible paradigm for VL-LLM development. A minimal sketch of the recipe follows.
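A minimal PyTorch-style sketch of the two-stage recipe, using toy stand-in modules; the names (vpg, projector, llm_loss) and hyperparameters are illustrative assumptions, not the repository's actual API:

    import torch
    import torch.nn as nn

    # Toy stand-ins (assumptions, not the repo's classes): the VPG is the
    # vision module inherited from the source VL-LLM; the linear projector
    # maps its outputs into the target LLM's embedding space.
    d_vpg, d_src, d_tgt = 32, 64, 48
    vpg = nn.Linear(d_vpg, d_src)        # stands in for the pretrained VPG
    projector = nn.Linear(d_src, d_tgt)  # stands in for the linear projector

    def llm_loss(prompt_embeds):
        # Placeholder for the frozen target LLM's captioning loss.
        return prompt_embeds.pow(2).mean()

    def train(params, lr, steps=10):
        opt = torch.optim.AdamW(params, lr=lr)
        for _ in range(steps):
            feats = torch.randn(4, d_vpg)          # stand-in image batch
            loss = llm_loss(projector(vpg(feats)))
            opt.zero_grad(); loss.backward(); opt.step()

    # Stage 1: projector warm-up -- freeze the VPG and train only the
    # projector, with a comparatively large learning rate.
    vpg.requires_grad_(False)
    train(projector.parameters(), lr=1e-3)

    # Stage 2: vanilla fine-tuning -- unfreeze the VPG and train it jointly
    # with the projector at a normal learning rate.
    vpg.requires_grad_(True)
    train(list(vpg.parameters()) + list(projector.parameters()), lr=1e-4)

Per the paper, the warm-up stage is where most of the savings come from: a well-initialized projector converges quickly, so the joint fine-tuning stage needs far fewer GPU hours and much less data.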

Quick Start & Requirements

  • Installation: pip install -r requirements.txt followed by pip install -e .
  • Prerequisites: Requires pre-trained weights for LLaMA or Vicuna models. Specific BLIP-2 OPT-6.7B checkpoints are also needed for training.
  • Demo: python webui_demo.py --cfg-path lavis/projects/blip2/demo/vl_vicuna_demo.yaml --gpu-id 0 (a programmatic loading sketch follows this list)
  • Resources: Training involves multiple stages and requires significant GPU resources and datasets (COCO caption, SBU, VG caption, Laion-COCO).
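Since the project is built on LAVIS, models can presumably also be loaded programmatically through LAVIS's registry. A minimal sketch, assuming the repo registers its models under load_model_and_preprocess; the name and model_type strings below are illustrative guesses, and the real identifiers live in the repo's config files:

    import torch
    from PIL import Image
    from lavis.models import load_model_and_preprocess

    device = "cuda" if torch.cuda.is_available() else "cpu"

    # "vl_vicuna" / "demo" are hypothetical identifiers; check the YAML
    # configs under lavis/projects/blip2/demo/ for the actual ones.
    model, vis_processors, _ = load_model_and_preprocess(
        name="vl_vicuna", model_type="demo", is_eval=True, device=device
    )

    image = vis_processors["eval"](Image.open("example.jpg").convert("RGB"))
    image = image.unsqueeze(0).to(device)

    # BLIP-2-style LAVIS models accept an image/prompt dict; whether the
    # repo's model follows the same interface is an assumption.
    print(model.generate({"image": image, "prompt": "Describe this image."}))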

Highlighted Details

  • Enables VL-LLM creation with an over 10x reduction in GPU hours, using only around 10% of the training data required for training from scratch.
  • Releases two novel VL-LLMs: VL-LLaMA (based on LLaMA) and VL-Vicuna (a GPT-4-like multimodal chatbot based on Vicuna).
  • Built upon the LAVIS framework, leveraging its components for vision-language tasks.

Maintenance & Community

  • The project was accepted to NeurIPS 2023.
  • Code was released in May 2023.
  • The codebase builds upon LAVIS and takes inspiration from MiniGPT-4.

Licensing & Compatibility

  • License: BSD 3-Clause License.
  • Compatible with commercial use and closed-source linking as per the permissive BSD license.

Limitations & Caveats

The README indicates that the VL-Vicuna demo requires a specific Vicuna checkpoint (the v0 version of Vicuna-7B). Training scripts require careful configuration of dataset paths and checkpoint locations; a sketch of inspecting and overriding these configs follows.
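The configs are LAVIS-style YAML, so one way to check or adjust paths before launching a run is via OmegaConf (which LAVIS uses internally). The key names below (model.pretrained, dataset storage paths) follow common LAVIS conventions but are assumptions; verify them against the actual config files:

    from omegaconf import OmegaConf

    # Example config path taken from the demo command above; substitute the
    # training config you intend to run.
    cfg = OmegaConf.load("lavis/projects/blip2/demo/vl_vicuna_demo.yaml")

    # Inspect the full config to find the checkpoint and dataset-path keys.
    print(OmegaConf.to_yaml(cfg))

    # Hypothetical override: point the model at a local checkpoint, then
    # save a private copy of the config for your run.
    cfg.model.pretrained = "/path/to/your/checkpoint.pth"
    OmegaConf.save(cfg, "my_run.yaml")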

Health Check

  • Last commit: 2 years ago
  • Responsiveness: inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 0 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems") and Elvis Saravia (founder of DAIR.AI).

NExT-GPT by NExT-GPT

  • Any-to-any multimodal LLM research paper
  • Top 0.2% · 4k stars
  • Created 2 years ago; updated 8 months ago