VPGTrans by VPGTrans

Research paper for transferring visual prompt generators across LLMs

Created 2 years ago
270 stars

Top 95.6% on SourcePulse

View on GitHub
Project Summary

This repository provides VPGTrans, a framework for efficiently transferring Visual Prompt Generators (VPGs) across Large Language Models (LLMs) to create Vision-Language LLMs (VL-LLMs). It targets researchers and developers who want to build VL-LLMs at a fraction of the usual computational and data cost, including equipping newly released LLMs with visual capabilities.

How It Works

VPGTrans employs a two-stage training process to transfer a VPG from a source VL-LLM (e.g., BLIP-2 OPT-6.7B) to a target LLM (e.g., LLaMA or Vicuna): first a projector warm-up stage, in which only the linear projector is trained (with a large learning rate) to adapt to the target LLM's embedding space, then a vanilla fine-tuning stage, in which the VPG and projector are trained jointly. This bypasses expensive end-to-end pre-training from scratch and offers a more feasible paradigm for VL-LLM development. A minimal sketch of the recipe follows.
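A minimal PyTorch-style sketch of the two-stage recipe, using toy stand-in modules; the names (vpg, projector, llm_loss) and hyperparameters are illustrative assumptions, not the repository's actual API:

    import torch
    import torch.nn as nn

    # Toy stand-ins (assumptions, not the repo's classes): the VPG is the
    # vision module inherited from the source VL-LLM; the linear projector
    # maps its outputs into the target LLM's embedding space.
    d_vpg, d_src, d_tgt = 32, 64, 48
    vpg = nn.Linear(d_vpg, d_src)        # stands in for the pretrained VPG
    projector = nn.Linear(d_src, d_tgt)  # stands in for the linear projector

    def llm_loss(prompt_embeds):
        # Placeholder for the frozen target LLM's captioning loss.
        return prompt_embeds.pow(2).mean()

    def train(params, lr, steps=10):
        opt = torch.optim.AdamW(params, lr=lr)
        for _ in range(steps):
            feats = torch.randn(4, d_vpg)          # stand-in image batch
            loss = llm_loss(projector(vpg(feats)))
            opt.zero_grad(); loss.backward(); opt.step()

    # Stage 1: projector warm-up -- freeze the VPG and train only the
    # projector, with a comparatively large learning rate.
    vpg.requires_grad_(False)
    train(projector.parameters(), lr=1e-3)

    # Stage 2: vanilla fine-tuning -- unfreeze the VPG and train it jointly
    # with the projector at a normal learning rate.
    vpg.requires_grad_(True)
    train(list(vpg.parameters()) + list(projector.parameters()), lr=1e-4)

Per the paper, the warm-up stage is where most of the savings come from: a well-initialized projector converges quickly, so the joint fine-tuning stage needs far fewer GPU hours and much less data.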

Quick Start & Requirements

  • Installation: pip install -r requirements.txt followed by pip install -e .
  • Prerequisites: Requires pre-trained weights for LLaMA or Vicuna models. Specific BLIP-2 OPT-6.7B checkpoints are also needed for training.
  • Demo: python webui_demo.py --cfg-path lavis/projects/blip2/demo/vl_vicuna_demo.yaml --gpu-id 0 (a programmatic loading sketch follows this list)
  • Resources: Training involves multiple stages and requires significant GPU resources and datasets (COCO caption, SBU, VG caption, Laion-COCO).
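Since the project is built on LAVIS, models can presumably also be loaded programmatically through LAVIS's registry. A minimal sketch, assuming the repo registers its models under load_model_and_preprocess; the name and model_type strings below are illustrative guesses, and the real identifiers live in the repo's config files:

    import torch
    from PIL import Image
    from lavis.models import load_model_and_preprocess

    device = "cuda" if torch.cuda.is_available() else "cpu"

    # "vl_vicuna" / "demo" are hypothetical identifiers; check the YAML
    # configs under lavis/projects/blip2/demo/ for the actual ones.
    model, vis_processors, _ = load_model_and_preprocess(
        name="vl_vicuna", model_type="demo", is_eval=True, device=device
    )

    image = vis_processors["eval"](Image.open("example.jpg").convert("RGB"))
    image = image.unsqueeze(0).to(device)

    # BLIP-2-style LAVIS models accept an image/prompt dict; whether the
    # repo's model follows the same interface is an assumption.
    print(model.generate({"image": image, "prompt": "Describe this image."}))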

Highlighted Details

  • Enables VL-LLM creation with an over 10x reduction in GPU hours, using only around 10% of the training data required for training from scratch.
  • Releases two novel VL-LLMs: VL-LLaMA (based on LLaMA) and VL-Vicuna (a GPT-4-like multimodal chatbot based on Vicuna).
  • Built upon the LAVIS framework, leveraging its components for vision-language tasks.

Maintenance & Community

  • The project was accepted to NeurIPS 2023.
  • Code was released in May 2023.
  • The codebase builds upon LAVIS and takes inspiration from MiniGPT-4.

Licensing & Compatibility

  • License: BSD 3-Clause License.
  • Compatible with commercial use and closed-source linking as per the permissive BSD license.

Limitations & Caveats

The README indicates that the VL-Vicuna demo requires a specific Vicuna checkpoint (the v0 version of Vicuna-7B). Training scripts require careful configuration of dataset paths and checkpoint locations; a sketch of inspecting and overriding these configs follows.
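The configs are LAVIS-style YAML, so one way to check or adjust paths before launching a run is via OmegaConf (which LAVIS uses internally). The key names below (model.pretrained, dataset storage paths) follow common LAVIS conventions but are assumptions; verify them against the actual config files:

    from omegaconf import OmegaConf

    # Example config path taken from the demo command above; substitute the
    # training config you intend to run.
    cfg = OmegaConf.load("lavis/projects/blip2/demo/vl_vicuna_demo.yaml")

    # Inspect the full config to find the checkpoint and dataset-path keys.
    print(OmegaConf.to_yaml(cfg))

    # Hypothetical override: point the model at a local checkpoint, then
    # save a private copy of the config for your run.
    cfg.model.pretrained = "/path/to/your/checkpoint.pth"
    OmegaConf.save(cfg, "my_run.yaml")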

Health Check

  • Last commit: 2 years ago
  • Responsiveness: inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 0 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems") and Elvis Saravia (founder of DAIR.AI).

NExT-GPT by NExT-GPT

  • Any-to-any multimodal LLM research paper
  • Top 0.2% · 4k stars
  • Created 2 years ago; updated 8 months ago