VPGTrans: research paper and code for transferring visual prompt generators across LLMs
This repository provides VPGTrans, a framework for efficiently transferring Visual Prompt Generators (VPGs) across Large Language Models (LLMs) to create Vision-Language LLMs (VL-LLMs). It targets researchers and developers aiming to build VL-LLMs with significantly reduced computational costs and data requirements, enabling customization with newly released LLMs.
How It Works
VPGTrans employs a two-stage training process to transfer a VPG from a source LLM (e.g., BLIP-2 OPT-6.7B) to a target LLM (e.g., LLaMA, Vicuna). This approach bypasses the need for expensive end-to-end pre-training from scratch. The framework facilitates the creation of new VL-LLMs by adapting existing LLMs with visual capabilities, offering a more feasible paradigm for VL-LLM development.
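The core idea of the transfer can be sketched in a few lines. The toy sketch below is an illustration, not the repository's implementation: all dimensions and names (d_vis, E_src, E_tgt, etc.) are made up, and the word-embedding converter is fit here with a closed-form least-squares solve, whereas the paper trains it. It shows how a projector trained against a source LLM's embedding space can be re-targeted to a new LLM by composing it with a converter fit on the two LLMs' shared vocabulary (stage 1), before end-to-end fine-tuning (stage 2, omitted here):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dimensions: visual features, source- and target-LLM embeddings.
d_vis, d_src, d_tgt = 8, 16, 12
vocab = 100

# Pretrained source VPG projector: maps visual features into the source LLM's space.
W_src = rng.normal(size=(d_src, d_vis))

# Word embeddings of the shared vocabulary in each LLM (stand-ins for real tables).
E_src = rng.normal(size=(vocab, d_src))
E_tgt = rng.normal(size=(vocab, d_tgt))

# Stage 1 (warm-up): fit a linear word-embedding converter C so that
# E_src @ C.T approximates E_tgt, then initialize the target projector
# as the composition C @ W_src instead of training it from scratch.
X, *_ = np.linalg.lstsq(E_src, E_tgt, rcond=None)   # solves E_src @ X ≈ E_tgt
C = X.T                                             # shape (d_tgt, d_src)
W_tgt_init = C @ W_src                              # shape (d_tgt, d_vis)

# A visual feature now lands directly in the target LLM's embedding space;
# stage 2 would fine-tune W_tgt_init jointly with the target LLM.
v = rng.normal(size=(d_vis,))
soft_prompt = W_tgt_init @ v
print(soft_prompt.shape)  # (12,)
```

The warm-start is what avoids re-running the expensive end-to-end pre-training: only the cheap converter fit and a short fine-tuning stage are needed for the new LLM.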
Quick Start & Requirements
Install with pip install -r requirements.txt, followed by pip install -e .
Run the demo with: python webui_demo.py --cfg-path lavis/projects/blip2/demo/vl_vicuna_demo.yaml --gpu-id 0
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The README indicates that specific Vicuna weights (v0 version of Vicuna-7B) are required for the VL-Vicuna demo. Training scripts require careful configuration of dataset paths and checkpoint locations.
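As a rough illustration of what "careful configuration" means here, a LAVIS-style project config typically declares the model checkpoint and dataset locations explicitly; the fragment below is hypothetical (keys and paths are placeholders, not taken from the repository):

```yaml
# Hypothetical LAVIS-style config sketch; actual keys and paths
# must be checked against the scripts shipped in lavis/projects/.
model:
  arch: blip2_vicuna            # placeholder architecture name
  finetuned: /path/to/vpgtrans_checkpoint.pth
datasets:
  coco_caption:
    images: /path/to/coco/images
```

Mismatched or missing paths in these files are a common source of training-script failures, which is why the README calls them out.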
The repository was last updated about two years ago and appears to be inactive.