LaVi-Bridge by ShihaoZhaoZSH

Text-to-image generation research paper

created 1 year ago
295 stars

Top 90.7% on sourcepulse

View on GitHub
Project Summary

LaVi-Bridge enables flexible text-to-image generation by bridging diverse pre-trained language models (LLMs) with generative vision models. It targets researchers and practitioners in AI and computer vision who want to experiment with novel LLM-vision model combinations for image synthesis without modifying the base models' weights. The primary benefit is a plug-and-play framework that leverages LoRA and adapters for seamless integration.

How It Works

LaVi-Bridge acts as an intermediary layer, connecting various LLMs to diffusion-based vision models. It utilizes LoRA (Low-Rank Adaptation) and adapter modules to align the language model's text representations with the vision model's conditioning inputs. This approach avoids fine-tuning the entire LLM or vision model, making integration efficient and preserving the original models' capabilities.
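The low-rank update at the heart of LoRA can be sketched in a few lines of NumPy. This is a conceptual illustration, not the repository's actual PyTorch code; all dimensions and the scaling factor below are made-up placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes for illustration only.
d_in, d_out, rank, scale = 16, 16, 4, 1.0

# Frozen base projection (stands in for a pretrained weight matrix).
W = rng.standard_normal((d_out, d_in)) * 0.02

# LoRA factors: A is randomly initialized, B starts at zero, so at
# initialization the adapted layer behaves exactly like the base layer.
A = rng.standard_normal((rank, d_in)) * 0.02
B = np.zeros((d_out, rank))

def lora_forward(x):
    """Base output plus the low-rank update: x W^T + scale * x A^T B^T."""
    return x @ W.T + scale * (x @ A.T) @ B.T

x = rng.standard_normal((2, d_in))
assert np.allclose(lora_forward(x), x @ W.T)  # identical to base at init

# After training, B is nonzero and the (at most rank-4) update steers the
# frozen model toward the new text encoder, without touching W itself.
B = rng.standard_normal((d_out, rank)) * 0.02
delta = lora_forward(x) - x @ W.T
```

Because only the small A and B matrices are trained, each LLM-vision pairing adds a lightweight set of weights on top of the untouched base models.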

Quick Start & Requirements

  • Install via conda env create -f environment.yaml and conda activate lavi-bridge.
  • Requires pre-trained LoRA/adapters (download link provided).
  • For Llama-2 integration, download Llama-2-7b weights and update run.sh with --llama2_dir.
  • Official project page and paper (ArXiv) links are available.

Highlighted Details

  • Supports combinations like T5-Large + U-Net(SD) and Llama-2 + U-Net(SD).
  • Offers training scripts for custom datasets, recommending COCO2017 and JourneyDB.
  • Allows experimentation with different LLMs (T5 variants, Llama-2) and vision backbones (U-Net, Transformer).
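One reason an adapter is needed when swapping text encoders is that different language models emit features of different widths, while the diffusion backbone expects a fixed cross-attention context size. A minimal NumPy sketch of that dimension bridging, under assumed sizes (Llama-2-7b hidden size 4096, a 768-wide conditioning space as in SD v1.x, and a 77-token prompt; the real adapters are small trained networks, not a single linear map):

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed sizes for illustration.
LLM_DIM = 4096      # e.g. Llama-2-7b hidden states
CTX_DIM = 768       # cross-attention context width the U-Net expects
SEQ_LEN = 77        # tokenized prompt length

# A single linear projection standing in for the trained adapter.
W_adapt = rng.standard_normal((LLM_DIM, CTX_DIM)) * 0.01

def adapt(text_hidden_states):
    """Map (seq_len, llm_dim) text features to (seq_len, ctx_dim)."""
    return text_hidden_states @ W_adapt

hidden = rng.standard_normal((SEQ_LEN, LLM_DIM))
ctx = adapt(hidden)  # now shaped for the vision model's cross-attention
```

Swapping in T5-Large (hidden size 1024) would only change `LLM_DIM` and the adapter weights; the vision backbone's interface stays fixed.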

Maintenance & Community

  • Project is associated with ECCV 2024.
  • Built upon another repository (link provided).

Licensing & Compatibility

  • License type is not explicitly stated in the README.
  • Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The README does not specify a license, which may impact commercial adoption. While the framework is flexible, setup requires downloading separate pre-trained weights for the language models and the LoRA/adapter modules, which adds to the initial download and storage footprint.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 6 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering and Designing Machine Learning Systems), Jeff Hammerbacher (cofounder of Cloudera), and 3 more.

LLaMA-Adapter by OpenGVLab

Efficient fine-tuning for instruction-following LLaMA models

0.0%
6k stars
created 2 years ago, updated 1 year ago