LaVi-Bridge by ShihaoZhaoZSH

Code release for a text-to-image generation research paper

Created 1 year ago
296 stars

Top 89.5% on SourcePulse

Project Summary

LaVi-Bridge enables flexible text-to-image generation by bridging diverse pre-trained large language models (LLMs) with generative vision models. It targets researchers and practitioners in AI and computer vision who want to experiment with novel LLM-vision combinations for image synthesis without modifying the base models' weights. The primary benefit is a plug-and-play framework that uses LoRA and adapters for integration.

How It Works

LaVi-Bridge acts as an intermediary layer that connects various LLMs to diffusion-based vision models. It uses LoRA (Low-Rank Adaptation) and adapter modules to inject the LLM's text representations into the vision model's generation process. Because neither the LLM nor the vision model is fully fine-tuned, integration stays efficient and the original models' capabilities are preserved.
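The sketch below is a minimal, self-contained PyTorch illustration of this idea, not code from the repository: a small adapter projects LLM hidden states into the conditioning width a diffusion U-Net's cross-attention expects, and a LoRA wrapper adds a trainable low-rank update to a frozen projection layer. All module names, dimensions, and hyperparameters are illustrative assumptions.

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """A frozen linear layer plus a trainable low-rank update: W x + (alpha/r) * B A x."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # the original weights are never modified
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)  # update starts at zero, so init behavior is unchanged
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))


class Adapter(nn.Module):
    """Projects LLM hidden states into the width the U-Net's cross-attention expects."""

    def __init__(self, llm_dim: int, cond_dim: int):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(llm_dim, cond_dim),
            nn.GELU(),
            nn.Linear(cond_dim, cond_dim),
        )

    def forward(self, llm_hidden: torch.Tensor) -> torch.Tensor:
        return self.proj(llm_hidden)


# Toy forward pass: only the adapter and the LoRA factors carry gradients.
llm_hidden = torch.randn(1, 77, 4096)           # stand-in for Llama-2-7B token features
adapter = Adapter(llm_dim=4096, cond_dim=768)   # 768 is a typical SD cross-attention width
cond = adapter(llm_hidden)                      # (1, 77, 768) conditioning sequence

frozen_proj = nn.Linear(768, 320)               # stand-in for a frozen U-Net key projection
lora_proj = LoRALinear(frozen_proj, rank=8)
keys = lora_proj(cond)                          # (1, 77, 320)
print(keys.shape)
```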

Quick Start & Requirements

  • Install via conda env create -f environment.yaml and conda activate lavi-bridge.
  • Requires pre-trained LoRA/adapters (download link provided).
  • For Llama-2 integration, download the Llama-2-7b weights and point --llama2_dir in run.sh at them (a download sketch follows this list).
  • Official project page and paper (arXiv) links are available.
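For the Llama-2 step, the snippet below is a hedged sketch using the huggingface_hub library; the repository id and local path are assumptions, and the gated Llama-2 weights require accepting Meta's license on Hugging Face and an authenticated token.

```python
# Hypothetical helper for fetching the Llama-2-7b weights referenced by --llama2_dir.
# The repo id and target directory are assumptions; log in first, e.g. `huggingface-cli login`.
from huggingface_hub import snapshot_download

llama2_dir = snapshot_download(
    repo_id="meta-llama/Llama-2-7b-hf",   # assumed Hugging Face repo id for the 7B weights
    local_dir="./weights/llama2-7b",      # any writable local directory
)
print(f"Set --llama2_dir in run.sh to: {llama2_dir}")
```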

Highlighted Details

  • Supports combinations like T5-Large + U-Net(SD) and Llama-2 + U-Net(SD).
  • Offers training scripts for custom datasets, recommending COCO2017 and JourneyDB.
  • Allows experimentation with different LLMs (T5 variants, Llama-2) and vision backbones (U-Net, Transformer); the check below illustrates why a bridging adapter is needed.
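The short check below is an illustration, not repository code: it reads the hidden widths of two candidate text encoders via Hugging Face transformers to show that a bridge must project differently sized LLM features into whatever the vision backbone's cross-attention expects. Accessing the Llama-2 config is gated and needs an authenticated token.

```python
# Illustrative only: different LLMs expose different hidden widths.
from transformers import AutoConfig

for name in ["t5-large", "meta-llama/Llama-2-7b-hf"]:  # Llama-2 config access is gated
    cfg = AutoConfig.from_pretrained(name)
    # T5 configs call the width d_model; Llama configs call it hidden_size.
    width = getattr(cfg, "d_model", None) or getattr(cfg, "hidden_size", None)
    print(f"{name}: hidden width = {width}")  # t5-large -> 1024, Llama-2-7b -> 4096
```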

Maintenance & Community

  • The accompanying paper was accepted at ECCV 2024.
  • Built upon another repository (link provided).

Licensing & Compatibility

  • License type is not explicitly stated in the README.
  • Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The README does not specify a license, which may hinder commercial adoption. While the framework is flexible, setup requires downloading separate pre-trained weights for the LLMs and the LoRA/adapter modules, which adds to the initial download footprint.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 2 stars in the last 30 days
