Local VQGAN+CLIP tool for text-to-image generation
This repository provides a local implementation of VQGAN+CLIP, a generative art model that synthesizes images from text prompts. It targets artists, researchers, and hobbyists seeking to run advanced AI image generation without relying on cloud platforms like Google Colab. The primary benefit is enabling local, customizable control over the VQGAN+CLIP pipeline.
How It Works
The project leverages the VQGAN architecture for image encoding and the CLIP model for guiding the generation process based on text descriptions. It combines these components to iteratively refine an image, starting from noise or an initial image, to match the semantic meaning of the provided text prompts. This approach allows for high-fidelity image synthesis guided by natural language.
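The iterative refinement loop can be illustrated with a toy sketch. This is not the real VQGAN or CLIP model: a fixed random linear map stands in for the image encoder, and a random unit vector stands in for the text embedding. The sketch only shows the core idea of the pipeline, ascending the cosine similarity between an encoded latent and a target embedding, starting from noise:

```python
import numpy as np

# Illustrative stand-ins (hypothetical, not the actual models):
# W plays the role of the image encoder, `target` the CLIP text embedding.
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 16))
target = rng.standard_normal(64)
target /= np.linalg.norm(target)          # unit-normalized "text" embedding

def cosine(z):
    # Similarity between the encoded latent and the target embedding.
    y = W @ z
    return y @ target / np.linalg.norm(y)

def grad(z):
    # Analytic gradient of the cosine similarity with respect to z.
    y = W @ z
    ny = np.linalg.norm(y)
    dy = target / ny - (y @ target) * y / ny**3
    return W.T @ dy

z = rng.standard_normal(16)               # start from noise, as the pipeline can
history = [cosine(z)]
for _ in range(200):
    z += 0.1 * grad(z)                    # gradient-ascent refinement step
    history.append(cosine(z))
```

In the real pipeline the gradient flows through the CLIP image encoder and the VQGAN decoder via backpropagation, and the optimized latent is decoded into the final image.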
Quick Start & Requirements
Create and activate a conda environment:

conda create --name vqgan python=3.9
conda activate vqgan

Install PyTorch with CUDA 11.1 support:

pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html

Install the remaining Python dependencies:

pip install -r requirements.txt

Clone the required repositories:

git clone https://github.com/openai/CLIP
git clone https://github.com/CompVis/taming-transformers

Finally, download the VQGAN checkpoints into a checkpoints/ directory.
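With the environment set up, generation is typically driven by a script that takes a text prompt. The command below is an assumption based on common usage of this tool; the script name and flags may differ, so check the repository's own README for the exact interface:

```shell
# Hypothetical invocation; verify the script name and flags in the repo.
python generate.py -p "A painting of an apple in a fruit bowl"
```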
Maintenance & Community
The project is a personal exploration by "nerdyrodent" and does not indicate a formal maintenance team or community channels like Discord/Slack.
Licensing & Compatibility
The repository itself does not explicitly state a license. However, it depends on CLIP (MIT License) and Taming Transformers (MIT License). Pretrained VQGAN checkpoints are typically released under permissive licenses, but verify the license of each checkpoint you download before commercial use or integration into closed-source projects.
Limitations & Caveats
AMD GPU support is experimental and requires ROCm installation. CPU-only generation is possible but significantly slower. The project is presented as a personal experiment, implying potential for breaking changes or lack of long-term support. CUDA out-of-memory errors are common for larger resolutions or higher cut counts.