PyTorch text-to-image generation framework
This repository provides a PyTorch reimplementation of CrossFlow, a text-to-image generation framework designed for noise-free cross-modality evolution. It targets researchers and practitioners in computer vision and generative AI, offering flexibility in model architecture, language models, and training datasets compared to the original paper.
How It Works
CrossFlow uses a diffusion model architecture and supports both DiT and the state-of-the-art DiMR. Text prompts are encoded by a language model such as CLIP or T5-XXL, and images are generated by evolving the resulting latent representations. The approach aims for a noise-free generation process, which enables smooth interpolation and arithmetic operations in the latent space for creative image manipulation.
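The interpolation and arithmetic operations mentioned above can be illustrated with a minimal, framework-agnostic sketch. It assumes latents are plain lists of floats; in the actual repository they would be tensors produced by the text encoder, and the function names here (`lerp`, `interpolation_path`, `latent_arithmetic`) are hypothetical and not part of the CrossFlow code.

```python
from typing import List

Vector = List[float]


def lerp(z_a: Vector, z_b: Vector, t: float) -> Vector:
    """Linearly interpolate between two latents: t=0 gives z_a, t=1 gives z_b."""
    if len(z_a) != len(z_b):
        raise ValueError("latents must have the same length")
    return [(1.0 - t) * a + t * b for a, b in zip(z_a, z_b)]


def interpolation_path(z_a: Vector, z_b: Vector, steps: int) -> List[Vector]:
    """Return `steps` evenly spaced latents from z_a to z_b, endpoints included."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    return [lerp(z_a, z_b, i / (steps - 1)) for i in range(steps)]


def latent_arithmetic(z_base: Vector, z_remove: Vector, z_add: Vector) -> Vector:
    """Compute z_base - z_remove + z_add element-wise (hypothetical helper)."""
    if not (len(z_base) == len(z_remove) == len(z_add)):
        raise ValueError("latents must have the same length")
    return [b - r + a for b, r, a in zip(z_base, z_remove, z_add)]
```

Each latent on the interpolation path would then be passed to the image generator, so that neighbouring latents yield gradually changing images. Whether the repository interpolates in exactly this way is not stated in the summary; this is only the generic technique.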
Quick Start & Requirements
Install the dependencies with the following commands:
pip3 install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 --index-url https://download.pytorch.org/whl/cu121
pip3 install -U --pre triton
pip3 install -r requirements.txt
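After installation, the pinned versions can be checked without importing the packages themselves. The following is a small sketch using only the standard library; the `PINNED` table and the `check_pins` helper are illustrative names, not something the repository ships.

```python
from importlib import metadata

# Versions pinned by the install command above.
PINNED = {"torch": "2.1.2", "torchvision": "0.16.2", "torchaudio": "2.1.2"}


def check_pins(pins, get_version=metadata.version):
    """Return a dict of package name -> problem description (empty if all match)."""
    problems = {}
    for name, wanted in pins.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            problems[name] = "not installed"
            continue
        # CUDA wheels carry a local suffix such as "2.1.2+cu121";
        # compare only the part before "+".
        if found.split("+")[0] != wanted:
            problems[name] = f"found {found}, expected {wanted}"
    return problems


if __name__ == "__main__":
    issues = check_pins(PINNED)
    if issues:
        for name, problem in issues.items():
            print(f"{name}: {problem}")
    else:
        print("all pinned packages match")
```

The `get_version` parameter exists only so the check can be exercised with a stub in environments where the packages are not installed.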
Highlighted Details
Maintenance & Community
The project is associated with CVPR 2025 and lists Qihao Liu, Xi Yin, Alan Yuille, Andrew Brown, and Mannat Singh as contributors. Further community engagement details are not explicitly provided in the README.
Licensing & Compatibility
The project is created for research purposes. The specific license is not stated, but the research-focused nature may imply restrictions on commercial use.
Limitations & Caveats
T5-XXL models fine-tuned on JourneyDB may exhibit minor text-image misalignment compared to models trained from scratch. Linear interpolation sampling is currently limited to a single GPU.