CrossFlow  by qihao067

PyTorch text-to-image generation framework

created 7 months ago
292 stars


Project Summary

This repository provides a PyTorch reimplementation of CrossFlow, a text-to-image generation framework built around noise-free cross-modality evolution. It targets researchers and practitioners in computer vision and generative AI. Compared with the setup in the original paper, it offers more flexibility in the choice of model architecture, language model, and training dataset.

How It Works

CrossFlow uses a diffusion-style model architecture and supports both DiT and DiMR backbones. It encodes a text prompt with a language model such as CLIP or T5-XXL, then generates the image by evolving the resulting latent representation instead of starting from sampled noise. Because generation starts from the text latent, interpolating between two latents, or adding and subtracting them, yields smooth and meaningful changes in the output image.
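The idea of evolving a latent from the text side to the image side can be pictured as integrating an ordinary differential equation from t = 0 to t = 1. The following is a minimal sketch under that assumption, not the repository's sampler: `euler_evolve` and `make_straight_line_velocity` are hypothetical names, latents are plain Python lists rather than tensors, and the velocity function is a toy stand-in for a trained DiT or DiMR network.

```python
from typing import Callable, List

Vector = List[float]


def euler_evolve(
    z_text: Vector,
    velocity: Callable[[Vector, float], Vector],
    num_steps: int = 50,
) -> Vector:
    """Integrate dz/dt = velocity(z, t) from t=0 to t=1 with fixed Euler steps.

    The starting point is the text latent itself; no noise is sampled.
    """
    z = list(z_text)
    dt = 1.0 / num_steps
    for step in range(num_steps):
        t = step * dt
        v = velocity(z, t)
        z = [zi + dt * vi for zi, vi in zip(z, v)]
    return z


def make_straight_line_velocity(
    z_start: Vector, z_target: Vector
) -> Callable[[Vector, float], Vector]:
    """Toy velocity field (assumption): constant motion from z_start to z_target.

    A trained model would predict the velocity from (z, t) instead.
    """
    delta = [b - a for a, b in zip(z_start, z_target)]
    return lambda z, t: delta


# Usage: evolve a 3-dimensional "text latent" toward a fixed "image latent".
text_latent = [0.0, 1.0, -1.0]
image_latent = [2.0, 0.5, 3.0]
field = make_straight_line_velocity(text_latent, image_latent)
result = euler_evolve(text_latent, field, num_steps=50)
```

With this constant velocity field the Euler steps land on `image_latent` up to floating-point error. In a real pipeline the final latent would then be decoded to pixels, for example by the Stable Diffusion VAE mentioned in the prerequisites.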

Quick Start & Requirements

  • Installation: Clone the repository, then install dependencies by running these commands in order:
      pip3 install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 --index-url https://download.pytorch.org/whl/cu121
      pip3 install -U --pre triton
      pip3 install -r requirements.txt
  • Prerequisites: PyTorch 2.1.2, CUDA 12.1. Requires downloading Stable Diffusion VAE and reference statistics.
  • Resources: Pretrained models are available for download. Training requires significant computational resources.
  • Links: project page, huggingface demo, paper, arxiv.
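Since the install pins exact PyTorch versions, it can help to confirm the environment matches before downloading checkpoints. The script below is a small sketch that is not part of the repository: `REQUIRED`, `installed_version`, and `check_pins` are hypothetical names, and the pins are copied from the install command above. It uses only the standard library's `importlib.metadata`.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Callable, Dict, Optional

# Version pins copied from the pip install command in this section.
REQUIRED = {"torch": "2.1.2", "torchvision": "0.16.2", "torchaudio": "2.1.2"}


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def check_pins(
    required: Dict[str, str],
    lookup: Callable[[str], Optional[str]] = installed_version,
) -> Dict[str, str]:
    """Return {package: problem} for every pin that is not satisfied."""
    problems = {}
    for name, wanted in required.items():
        found = lookup(name)
        if found is None:
            problems[name] = "not installed"
        # Drop a local build suffix such as "+cu121" before comparing.
        elif found.split("+")[0] != wanted:
            problems[name] = f"found {found}, expected {wanted}"
    return problems


if __name__ == "__main__":
    issues = check_pins(REQUIRED)
    for name, problem in issues.items():
        print(f"{name}: {problem}")
    if not issues:
        print("All pinned packages match.")
```

The check covers package versions only. It does not verify the CUDA 12.1 runtime, the VAE weights, or the reference statistics.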

Highlighted Details

  • Supports both DiT and DiMR architectures.
  • Offers T5-XXL language model integration alongside CLIP.
  • Trained on open-source datasets (LAION-400M, JourneyDB) instead of proprietary data.
  • Enables latent space interpolation and arithmetic operations for image manipulation.
  • Provides pre-trained checkpoints for 256x256 and 512x512 resolutions.
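The latent interpolation and arithmetic mentioned above reduce to simple element-wise operations on text latents. The sketch below illustrates those operations on plain Python lists. The helper names `lerp`, `interpolation_path`, and `latent_arithmetic` are hypothetical, and the repository's own scripts operate on tensors and may differ in detail.

```python
from typing import List

Vector = List[float]


def lerp(z_a: Vector, z_b: Vector, t: float) -> Vector:
    """Linear interpolation: returns z_a at t=0 and z_b at t=1."""
    return [(1.0 - t) * a + t * b for a, b in zip(z_a, z_b)]


def interpolation_path(z_a: Vector, z_b: Vector, num_points: int) -> List[Vector]:
    """Evenly spaced latents from z_a to z_b, endpoints included."""
    if num_points < 2:
        raise ValueError("num_points must be at least 2")
    return [lerp(z_a, z_b, i / (num_points - 1)) for i in range(num_points)]


def latent_arithmetic(z_base: Vector, z_add: Vector, z_subtract: Vector) -> Vector:
    """Element-wise z_base + z_add - z_subtract."""
    return [b + a - s for b, a, s in zip(z_base, z_add, z_subtract)]


# Usage with toy 2-dimensional latents (real text latents are much larger).
path = interpolation_path([0.0, 0.0], [2.0, 4.0], num_points=5)
mixed = latent_arithmetic([1.0, 2.0], [3.0, 4.0], [1.0, 1.0])
```

Each latent on `path`, and the combined latent `mixed`, would then be passed through the generation model separately to produce one image per latent.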

Maintenance & Community

The project is associated with CVPR 2025 and lists Qihao Liu, Xi Yin, Alan Yuille, Andrew Brown, and Mannat Singh as contributors. Further community engagement details are not explicitly provided in the README.

Licensing & Compatibility

The project was created for research purposes. No specific license is stated, and the research focus may imply restrictions on commercial use.

Limitations & Caveats

T5-XXL models fine-tuned on JourneyDB may exhibit minor text-image misalignment compared to models trained from scratch. Linear interpolation sampling is currently limited to a single GPU.

Health Check

  • Last commit: 2 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 11 stars in the last 30 days
