LinFusion by Huage001

PyTorch/Diffusers code for fast, high-res image generation

Created 1 year ago

312 stars

Top 86.5% on SourcePulse

1 Expert Loves This Project

Omar Sanseviero

DevRel at Google DeepMind

Project Summary

LinFusion provides an efficient PyTorch and Diffusers implementation for generating ultra-high-resolution images, up to 16K, with reduced VRAM requirements. It targets researchers and users of diffusion models seeking to overcome resolution limitations and generation speed bottlenecks, enabling high-fidelity image creation on single GPUs.

How It Works

LinFusion integrates with existing diffusion pipelines (SD v1.5, v2.1, SDXL) by modifying their forward passes. For high-resolution generation, it uses techniques inspired by SDEdit and DemoFusion: images are upscaled progressively from lower-resolution latents instead of being fully denoised from scratch at each higher resolution. The approach reuses latents across stages and incorporates dilated convolutions to keep computational cost and memory usage manageable.
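The progressive upscaling described above can be sketched in a few lines. This is a minimal illustration, not the repository's code: the helper names `resolution_schedule` and `steps_to_run`, the doubling factor, and the `strength` convention (borrowed from common SDEdit-style image-to-image pipelines) are all assumptions made for the example.

```python
from typing import List


def resolution_schedule(base: int, target: int, factor: int = 2) -> List[int]:
    """Side lengths (in pixels) for each stage, from `base` up to `target`."""
    if base <= 0 or target < base or factor < 2:
        raise ValueError("need 0 < base <= target and factor >= 2")
    sizes = [base]
    while sizes[-1] < target:
        # Grow by `factor`, but never overshoot the requested target size.
        sizes.append(min(sizes[-1] * factor, target))
    return sizes


def steps_to_run(num_steps: int, strength: float) -> int:
    """Denoising steps left when starting from a partially noised latent.

    Hypothetical helper following the usual SDEdit/img2img convention:
    strength=1.0 starts from pure noise (every step runs), while lower
    values reuse more of the upscaled lower-resolution result.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0, 1]")
    return min(int(num_steps * strength), num_steps)


if __name__ == "__main__":
    for side in resolution_schedule(1024, 16384):
        print(side, steps_to_run(num_steps=50, strength=0.5))
```

Under these assumptions, each stage after the first runs only part of the denoising schedule, which is where the savings over full denoising at every resolution would come from.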

Quick Start & Requirements

Install PyTorch and diffusers.
Clone the repository: git clone https://github.com/Huage001/LinFusion.git
Basic usage involves importing LinFusion and calling LinFusion.construct_for(pipeline).
Requires a CUDA-enabled GPU. High-resolution generation (e.g., 16K) can require around 80GB of VRAM, though optimized versions that fit in 24GB are available.
Examples are provided in examples/inference/basic_usage.ipynb, along with a Gradio demo for SD-v1.5.
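The basic usage step might look like the sketch below. Only `LinFusion.construct_for(pipeline)` comes from the summary above; the import path `src.linfusion`, the model ID `Lykon/dreamshaper-8`, and the wrapper function `build_linfusion_pipeline` are assumptions for illustration, so check the repository's notebook for the exact form.

```python
def build_linfusion_pipeline(model_id: str = "Lykon/dreamshaper-8"):
    """Load a Diffusers text-to-image pipeline and attach LinFusion to it."""
    # Imports are deferred so this file can be imported without a GPU stack.
    import torch
    from diffusers import AutoPipelineForText2Image

    # Assumption: the script runs from the root of the cloned repository,
    # where the class is importable as `src.linfusion.LinFusion`.
    from src.linfusion import LinFusion

    pipeline = AutoPipelineForText2Image.from_pretrained(
        model_id, torch_dtype=torch.float16
    ).to("cuda")

    # The call named in the quick start; assumed to modify `pipeline` in place.
    LinFusion.construct_for(pipeline)
    return pipeline
```

After this, generation would follow the usual Diffusers pattern, for example `build_linfusion_pipeline()("a prompt").images[0]`.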

Highlighted Details

Achieves 16K image generation with as little as 24GB VRAM through memory optimization techniques like chunked inference and CPU caching.
Supports integration with DistriFusion for further acceleration across multiple GPUs.
Offers training code for SD v1.5, v2.1, and SDXL; the training dataset requires about 75GB of disk space.
Evaluation code is provided for FID and CLIP text cosine similarity metrics.
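To give a rough sense of why memory optimizations such as chunked inference matter at 16K, the sketch below works through the arithmetic. It assumes the 8x VAE downsampling typical of Stable Diffusion, fp16 values, and a naively materialized self-attention matrix; none of these figures come from the README, and the helper names are hypothetical rather than LinFusion's actual implementation.

```python
from typing import Iterator, Tuple


def latent_tokens(side_px: int, vae_factor: int = 8) -> int:
    """Spatial positions in a square latent, assuming `vae_factor`x downsampling."""
    side = side_px // vae_factor
    return side * side


def naive_attention_bytes(tokens: int, bytes_per_value: int = 2) -> int:
    """Bytes for one fully materialized tokens x tokens attention matrix."""
    return tokens * tokens * bytes_per_value


def chunk_ranges(total: int, chunk: int) -> Iterator[Tuple[int, int]]:
    """Yield (start, stop) index pairs covering range(total) in fixed-size chunks."""
    if chunk <= 0:
        raise ValueError("chunk must be positive")
    for start in range(0, total, chunk):
        yield start, min(start + chunk, total)


if __name__ == "__main__":
    tokens = latent_tokens(16384)  # 2048 * 2048 latent positions
    print(tokens, naive_attention_bytes(tokens) / 2**40, "TiB")
    # Processing work in chunks bounds peak memory by the chunk size.
    print(list(chunk_ranges(total=10, chunk=4)))
```

Under these assumptions a single quadratic attention matrix would be far larger than any GPU's memory, so splitting work into bounded chunks (and moving idle tensors to the CPU) is one plausible way to stay within a fixed VRAM budget.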

Maintenance & Community

Updates to the project include a Triton implementation for improved efficiency and integration with DistriFusion. The authors state that they are working on further integrations and welcome pull requests. The README does not link to any community resources.

Licensing & Compatibility

The repository does not explicitly state a license. The code is presented as an official implementation, and its use for commercial purposes or linking with closed-source projects would require clarification on licensing terms.

Limitations & Caveats

Directly applying models trained at low resolution to high-resolution generation can cause content distortion or duplication, which LinFusion aims to mitigate. The README notes that the 16K generation examples may require 80GB of VRAM, while the 24GB versions rely on additional memory optimizations. A local Gradio interface is still under development.

Health Check

Last Commit

1 year ago

Responsiveness

Inactive

Star History

2 stars in the last 30 days