DemoFusion by PRIS-CV

High-resolution image generation research paper

Created 1 year ago
2,027 stars

Top 22.0% on SourcePulse

Project Summary

DemoFusion provides a framework for generating high-resolution images using existing Latent Diffusion Models (LDMs), aiming to democratize access to advanced AI image generation. It is designed for researchers and users interested in pushing the boundaries of image resolution without requiring extensive computational resources for training.

How It Works

DemoFusion extends LDMs with three core mechanisms: Progressive Upscaling, Skip Residual, and Dilated Sampling. This approach allows for higher-resolution outputs by iteratively refining the image. The progressive nature also enables rapid prompt iteration by providing intermediate "previews" during generation.
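Two of these mechanisms can be illustrated with a small, self-contained sketch. It is not the project's code: the function names, the nested-list "latent", the cosine schedule, and the `alpha` exponent are all assumptions chosen for illustration. Dilated sampling is shown as strided sub-sampling of a grid, and the skip residual as a weighted blend of the previous phase's result into the current one, with a weight that decays as denoising proceeds.

```python
import math

def dilated_split(grid, s):
    """Split a 2-D grid into s*s strided sub-grids, one per (dy, dx) offset.

    Each sub-grid samples every s-th row and column, so it covers the whole
    grid at a coarser spacing (a hypothetical stand-in for dilated sampling).
    """
    return [[row[dx::s] for row in grid[dy::s]]
            for dy in range(s) for dx in range(s)]

def dilated_merge(parts, s):
    """Inverse of dilated_split: interleave the sub-grids back into one grid."""
    h = len(parts[0]) * s
    w = len(parts[0][0]) * s
    grid = [[None] * w for _ in range(h)]
    for idx, part in enumerate(parts):
        dy, dx = divmod(idx, s)
        for i, row in enumerate(part):
            for j, value in enumerate(row):
                grid[dy + i * s][dx + j * s] = value
    return grid

def skip_residual_weight(t, total_steps, alpha=3.0):
    """Assumed cosine-decay weight: 1.0 at t == total_steps (noisiest step),
    falling to 0.0 at t == 0. The exponent alpha is an illustrative value."""
    return ((1 + math.cos(math.pi * (total_steps - t) / total_steps)) / 2) ** alpha

def blend(previous_phase_value, current_value, weight):
    """Mix the previous phase's (upsampled, re-noised) value into the current one."""
    return weight * previous_phase_value + (1 - weight) * current_value
```

For a 4x4 grid and `s = 2`, `dilated_split` yields four 2x2 sub-grids, and `dilated_merge` restores the original layout. In this sketch the blend weight starts at 1.0, so early steps follow the previous phase closely, and later steps rely increasingly on the current phase.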

Highlighted Details

  • Achieves high-resolution image generation (e.g., 3072x3072) by extending open-source LDMs.
  • Offers a low-VRAM version for users with limited GPU memory.
  • Supports both Text2Image and Image2Image functionalities.
  • Integrates with ControlNet for enhanced control over generation.
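To make the 3072x3072 figure concrete, the sketch below lists the intermediate resolutions a progressive scheme might pass through. It assumes a base resolution of 1024 (SDXL's native size) and integer scale steps. Both are assumptions for illustration, not values read from the repository, and `phase_resolutions` and `pixel_ratio` are hypothetical helper names.

```python
def phase_resolutions(base=1024, target=3072):
    """Return the (width, height) of each assumed upscaling phase,
    stepping from base to target in integer multiples of base."""
    if target < base or target % base != 0:
        raise ValueError("this sketch expects target to be a multiple of base")
    return [(base * k, base * k) for k in range(1, target // base + 1)]

def pixel_ratio(base, target):
    """How many times more pixels the target square image has than the base."""
    return (target * target) / (base * base)

if __name__ == "__main__":
    for width, height in phase_resolutions():
        print(f"{width}x{height}")
    print(pixel_ratio(1024, 3072))
```

Under these assumptions the phases are 1024x1024, 2048x2048, and 3072x3072, and the final image has nine times as many pixels as the base output.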

Maintenance & Community

The project was accepted to CVPR 2024. Community contributions have led to ComfyUI and Replicate integrations, as well as ControlNet and low-VRAM implementations.

Licensing & Compatibility

The repository does not explicitly state a license in the README. Users should verify licensing for commercial or closed-source use.

Limitations & Caveats

The Image2Image functionality is noted to be strongly biased by SDXL's training data. The default hyper-parameters are recommended as a starting point but may not be optimal for every use case.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 4 stars in the last 30 days
