RAG-Diffusion by NJU-PCALab

Region-aware text-to-image generation research paper

Created 1 year ago

620 stars

Top 53.3% on SourcePulse

Project Summary

RAG-Diffusion gives precise spatial control over text-to-image generation by splitting the process into two stages: "Regional Hard Binding", which places individual objects, and "Regional Soft Refinement", which blends details across regions. It targets users who need fine-grained layout composition, integrates tuning-free with the supported models, and can repaint image regions without an external inpainting model.

How It Works

RAG-Diffusion employs a two-stage generation process. "Regional Hard Binding" uses offset and scale parameters to position and size each object described by a regional prompt. "Regional Soft Refinement" then blends these regions, smoothing boundaries and enhancing inter-region details. Decoupling the two stages allows independent control over object placement and overall image coherence, which the project presents as an improvement over methods that rely solely on attention map manipulation.
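To make the offset and scale idea concrete, here is a minimal sketch. It is not the repository's code: it assumes offsets and scales are fractions of the canvas size, and `Region`, `region_to_box`, and `soft_blend` are hypothetical names. The blend is a plain linear interpolation used as a stand-in for the refinement stage, whose actual mechanism is not described here.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """Hypothetical description of one hard-bound region (fractions of the canvas)."""
    prompt: str
    x_offset: float  # left edge as a fraction of image width
    y_offset: float  # top edge as a fraction of image height
    x_scale: float   # region width as a fraction of image width
    y_scale: float   # region height as a fraction of image height


def region_to_box(region: Region, width: int, height: int) -> tuple:
    """Convert fractional offset/scale into a pixel box (left, top, right, bottom)."""
    left = round(region.x_offset * width)
    top = round(region.y_offset * height)
    # Clamp to the canvas so an oversized region cannot spill past the edge.
    right = min(width, left + round(region.x_scale * width))
    bottom = min(height, top + round(region.y_scale * height))
    return left, top, right, bottom


def soft_blend(base: list, regional: list, delta: float) -> list:
    """Assumed stand-in for refinement: linearly mix regional values into base values."""
    return [(1.0 - delta) * b + delta * r for b, r in zip(base, regional)]


regions = [
    Region("a red vase", x_offset=0.0, y_offset=0.25, x_scale=0.5, y_scale=0.5),
    Region("a sleeping cat", x_offset=0.5, y_offset=0.25, x_scale=0.5, y_scale=0.5),
]
boxes = [region_to_box(r, width=1024, height=1024) for r in regions]
```

In this sketch the placement step (`region_to_box`) and the blending step (`soft_blend`) share no state, which mirrors the decoupling described above: layout can be changed without touching how strongly regions are mixed.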

Quick Start & Requirements

Install (run in order):
conda create -n RAG python==3.9
conda activate RAG
pip install xformers==0.0.28.post1 diffusers peft torchvision==0.19.1 opencv-python==4.10.0.84 sentencepiece==0.2.0 protobuf==5.28.1 scipy==1.13.1
Prerequisites: CUDA-enabled GPU.
Demo: Online Demo
Code: GitHub Repository
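After installing, it can help to confirm that the environment matches the pinned versions. The following is a small sketch using only the standard library; `PINNED` and `check_pins` are hypothetical names, and the pins are copied from the install command above (packages listed there without a version are treated as "any version").

```python
from importlib import metadata

# Pins copied from the install command; None means no version was specified.
PINNED = {
    "xformers": "0.0.28.post1",
    "diffusers": None,
    "peft": None,
    "torchvision": "0.19.1",
    "opencv-python": "4.10.0.84",
    "sentencepiece": "0.2.0",
    "protobuf": "5.28.1",
    "scipy": "1.13.1",
}


def check_pins(pins: dict) -> dict:
    """Return a status per package: 'ok', 'missing', or 'mismatch (found X)'."""
    report = {}
    for name, wanted in pins.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        # Ignore local build suffixes such as "+cu121" when comparing.
        if wanted is None or found.split("+")[0] == wanted:
            report[name] = "ok"
        else:
            report[name] = f"mismatch (found {found})"
    return report


if __name__ == "__main__":
    for name, status in check_pins(PINNED).items():
        print(f"{name}: {status}")
```

This only checks installed package metadata; it does not verify that a CUDA-enabled GPU is visible to the runtime.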

Highlighted Details

Supports FLUX.1-dev, FLUX.1 Redux, PuLID, and IP-Adapter integrations.
Enables image repainting by modifying specific regions with a mask.
Offers LoRA integration for style and content customization.
Can leverage MLLMs to automatically parse prompts into regional parameters.
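As an illustration of the MLLM-driven parsing mentioned in the last item, the sketch below validates a JSON reply before it is used as regional parameters. The schema (a `regions` list with `prompt`, `x_offset`, `y_offset`, `x_scale`, `y_scale` as fractions in [0, 1]) is an assumption made for this example, not the format the project uses, and `parse_regions` is a hypothetical helper.

```python
import json

# Assumed schema for illustration; the project's real parameter names may differ.
REQUIRED_KEYS = ("prompt", "x_offset", "y_offset", "x_scale", "y_scale")
NUMERIC_KEYS = REQUIRED_KEYS[1:]


def parse_regions(mllm_reply: str) -> list:
    """Parse a JSON reply into region dicts, rejecting malformed entries."""
    data = json.loads(mllm_reply)
    regions = []
    for item in data["regions"]:
        missing = [k for k in REQUIRED_KEYS if k not in item]
        if missing:
            raise ValueError(f"region is missing keys: {missing}")
        region = {"prompt": str(item["prompt"])}
        for key in NUMERIC_KEYS:
            value = float(item[key])
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{key}={value} is outside [0, 1]")
            region[key] = value
        regions.append(region)
    return regions


reply = (
    '{"regions": [{"prompt": "a red vase", "x_offset": 0.0, '
    '"y_offset": 0.25, "x_scale": 0.5, "y_scale": 0.5}]}'
)
parsed = parse_regions(reply)
```

Validating up front is a design choice of this sketch: model output is free-form text, so failing early on a missing key or out-of-range value is cheaper than discovering it partway through a generation run.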

Maintenance & Community

The project is actively updated, with recent support for new models such as FLUX.1 Redux and FLUX.1-dev-IP-Adapter.
Contact: znchen@smail.nju.edu.cn

Licensing & Compatibility

The repository does not explicitly state a license. The code depends on third-party libraries that carry their own licenses (e.g., Hugging Face Diffusers).

Limitations & Caveats

The absence of a specified license may pose compatibility issues for commercial or closed-source projects.
Precise control parameters (offsets, scales) require careful tuning for optimal results.
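Because the offsets and scales are set by hand, a quick sanity check before a generation run can catch obvious layout mistakes. This sketch assumes each region is a tuple `(x_offset, y_offset, x_scale, y_scale)` of canvas fractions; `check_regions` is a hypothetical helper, and treating overlap as a warning is an assumption of this example rather than a documented rule of the project.

```python
def check_regions(regions: list) -> list:
    """Return warnings for regions given as (x_offset, y_offset, x_scale, y_scale)."""
    warnings = []
    for i, (x, y, w, h) in enumerate(regions):
        if w <= 0 or h <= 0:
            warnings.append(f"region {i} has non-positive size")
        if x < 0 or y < 0 or x + w > 1.0 or y + h > 1.0:
            warnings.append(f"region {i} extends outside the canvas")
    # Pairwise overlap test on axis-aligned rectangles.
    for i in range(len(regions)):
        for j in range(i + 1, len(regions)):
            xi, yi, wi, hi = regions[i]
            xj, yj, wj, hj = regions[j]
            overlap_w = min(xi + wi, xj + wj) - max(xi, xj)
            overlap_h = min(yi + hi, yj + hj) - max(yi, yj)
            if overlap_w > 0 and overlap_h > 0:
                warnings.append(f"regions {i} and {j} overlap")
    return warnings


side_by_side = [(0.0, 0.25, 0.5, 0.5), (0.5, 0.25, 0.5, 0.5)]
too_wide = [(0.0, 0.0, 0.75, 0.5), (0.5, 0.0, 0.75, 0.5)]
print(check_regions(side_by_side))
print(check_regions(too_wide))
```

Regions that merely touch along an edge, as in `side_by_side`, are not flagged; only a positive-area intersection counts as overlap here.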

Health Check

Last Commit: 1 month ago
Responsiveness: 1 day
Star History: 6 stars in the last 30 days