RAG-Diffusion by NJU-PCALab

Code for a research paper on region-aware text-to-image generation

created 8 months ago
592 stars

Top 55.7% on sourcepulse

Project Summary

RAG-Diffusion enables precise spatial control in text-to-image generation by decomposing the process into "Regional Hard Binding" for individual object placement and "Regional Soft Refinement" for detail blending. This approach targets users needing fine-grained layout composition, offering tuning-free integration and novel repainting capabilities without external inpainting models.

How It Works

RAG-Diffusion employs a two-stage generation process. "Regional Hard Binding" uses offset and scale parameters to precisely position and size specified objects based on prompts. "Regional Soft Refinement" then blends these regions, smoothing boundaries and enhancing inter-region details. This decoupling allows for independent control over object placement and overall image coherence, improving upon methods that rely solely on attention map manipulation.
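The two stages can be pictured with a toy example on a small numeric grid. This is only a sketch under stated assumptions, not the repository's implementation: it assumes offsets and scales are fractions of the canvas size, and it stands in a plain linear blend for the refinement stage. The names `Region`, `region_to_box`, `hard_bind`, and `soft_refine` are hypothetical and do not come from the project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """One regional prompt with a fractional layout (all names are illustrative)."""
    prompt: str
    x_offset: float  # left edge as a fraction of canvas width
    y_offset: float  # top edge as a fraction of canvas height
    x_scale: float   # region width as a fraction of canvas width
    y_scale: float   # region height as a fraction of canvas height


def region_to_box(region, width, height):
    """Map fractional offset/scale to an integer box (left, top, right, bottom)."""
    left = round(region.x_offset * width)
    top = round(region.y_offset * height)
    right = min(width, left + round(region.x_scale * width))
    bottom = min(height, top + round(region.y_scale * height))
    return left, top, right, bottom


def hard_bind(base, regional, box):
    """Copy `regional` values into `base` inside `box`; leave the rest untouched."""
    left, top, right, bottom = box
    out = [row[:] for row in base]  # copy so the caller's grid is not mutated
    for y in range(top, bottom):
        out[y][left:right] = regional[y][left:right]
    return out


def soft_refine(base, bound, ratio):
    """Linearly blend the bound grid back toward the base grid (stand-in only)."""
    return [
        [(1 - ratio) * b + ratio * r for b, r in zip(base_row, bound_row)]
        for base_row, bound_row in zip(base, bound)
    ]


if __name__ == "__main__":
    # Assumed layout: the object occupies the right half of a 4x4 canvas.
    region = Region("a red vase", x_offset=0.5, y_offset=0.0, x_scale=0.5, y_scale=1.0)
    box = region_to_box(region, width=4, height=4)
    base = [[0.0] * 4 for _ in range(4)]
    regional = [[1.0] * 4 for _ in range(4)]
    bound = hard_bind(base, regional, box)
    print(box)
    print(soft_refine(base, bound, ratio=0.5)[0])
```

Keeping placement (`hard_bind`) separate from blending (`soft_refine`) mirrors the decoupling described above: the layout can change without touching the blend strength, and the blend strength can change without moving any region.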

Quick Start & Requirements

  • Install (run in order):
      conda create -n RAG python==3.9
      conda activate RAG
      pip install xformers==0.0.28.post1 diffusers peft torchvision==0.19.1 opencv-python==4.10.0.84 sentencepiece==0.2.0 protobuf==5.28.1 scipy==1.13.1
  • Prerequisites: CUDA-enabled GPU.
  • Demo: Online Demo
  • Code: GitHub Repository

Highlighted Details

  • Supports FLUX.1-dev, FLUX.1 Redux, PuLID, and IP-Adapter integrations.
  • Enables image repainting by modifying specific regions with a mask.
  • Offers LoRA integration for style and content customization.
  • Can leverage MLLMs to automatically parse prompts into regional parameters.
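
The MLLM step in the last item amounts to turning a free-text prompt into structured layout data. The sketch below assumes, hypothetically, that the model is asked to reply in JSON with a `regions` list whose entries carry a prompt plus fractional offsets and scales. The schema, the field names, and the `parse_regions` helper are illustrative assumptions; the project's actual prompt format may differ.

```python
import json


def parse_regions(mllm_reply):
    """Parse a JSON reply into validated region dicts (the schema is hypothetical)."""
    regions = json.loads(mllm_reply)["regions"]
    for region in regions:
        for axis in ("x", "y"):
            offset = float(region[f"{axis}_offset"])
            scale = float(region[f"{axis}_scale"])
            # Each region must have positive size and stay inside the unit canvas.
            if offset < 0.0 or scale <= 0.0 or offset + scale > 1.0 + 1e-9:
                raise ValueError(
                    f"region {region['prompt']!r} leaves the canvas on the {axis} axis"
                )
    return regions


if __name__ == "__main__":
    # Example reply for an assumed two-object, side-by-side layout.
    reply = json.dumps({
        "regions": [
            {"prompt": "a black cat", "x_offset": 0.0, "y_offset": 0.0,
             "x_scale": 0.5, "y_scale": 1.0},
            {"prompt": "a white dog", "x_offset": 0.5, "y_offset": 0.0,
             "x_scale": 0.5, "y_scale": 1.0},
        ]
    })
    for region in parse_regions(reply):
        print(region["prompt"], region["x_offset"], region["x_scale"])
```

Validating the reply before generation is a design choice in this sketch: model output can be malformed, and rejecting an out-of-canvas region early is cheaper than discovering it in a rendered image.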

Maintenance & Community

  • Project is actively updated with support for new models like FLUX.1 Redux and FLUX.1-dev-IP-Adapter.
  • Contact: znchen@smail.nju.edu.cn

Licensing & Compatibility

  • The repository does not explicitly state a license. Its dependencies carry their own licenses (e.g., Hugging Face Diffusers is Apache-2.0).

Limitations & Caveats

  • Without a stated license, reuse rights are unclear, which may block use in commercial or closed-source projects.
  • Precise control parameters (offsets, scales) require careful tuning for optimal results.

Health Check

  • Last commit: 1 month ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 29 stars in the last 90 days
