RAG-Diffusion by NJU-PCALab

Region-aware text-to-image generation research paper

Created 10 months ago
605 stars

Top 54.2% on SourcePulse

Project Summary

RAG-Diffusion enables precise spatial control in text-to-image generation by decomposing the process into two stages: "Regional Hard Binding", which places each object in its region, and "Regional Soft Refinement", which blends details across regions. It targets users who need fine-grained layout composition. The method is tuning-free, and it supports repainting selected regions without an external inpainting model.

How It Works

RAG-Diffusion employs a two-stage generation process. "Regional Hard Binding" uses offset and scale parameters to precisely position and size specified objects based on prompts. "Regional Soft Refinement" then blends these regions, smoothing boundaries and enhancing inter-region details. This decoupling allows for independent control over object placement and overall image coherence, improving upon methods that rely solely on attention map manipulation.
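
As a rough illustration of how offset and scale parameters could describe a region, the sketch below converts fractional offsets and scales into a pixel bounding box. It assumes each value is a fraction of the image width or height in the range 0 to 1. The `Region` class and `to_pixel_box` function are hypothetical names chosen for this example and are not taken from the repository.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Region:
    """Hypothetical container for one regional prompt (not the repo's API)."""
    prompt: str
    x_offset: float  # left edge, as a fraction of image width
    y_offset: float  # top edge, as a fraction of image height
    x_scale: float   # region width, as a fraction of image width
    y_scale: float   # region height, as a fraction of image height


def to_pixel_box(region: Region, width: int, height: int) -> Tuple[int, int, int, int]:
    """Return (left, top, right, bottom) in pixels for a fractional region."""
    values = {
        "x_offset": region.x_offset,
        "y_offset": region.y_offset,
        "x_scale": region.x_scale,
        "y_scale": region.y_scale,
    }
    for name, value in values.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    # A small tolerance absorbs floating-point error in offset + scale.
    if region.x_offset + region.x_scale > 1.0 + 1e-9:
        raise ValueError("region extends past the right edge of the image")
    if region.y_offset + region.y_scale > 1.0 + 1e-9:
        raise ValueError("region extends past the bottom edge of the image")

    left = round(region.x_offset * width)
    top = round(region.y_offset * height)
    right = round((region.x_offset + region.x_scale) * width)
    bottom = round((region.y_offset + region.y_scale) * height)
    return left, top, right, bottom
```

For example, `to_pixel_box(Region("a cat", 0.0, 0.0, 0.5, 1.0), 1024, 1024)` yields the left half of a 1024 by 1024 image, `(0, 0, 512, 1024)`.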

Quick Start & Requirements

  • Install:
      conda create -n RAG python==3.9
      conda activate RAG
      pip install xformers==0.0.28.post1 diffusers peft torchvision==0.19.1 opencv-python==4.10.0.84 sentencepiece==0.2.0 protobuf==5.28.1 scipy==1.13.1
  • Prerequisites: CUDA-enabled GPU.
  • Demo: Online Demo
  • Code: GitHub Repository

Highlighted Details

  • Supports FLUX.1-dev, FLUX.1 Redux, PuLID, and IP-Adapter integrations.
  • Enables image repainting by modifying specific regions with a mask.
  • Offers LoRA integration for style and content customization.
  • Can leverage MLLMs to automatically parse prompts into regional parameters.
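
The last item above can be pictured as a parsing step: the MLLM is asked to return structured text, which is then validated before being used as regional parameters. The sketch below assumes the model replies with JSON containing a `regions` list. That schema, the field names, and the `parse_regions` function are assumptions made for illustration, not the format the repository uses.

```python
import json
from typing import Dict, List

# Hypothetical field names; the repository may use a different schema.
NUMERIC_KEYS = ("x_offset", "y_offset", "x_scale", "y_scale")


def parse_regions(mllm_reply: str) -> List[Dict[str, object]]:
    """Validate an assumed JSON reply and return a list of region dicts."""
    data = json.loads(mllm_reply)
    regions = []
    for item in data["regions"]:
        missing = [k for k in ("prompt",) + NUMERIC_KEYS if k not in item]
        if missing:
            raise ValueError(f"region is missing keys: {missing}")
        region: Dict[str, object] = {"prompt": str(item["prompt"])}
        for key in NUMERIC_KEYS:
            value = float(item[key])
            # Offsets and scales are treated as fractions of the image size.
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{key} must be in [0, 1], got {value}")
            region[key] = value
        regions.append(region)
    return regions
```

Rejecting malformed replies early matters here because model output is not guaranteed to follow the requested schema.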

Maintenance & Community

  • Project is actively updated with support for new models like FLUX.1 Redux and FLUX.1-dev-IP-Adapter.
  • Contact: znchen@smail.nju.edu.cn

Licensing & Compatibility

  • The repository does not explicitly state a license. Code examples use libraries with various licenses (e.g., Hugging Face Diffusers).

Limitations & Caveats

  • The absence of a specified license may pose compatibility issues for commercial or closed-source projects.
  • Precise control parameters (offsets, scales) require careful tuning for optimal results.

Health Check

  • Last commit: 2 months ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 9 stars in the last 30 days
