AutoStudio by donahowe

Training-free framework for multi-turn interactive image generation

created 1 year ago

446 stars

Top 68.4% on sourcepulse

Project Summary

AutoStudio addresses the challenge of maintaining subject consistency in multi-turn interactive image generation, a task where users iteratively refine image content. It targets researchers and developers working with text-to-image models, offering a framework to generate coherent image sequences with consistent subjects across multiple interactions.

How It Works

AutoStudio is a training-free, multi-agent framework built on LLMs and Stable Diffusion. It has four components: a subject manager that interprets the dialogue and tracks context, a layout generator that places subjects precisely via bounding boxes, a supervisor that suggests refinements, and a drawer that synthesizes the image. Its key innovation is the Parallel-UNet, which uses parallel cross-attention modules to make better use of subject-aware features, together with a subject-initialized generation method that helps preserve smaller subjects.
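The flow between the four components can be sketched roughly as follows. This is a minimal illustration, not the project's code: every class and function name here is hypothetical, the LLM agents are replaced by trivial placeholders, and the drawer returns the per-subject boxes instead of calling a diffusion model. It shows only how a persistent subject registry keeps IDs stable across turns.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # normalized (x0, y0, x1, y1)


@dataclass
class Subject:
    subject_id: int
    name: str
    box: Box


class SubjectManager:
    """Hypothetical stand-in for the LLM agent that tracks subjects across turns."""

    def __init__(self) -> None:
        self._ids: Dict[str, int] = {}

    def resolve(self, names: List[str]) -> List[Tuple[int, str]]:
        # Reuse the ID of any subject seen in an earlier turn.
        resolved = []
        for name in names:
            if name not in self._ids:
                self._ids[name] = len(self._ids)
            resolved.append((self._ids[name], name))
        return resolved


def generate_layout(resolved: List[Tuple[int, str]]) -> List[Subject]:
    """Placeholder layout generator: equal-width boxes, left to right."""
    if not resolved:
        return []
    width = 1.0 / len(resolved)
    return [
        Subject(sid, name, (i * width, 0.25, (i + 1) * width, 0.75))
        for i, (sid, name) in enumerate(resolved)
    ]


def supervise(layout: List[Subject]) -> List[Subject]:
    """Placeholder supervisor (an assumption): clamp every box to the unit canvas."""
    def clamp(v: float) -> float:
        return min(1.0, max(0.0, v))

    return [
        Subject(s.subject_id, s.name, tuple(clamp(v) for v in s.box))
        for s in layout
    ]


def draw(layout: List[Subject]) -> Dict[int, Box]:
    """Placeholder drawer: return the boxes a diffusion model would be conditioned on."""
    return {s.subject_id: s.box for s in layout}


def run_turn(manager: SubjectManager, names: List[str]) -> Dict[int, Box]:
    # One dialogue turn: subject manager -> layout generator -> supervisor -> drawer.
    return draw(supervise(generate_layout(manager.resolve(names))))


manager = SubjectManager()
turn1 = run_turn(manager, ["girl", "dog"])
turn2 = run_turn(manager, ["dog", "cat"])
```

Because the manager persists between calls, "dog" keeps the same ID in both turns even though its position changes, which is the property a real drawer would rely on to render the same subject consistently.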

Quick Start & Requirements

Run via python run.py.
Prerequisites: pretrained Stable Diffusion checkpoints (e.g., dreamlike-art/dreamlike-anime-1.0), the EfficientSAM checkpoint (DETECT_SAMefficient_sam_s_gpu.jit), and the Grounding DINO checkpoint (DETECT_SAM/Grounding-DINO/groundingdino_swint_ogc.pth).
Official Project Page: https://howe183.github.io/AutoStudio.io/
Paper: https://arxiv.org/abs/2406.01388
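Before running, it can help to confirm that the downloaded checkpoints sit where the code expects them. The helper below is a hypothetical convenience script, not part of the repository; it assumes it is run from the repository root and checks only the Grounding DINO path listed above. Other checkpoint paths can be appended to the list once their exact locations are known.

```python
from pathlib import Path
from typing import Iterable, List


def missing_files(repo_root: str, relative_paths: Iterable[str]) -> List[str]:
    """Return the entries of relative_paths that are not regular files under repo_root."""
    root = Path(repo_root)
    return [p for p in relative_paths if not (root / p).is_file()]


if __name__ == "__main__":
    # Path taken from the prerequisites list; extend with other checkpoints as needed.
    required = ["DETECT_SAM/Grounding-DINO/groundingdino_swint_ogc.pth"]
    for path in missing_files(".", required):
        print(f"missing: {path}")
```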

Highlighted Details

Reports state-of-the-art performance on CMIGBench, improving average Fréchet Inception Distance by 13.65% and average character-character similarity by 2.83%.
Supports both SDv1.5 and SDXL versions.
Training-free approach simplifies integration.
Focuses on multi-subject consistency across interactive turns.

Maintenance & Community

The project is actively maintained by Junhao Cheng, an undergraduate student who is seeking PhD opportunities. The repository reached 200 stars in June 2024, and code for both SDXL and SDv1.5 has been released. Contact is available via email at howe4884@outlook.com.

Licensing & Compatibility

The repository does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project is presented as a research artifact, and its stability and long-term maintenance depend on the author's academic pursuits. Hardware requirements for the listed checkpoints (e.g., GPU, CUDA) are not detailed.

Health Check

Last commit

3 months ago

Responsiveness

1 day

Star History

7 stars in the last 90 days