AutoStudio by donahowe

Training-free framework for multi-turn interactive image generation

created 1 year ago
446 stars

Top 68.4% on sourcepulse

Project Summary

AutoStudio addresses the challenge of maintaining subject consistency in multi-turn interactive image generation, a task where users iteratively refine image content. It targets researchers and developers working with text-to-image models, offering a framework to generate coherent image sequences with consistent subjects across multiple interactions.

How It Works

AutoStudio employs a novel training-free, multi-agent framework leveraging LLMs and Stable Diffusion. It comprises a subject manager for dialogue interpretation and context tracking, a layout generator for precise subject placement via bounding boxes, a supervisor for refinement suggestions, and a drawer for image synthesis. A key innovation is the Parallel-UNet, which enhances subject-aware feature exploitation through parallel cross-attention modules, alongside a subject-initialized generation method to preserve smaller subjects.
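The division of labor among the four agents can be illustrated with a toy pipeline. This is only a minimal sketch under stated assumptions, not AutoStudio's actual code: every class and function name here (`SubjectManager`, `generate_layout`, `supervise`, `draw`, `run_turn`) is hypothetical, the LLM calls are replaced by simple rules, and the drawer returns text placeholders instead of running Stable Diffusion or the Parallel-UNet.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Bounding box as (x0, y0, x1, y1), normalized to [0, 1]. Hypothetical format.
Box = Tuple[float, float, float, float]


@dataclass
class Subject:
    subject_id: str
    description: str


@dataclass
class Layout:
    boxes: Dict[str, Box] = field(default_factory=dict)


class SubjectManager:
    """Tracks subjects across turns so one id always maps to one description."""

    def __init__(self) -> None:
        self.known: Dict[str, Subject] = {}

    def interpret(self, turn: Dict[str, str]) -> List[Subject]:
        # Assumption: the turn is already parsed into {id: description}.
        # A real system would ask an LLM to extract this from dialogue.
        subjects = []
        for sid, desc in turn.items():
            # Reuse the first description seen for a known id (consistency).
            subjects.append(self.known.setdefault(sid, Subject(sid, desc)))
        return subjects


def generate_layout(subjects: List[Subject]) -> Layout:
    """Toy layout rule: place subjects side by side in equal-width columns."""
    n = len(subjects)
    layout = Layout()
    for i, s in enumerate(subjects):
        layout.boxes[s.subject_id] = (i / n, 0.25, (i + 1) / n, 0.75)
    return layout


def supervise(layout: Layout, min_width: float = 0.2) -> List[str]:
    """Return ids of subjects whose box is narrower than min_width."""
    return [sid for sid, (x0, _, x1, _) in layout.boxes.items()
            if x1 - x0 < min_width]


def draw(subjects: List[Subject], layout: Layout) -> List[str]:
    """Stand-in for image synthesis: one text placeholder per subject."""
    return [f"{s.subject_id}@{layout.boxes[s.subject_id]}: {s.description}"
            for s in subjects]


def run_turn(manager: SubjectManager, turn: Dict[str, str]) -> dict:
    """Run one interactive turn through all four stages."""
    subjects = manager.interpret(turn)
    layout = generate_layout(subjects)
    issues = supervise(layout)
    return {"subjects": subjects, "layout": layout,
            "issues": issues, "image": draw(subjects, layout)}


if __name__ == "__main__":
    manager = SubjectManager()
    run_turn(manager, {"cat": "a small orange cat"})
    result = run_turn(manager, {"cat": "a cat", "dog": "a brown dog"})
    for line in result["image"]:
        print(line)
```

In the second turn the manager keeps the cat's original description even though the user refers to it only as "a cat", which is the kind of cross-turn consistency the framework is designed to preserve.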

Highlighted Details

  • Reports outperforming the previous state of the art on the CMIGBench benchmark by 13.65% in average Fréchet Inception Distance (FID) and 2.83% in average character-character similarity.
  • Supports both SDv1.5 and SDXL versions.
  • Training-free approach simplifies integration.
  • Focuses on multi-subject consistency across interactive turns.

Maintenance & Community

The project is actively maintained by undergraduate student Junhao Cheng, who is seeking PhD opportunities. The repository reached 200 stars in June 2024, with SDXL and SDv1.5 code released. Contact is available via email at howe4884@outlook.com.

Licensing & Compatibility

The repository does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project is presented as a research artifact, and its stability and long-term maintenance depend on the author's academic pursuits. Hardware requirements (e.g., GPU, CUDA version) are not detailed.

Health Check
Last commit: 3 months ago
Responsiveness: 1 day
Pull Requests (30d): 0
Issues (30d): 0
Star History: 7 stars in the last 90 days
