Anonymous Region Transformer for multi-layer image generation
Top 81.0% on SourcePulse
This repository provides official code for ART (Anonymous Region Transformer), a method for generating multi-layer transparent images from a single global text prompt and an anonymous region layout. It targets researchers and artists interested in complex image composition and transparent media generation, offering a novel approach to layer-based image synthesis.
How It Works
ART uses a transformer architecture that takes two inputs: a global text prompt and an anonymous region layout, meaning a set of bounding boxes with no per-layer descriptions. Because no per-layer captions are needed, preparing the input is simpler. The system is designed for efficiency, is described as outperforming full attention and spatial-temporal attention mechanisms, and supports generating more than 50 layers.
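To make the input format concrete, here is a minimal sketch of what "one global prompt plus anonymous boxes" could look like as a data structure. The class name `AnonymousLayout`, its fields, and the `(x0, y0, x1, y1)` pixel convention are assumptions made for illustration; they are not taken from the ART repository, whose actual input format may differ.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical box convention: (x0, y0, x1, y1) in pixels, top-left origin.
Box = Tuple[int, int, int, int]


@dataclass
class AnonymousLayout:
    """Illustrative container: one global prompt and unlabeled regions."""

    prompt: str                   # single caption for the whole image
    canvas_size: Tuple[int, int]  # (width, height)
    boxes: List[Box]              # one box per layer, no per-layer text

    def validate(self) -> None:
        """Raise ValueError if any box is empty or leaves the canvas."""
        width, height = self.canvas_size
        for i, (x0, y0, x1, y1) in enumerate(self.boxes):
            if not (0 <= x0 < x1 <= width and 0 <= y0 < y1 <= height):
                raise ValueError(
                    f"box {i} is empty or outside the canvas: {(x0, y0, x1, y1)}"
                )


layout = AnonymousLayout(
    prompt="a birthday card with balloons and a cake",
    canvas_size=(512, 512),
    boxes=[(0, 0, 512, 512), (64, 64, 256, 256), (300, 320, 480, 500)],
)
layout.validate()  # passes silently for a well-formed layout
```

Note that the boxes carry no text at all: every region relies on the single global prompt, which is the point of the "anonymous" layout.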
Quick Start & Requirements
Install dependencies: pip3 install torch==2.4.0 torchvision==0.19.0 diffusers==0.31.0 transformers==4.44.0 accelerate==0.34.2 peft==0.12.0 datasets==2.20.0 wandb==0.17.7 einops==0.8.0 sentencepiece==0.2.0 mmengine==0.10.4 prodigyopt==1.0
Run inference: python example.py, or python multi_layer_gen/test.py with specified arguments.
Layout planner: pip install -r requirements_part1.txt and requirements_part2.txt within the layout_planner directory.
System dependencies: ffmpeg, libsm6, libxext6, and potentially flash-attn-2.
Inference template script: scripts/inference_template.sh.
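Since the install command pins exact versions, a quick check of the active environment can save debugging time. The sketch below uses only the standard library's `importlib.metadata`. The helper name `check_pins` is hypothetical, and the choice to ignore local build tags such as `+cu121` is an assumption, not something the repository specifies.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the install command above.
PINNED = {
    "torch": "2.4.0", "torchvision": "0.19.0", "diffusers": "0.31.0",
    "transformers": "4.44.0", "accelerate": "0.34.2", "peft": "0.12.0",
    "datasets": "2.20.0", "wandb": "0.17.7", "einops": "0.8.0",
    "sentencepiece": "0.2.0", "mmengine": "0.10.4", "prodigyopt": "1.0",
}


def check_pins(pins, get_version=version):
    """Return {package: installed_version_or_None} for every mismatch."""
    problems = {}
    for name, wanted in pins.items():
        try:
            found = get_version(name)
        except PackageNotFoundError:
            found = None  # package is not installed at all
        # Assumption: a local build tag like "+cu121" still counts as a match.
        if found is None or found.split("+")[0] != wanted:
            problems[name] = found
    return problems


if __name__ == "__main__":
    for name, found in check_pins(PINNED).items():
        print(f"{name}: expected {PINNED[name]}, found {found or 'not installed'}")
```

Running the script prints one line per mismatched or missing package and nothing when the environment matches the pins.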
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats