stable-diffusion-aesthetic-gradients by vicgalle

Personalization technique for Stable Diffusion, accompanying a research paper

created 2 years ago
731 stars

Top 48.3% on sourcepulse

Project Summary

This repository provides a method for personalizing Stable Diffusion text-to-image generation by guiding the process towards custom aesthetics defined by user-provided image sets. It targets users seeking to influence image style without extensive prompt engineering or model retraining, offering a way to imbue generated images with specific visual characteristics.

How It Works

The core innovation is "aesthetic gradients," a technique that steers generation toward an "aesthetic embedding" computed from a collection of images representing the desired aesthetic. During sampling, the prompt's conditioning is optimized for a few gradient steps to increase its similarity to this embedding. By adjusting parameters such as aesthetic_steps and aesthetic_lr, users control how strongly the generation process aligns with the learned aesthetic, effectively steering the output toward a specific visual style.
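A minimal PyTorch sketch of the idea described above. The function names, tensor shapes, and default values here are hypothetical illustrations, not the repository's actual API: an aesthetic embedding is the averaged (unit-normalized) feature vector of the reference images, and personalization takes a few gradient steps that pull the prompt conditioning toward it.

```python
import torch
import torch.nn.functional as F

def aesthetic_embedding(image_feats: torch.Tensor) -> torch.Tensor:
    """Average a set of image feature vectors (e.g. CLIP features),
    shape (n_images, d), into one unit-norm aesthetic embedding."""
    e = image_feats.mean(dim=0)
    return e / e.norm()

def personalize_conditioning(c: torch.Tensor, e: torch.Tensor,
                             aesthetic_lr: float = 1e-4,
                             aesthetic_steps: int = 5) -> torch.Tensor:
    """Take `aesthetic_steps` gradient steps that increase the cosine
    similarity between the prompt conditioning `c` and the aesthetic
    embedding `e`, mirroring the aesthetic_lr / aesthetic_steps knobs."""
    c = c.clone().detach().requires_grad_(True)
    opt = torch.optim.Adam([c], lr=aesthetic_lr)
    for _ in range(aesthetic_steps):
        opt.zero_grad()
        # Negative similarity as the loss, so minimizing it pulls
        # the conditioning toward the aesthetic embedding.
        loss = -F.cosine_similarity(c, e.detach(), dim=-1)
        loss.backward()
        opt.step()
    return c.detach()
```

Larger aesthetic_steps or aesthetic_lr values push the output harder toward the learned aesthetic, at the cost of fidelity to the literal prompt.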

Quick Start & Requirements

  • Install via pip install -e . (the repository is itself a fork of the original Stable Diffusion codebase).
  • Prerequisites are identical to the original Stable Diffusion repository.
  • Usage involves scripts/txt2img.py with additional arguments: --aesthetic_steps, --aesthetic_lr, and --aesthetic_embedding.
  • Pre-computed aesthetic embeddings are available in the aesthetic_embeddings directory.
  • A script scripts/gen_aesthetic_embedding.py is provided for generating custom embeddings.
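Putting the steps above together, a hypothetical invocation might look like the following. The prompt, flag values, and embedding filename are illustrative assumptions; consult the repository README for the exact paths and defaults.

```shell
# Steer generation toward a pre-computed aesthetic embedding
# (the .pt filename under aesthetic_embeddings/ is an assumption).
python scripts/txt2img.py \
  --prompt "a castle on a hill" \
  --aesthetic_steps 10 \
  --aesthetic_lr 0.0001 \
  --aesthetic_embedding aesthetic_embeddings/fantasy.pt

# The script for building a custom embedding from your own images;
# its arguments are not documented here, so check its help output.
python scripts/gen_aesthetic_embedding.py --help
```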

Highlighted Details

  • Enables style personalization without retraining or high computational resources.
  • Demonstrates effectiveness with various Stable Diffusion models and datasets.
  • Allows mixing prompt-based styles with aesthetic embedding styles.
  • Includes pre-computed embeddings for styles like "fantasy," "flower_plant," and artist-specific aesthetics.

Maintenance & Community

The project accompanies the paper "Personalizing Text-to-Image Generation via Aesthetic Gradients" by Victor Gallego. Further resources and examples are linked in the README, including blog posts and discussions on platforms such as Bilibili and Zhihu. A pull request integrating the technique into the AUTOMATIC1111 Stable Diffusion Web UI is also mentioned.

Licensing & Compatibility

The repository is a fork of the original Stable Diffusion codebase and presumably inherits its licensing. The README does not explicitly state a license for this fork; note that Stable Diffusion's model weights are distributed under the CreativeML OpenRAIL-M license, which permits commercial use subject to use-based restrictions rather than being fully permissive.

Limitations & Caveats

The effectiveness of personalization is dependent on the quality and representativeness of the user-provided images for generating custom aesthetic embeddings. The README notes that simply adding an artist's name to the prompt may not yield the same results as using a dedicated aesthetic embedding.

Health Check

Last commit: 2 years ago
Responsiveness: 1 day
Pull Requests (30d): 0
Issues (30d): 0
Star History: 3 stars in the last 90 days
