japanese-stable-diffusion by rinnakk

Text-to-image diffusion model for Japanese language and culture

Created 3 years ago
281 stars

Top 92.7% on SourcePulse

Project Summary

This repository provides Japanese Stable Diffusion, a latent text-to-image diffusion model specifically trained to understand Japanese language, culture, and nuances. It aims to generate higher quality, culturally relevant images for Japanese users compared to general-purpose models trained primarily on English data.

How It Works

The model is based on the Stable Diffusion architecture but undergoes a two-stage training process. First, a Japanese-specific text encoder is trained from scratch with a fixed latent diffusion model, mapping Japanese captions to the existing latent space. Second, the text encoder and latent diffusion model are jointly fine-tuned, enabling the generation of Japanese-style images and improved understanding of Japanese slang, onomatopoeia, and proper nouns.

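As a rough illustration of this freeze-then-finetune schedule, the sketch below shows how the two stages could be set up in PyTorch. It is a minimal, hypothetical outline: the module stand-ins, parameter groups, and learning rates are illustrative assumptions, not the repository's actual training code.

```python
import torch
from torch import nn

# Illustrative stand-ins for the real components (a Japanese text encoder,
# the latent diffusion U-Net, and the VAE); shapes are placeholders only.
text_encoder = nn.Linear(768, 768)
unet = nn.Linear(4, 4)
vae = nn.Linear(3, 4)

def set_requires_grad(module: nn.Module, flag: bool) -> None:
    for p in module.parameters():
        p.requires_grad = flag

# Stage 1: train the new Japanese text encoder from scratch while the latent
# diffusion components stay frozen, so Japanese captions are mapped into the
# latent space the pretrained model already understands.
set_requires_grad(text_encoder, True)
set_requires_grad(unet, False)
set_requires_grad(vae, False)
stage1_opt = torch.optim.AdamW(text_encoder.parameters(), lr=1e-4)

# Stage 2: unfreeze the U-Net and fine-tune it jointly with the text encoder,
# shifting generations toward Japanese-style imagery and vocabulary.
set_requires_grad(unet, True)
stage2_opt = torch.optim.AdamW(
    list(text_encoder.parameters()) + list(unet.parameters()), lr=1e-5
)
```
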
Quick Start & Requirements

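As a minimal sketch, assuming the checkpoint is available on the Hugging Face Hub as rinna/japanese-stable-diffusion and that the repository's modified 🤗 Diffusers code exposes a JapaneseStableDiffusionPipeline, text-to-image generation might look like this (exact arguments and output fields depend on the pinned Diffusers version; gated weights may also require a Hugging Face access token):

```python
import torch
from diffusers import LMSDiscreteScheduler
from japanese_stable_diffusion import JapaneseStableDiffusionPipeline  # assumed import path

model_id = "rinna/japanese-stable-diffusion"  # assumed Hugging Face Hub id

# K-LMS scheduler settings commonly used with Stable Diffusion v1 checkpoints.
scheduler = LMSDiscreteScheduler(
    beta_start=0.00085,
    beta_end=0.012,
    beta_schedule="scaled_linear",
    num_train_timesteps=1000,
)

pipe = JapaneseStableDiffusionPipeline.from_pretrained(model_id, scheduler=scheduler)
pipe = pipe.to("cuda" if torch.cuda.is_available() else "cpu")

# Japanese prompt: "portrait of a cat, oil painting".
prompt = "猫の肖像画 油絵"
image = pipe(prompt, guidance_scale=7.5).images[0]  # older Diffusers releases return ["sample"][0]
image.save("output.png")
```

Requirements presumably amount to PyTorch plus the repository's bundled Diffusers fork; a CUDA-capable GPU is recommended for practical generation speed.
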
Highlighted Details

  • Generates Japanese-style images.
  • Understands Japanese slang, onomatopoeia, and proper nouns.
  • Includes a safety checker and invisible watermarking.
  • Ships a modified 🤗 Diffusers library for integration.

Maintenance & Community

  • Initial release September 2022.
  • Project contributors: Makoto Shing, Kei Sawada.

Licensing & Compatibility

  • License: CreativeML OpenRAIL-M. The license permits commercial use but includes use-based restrictions intended to enforce responsible AI practices.

Limitations & Caveats

Because the original CLIP tokenizer is English-centric, the model is not simply a fine-tuned version of Stable Diffusion. Although performance on Japanese inputs is improved, it may still be limited by the remaining English-trained components.

Health Check

  • Last Commit: 2 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

  • 1 star in the last 30 days

Explore Similar Projects

Starred by Jiayi Pan (author of SWE-Gym; MTS at xAI), Shizhe Diao (author of LMFlow; Research Scientist at NVIDIA), and 1 more.

METER by zdou0830

373 stars
Multimodal framework for vision-and-language transformer research
Created 4 years ago · Updated 3 years ago