Text-to-image diffusion model for Japanese language and culture
This repository provides Japanese Stable Diffusion, a latent text-to-image diffusion model specifically trained to understand Japanese language, culture, and nuance. It aims to generate higher-quality, culturally relevant images for Japanese users compared to general-purpose models trained primarily on English data.
How It Works
The model is based on the Stable Diffusion architecture but undergoes a two-stage training process. First, a Japanese-specific text encoder is trained from scratch while the latent diffusion model is kept fixed, so that Japanese captions are mapped into the conditioning embedding space the pretrained model already understands. Second, the text encoder and the latent diffusion model are jointly fine-tuned, enabling generation of Japanese-style images and improving handling of Japanese slang, onomatopoeia, and proper nouns.
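The following is a minimal, illustrative sketch of that two-stage schedule in PyTorch. The module definitions are placeholders standing in for the Japanese text encoder and the pretrained latent diffusion U-Net (not the actual classes used in this repository); the point is only how parameter freezing separates stage 1 from stage 2.

```python
# Illustrative two-stage training schedule (placeholder modules, not the real model classes).
import torch
from torch import nn

text_encoder = nn.TransformerEncoder(          # stands in for the Japanese text encoder
    nn.TransformerEncoderLayer(d_model=768, nhead=8, batch_first=True), num_layers=2
)
latent_diffusion = nn.Sequential(              # stands in for the pretrained latent diffusion U-Net
    nn.Conv2d(4, 64, 3, padding=1), nn.SiLU(), nn.Conv2d(64, 4, 3, padding=1)
)

# Stage 1: freeze the latent diffusion model and train only the text encoder,
# so Japanese captions learn to target the space the fixed model already expects.
for p in latent_diffusion.parameters():
    p.requires_grad = False
stage1_optimizer = torch.optim.AdamW(text_encoder.parameters(), lr=1e-4)

# Stage 2: unfreeze everything and jointly fine-tune both components.
for p in latent_diffusion.parameters():
    p.requires_grad = True
stage2_optimizer = torch.optim.AdamW(
    list(text_encoder.parameters()) + list(latent_diffusion.parameters()), lr=1e-5
)
```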
Quick Start & Requirements
pip install git+https://github.com/rinnakk/japanese-stable-diffusion
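After installation, text-to-image generation follows the usual diffusers pipeline pattern. The sketch below assumes the package exposes a JapaneseStableDiffusionPipeline class and that the weights are published as rinna/japanese-stable-diffusion on Hugging Face (access may require authentication); exact class names and arguments can differ between versions, so treat this as an outline rather than a verified recipe.

```python
import torch
from japanese_stable_diffusion import JapaneseStableDiffusionPipeline

model_id = "rinna/japanese-stable-diffusion"  # assumed Hugging Face model id; may require a token
pipe = JapaneseStableDiffusionPipeline.from_pretrained(model_id)
pipe = pipe.to("cuda" if torch.cuda.is_available() else "cpu")

# Japanese prompt, e.g. "a dog wearing a hakama"
prompt = "袴を着た犬"
image = pipe(prompt, guidance_scale=7.5).images[0]
image.save("output.png")
```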
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
Because the original CLIP tokenizer is English-centric, the model is not simply a fully fine-tuned copy of Stable Diffusion; the text encoder had to be replaced and retrained for Japanese. While markedly improved for Japanese input, output quality may still be limited by the underlying components that were originally trained on English data.