japanese-stable-diffusion  by rinnakk

Text-to-image diffusion model for Japanese language and culture

created 2 years ago
283 stars

Top 93.4% on sourcepulse

Project Summary

This repository provides Japanese Stable Diffusion, a latent text-to-image diffusion model trained specifically to understand the Japanese language, its culture, and its nuances. It aims to generate higher-quality, more culturally relevant images for Japanese users than general-purpose models trained primarily on English data.

How It Works

The model is based on the Stable Diffusion architecture but undergoes a two-stage training process. First, a Japanese-specific text encoder is trained from scratch with a fixed latent diffusion model, mapping Japanese captions to the existing latent space. Second, the text encoder and latent diffusion model are jointly fine-tuned, enabling the generation of Japanese-style images and improved understanding of Japanese slang, onomatopoeia, and proper nouns.
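The two-stage schedule described above can be illustrated with a toy example. The sketch below is not the project's training code: `ToyModel`, its scalar weights, and the `train` helper are hypothetical stand-ins, with one scalar playing the role of the text encoder and two scalars playing the role of the latent diffusion model. It only shows the freezing pattern, assuming plain gradient descent on a squared-error loss.

```python
from dataclasses import dataclass


@dataclass
class ToyModel:
    """Hypothetical stand-in: one 'encoder' weight, two 'denoiser' weights."""
    encoder_w: float   # plays the role of the Japanese text encoder
    denoiser_w: float  # plays the role of the latent diffusion model
    denoiser_b: float  # second denoiser parameter (a bias)

    def predict(self, x: float) -> float:
        return self.denoiser_w * (self.encoder_w * x) + self.denoiser_b


def loss(model: ToyModel, data) -> float:
    """Mean squared error over (x, y) pairs."""
    return sum((model.predict(x) - y) ** 2 for x, y in data) / len(data)


def train(model: ToyModel, data, steps: int, lr: float, train_denoiser: bool) -> None:
    """Gradient descent; the denoiser is updated only if train_denoiser is True."""
    n = len(data)
    for _ in range(steps):
        g_enc = g_den_w = g_den_b = 0.0
        for x, y in data:
            err = model.predict(x) - y
            g_enc += 2 * err * model.denoiser_w * x
            g_den_w += 2 * err * model.encoder_w * x
            g_den_b += 2 * err
        model.encoder_w -= lr * g_enc / n
        if train_denoiser:
            model.denoiser_w -= lr * g_den_w / n
            model.denoiser_b -= lr * g_den_b / n


# Toy data following y = 3x + 1 (illustrative only).
data = [(1.0, 3.0 + 1.0), (2.0, 6.0 + 1.0), (3.0, 9.0 + 1.0)]
model = ToyModel(encoder_w=0.1, denoiser_w=2.0, denoiser_b=0.0)
initial_loss = loss(model, data)

# Stage 1: train the encoder only, with the denoiser frozen.
train(model, data, steps=200, lr=0.01, train_denoiser=False)
stage1_loss = loss(model, data)
denoiser_after_stage1 = (model.denoiser_w, model.denoiser_b)

# Stage 2: jointly fine-tune the encoder and the denoiser.
train(model, data, steps=200, lr=0.01, train_denoiser=True)
stage2_loss = loss(model, data)

print(f"initial={initial_loss:.4f} stage1={stage1_loss:.4f} stage2={stage2_loss:.4f}")
```

In this toy setting, stage 1 can only adapt the encoder to whatever the frozen denoiser already does, so some error remains; stage 2 lets the denoiser move as well, which is the analogue of the joint fine-tuning step.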

Quick Start & Requirements

Highlighted Details

  • Generates Japanese-style images.
  • Understands Japanese slang, onomatopoeia, and proper nouns.
  • Includes a safety checker and invisible watermarking.
  • Modified 🤗 Diffusers library for integration.

Maintenance & Community

  • Initial release September 2022.
  • Project contributors: Makoto Shing, Kei Sawada.

Licensing & Compatibility

  • License: CreativeML OpenRAIL-M. This license permits commercial use but includes use-based restrictions intended to prevent harmful or illegal applications of the model.

Limitations & Caveats

The model is not a straightforward fine-tune of Stable Diffusion, because the original CLIP tokenizer is English-centric. Although results for Japanese prompts are improved, performance may still be limited by the underlying components that were trained on English data.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: 1+ week
  • Pull requests (30d): 0
  • Issues (30d): 0

Star History
0 stars in the last 90 days
