japanese-stable-diffusion by rinnakk

Text-to-image diffusion model for Japanese language and culture

Created 3 years ago
281 stars

Top 92.7% on SourcePulse

Project Summary

This repository provides Japanese Stable Diffusion, a latent text-to-image diffusion model specifically trained to understand Japanese language, culture, and nuances. It aims to generate higher quality, culturally relevant images for Japanese users compared to general-purpose models trained primarily on English data.

How It Works

The model is based on the Stable Diffusion architecture but undergoes a two-stage training process. First, a Japanese-specific text encoder is trained from scratch with a fixed latent diffusion model, mapping Japanese captions to the existing latent space. Second, the text encoder and latent diffusion model are jointly fine-tuned, enabling the generation of Japanese-style images and improved understanding of Japanese slang, onomatopoeia, and proper nouns.

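As a rough illustration of this freeze-then-finetune schedule, the sketch below shows how the two stages could be set up in PyTorch. It is a minimal, hypothetical outline: the module stand-ins, parameter groups, and learning rates are illustrative assumptions, not the repository's actual training code.

```python
import torch
from torch import nn

# Illustrative stand-ins for the real components (a Japanese text encoder,
# the latent diffusion U-Net, and the VAE); shapes are placeholders only.
text_encoder = nn.Linear(768, 768)
unet = nn.Linear(4, 4)
vae = nn.Linear(3, 4)

def set_requires_grad(module: nn.Module, flag: bool) -> None:
    for p in module.parameters():
        p.requires_grad = flag

# Stage 1: train the new Japanese text encoder from scratch while the latent
# diffusion components stay frozen, so Japanese captions are mapped into the
# latent space the pretrained model already understands.
set_requires_grad(text_encoder, True)
set_requires_grad(unet, False)
set_requires_grad(vae, False)
stage1_opt = torch.optim.AdamW(text_encoder.parameters(), lr=1e-4)

# Stage 2: unfreeze the U-Net and fine-tune it jointly with the text encoder,
# shifting generations toward Japanese-style imagery and vocabulary.
set_requires_grad(unet, True)
stage2_opt = torch.optim.AdamW(
    list(text_encoder.parameters()) + list(unet.parameters()), lr=1e-5
)
```
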
Quick Start & Requirements

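As a minimal sketch, assuming the checkpoint is available on the Hugging Face Hub as rinna/japanese-stable-diffusion and that the repository's modified 🤗 Diffusers code exposes a JapaneseStableDiffusionPipeline, text-to-image generation might look like this (exact arguments and output fields depend on the pinned Diffusers version; gated weights may also require a Hugging Face access token):

```python
import torch
from diffusers import LMSDiscreteScheduler
from japanese_stable_diffusion import JapaneseStableDiffusionPipeline  # assumed import path

model_id = "rinna/japanese-stable-diffusion"  # assumed Hugging Face Hub id

# K-LMS scheduler settings commonly used with Stable Diffusion v1 checkpoints.
scheduler = LMSDiscreteScheduler(
    beta_start=0.00085,
    beta_end=0.012,
    beta_schedule="scaled_linear",
    num_train_timesteps=1000,
)

pipe = JapaneseStableDiffusionPipeline.from_pretrained(model_id, scheduler=scheduler)
pipe = pipe.to("cuda" if torch.cuda.is_available() else "cpu")

# Japanese prompt: "portrait of a cat, oil painting".
prompt = "猫の肖像画 油絵"
image = pipe(prompt, guidance_scale=7.5).images[0]  # older Diffusers releases return ["sample"][0]
image.save("output.png")
```

Requirements presumably amount to PyTorch plus the repository's bundled Diffusers fork; a CUDA-capable GPU is recommended for practical generation speed.
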
Highlighted Details

  • Generates Japanese-style images.
  • Understands Japanese slang, onomatopoeia, and proper nouns.
  • Includes a safety checker and invisible watermarking.
  • Ships a modified 🤗 Diffusers library for integration.

Maintenance & Community

  • Initial release September 2022.
  • Project contributors: Makoto Shing, Kei Sawada.

Licensing & Compatibility

  • License: CreativeML OpenRAIL-M. The license permits commercial use but includes use-based restrictions intended to enforce responsible AI practices.

Limitations & Caveats

Because the original CLIP tokenizer is English-centric, the model is not simply a fine-tuned version of Stable Diffusion. Although performance on Japanese inputs is improved, it may still be limited by the remaining English-trained components.

Health Check

  • Last Commit: 2 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

  • 1 star in the last 30 days

Explore Similar Projects

Starred by Jiayi Pan (author of SWE-Gym; MTS at xAI), Shizhe Diao (author of LMFlow; Research Scientist at NVIDIA), and 1 more.

METER by zdou0830

373 stars
Multimodal framework for vision-and-language transformer research
Created 4 years ago · Updated 3 years ago