SkyPaint-AI-Diffusion  by SkyWorkAIGC

Text-to-image model optimized from Stable Diffusion

created 2 years ago
656 stars

Top 51.9% on sourcepulse

Project Summary

SkyPaint-AI-Diffusion offers an optimized text-to-image generation model based on Stable Diffusion, capable of producing high-quality images in modern art styles from both Chinese and English text prompts. It is targeted at users seeking advanced AI art generation with bilingual input capabilities.

How It Works

The project comprises two main components: an optimized text encoder and a diffusion model. The text encoder, SkyCLIP, is a distilled, bilingual (Chinese/English) CLIP model trained efficiently using text data. This approach significantly reduces data and computational requirements for reproduction and fine-tuning. The diffusion model is fine-tuned from stable-diffusion-v1.5, with prompts augmented by the tag 'sai-v1 art' to guide the learning of specific styles and quality.
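
The prompt-tagging step described above can be sketched as a small helper. This is only an illustration: the tag text 'sai-v1 art' comes from the summary, but the comma separator, the helper name `add_style_tag`, and the sample prompts are assumptions rather than the project's own code.

```python
STYLE_TAG = "sai-v1 art"  # tag named in the project summary


def add_style_tag(prompt: str, tag: str = STYLE_TAG) -> str:
    """Prefix a prompt with the style tag unless it already starts with it.

    The ", " separator is an assumption; the summary only names the tag.
    """
    cleaned = prompt.strip()
    if cleaned.lower().startswith(tag.lower()):
        return cleaned  # already tagged, leave unchanged
    return f"{tag}, {cleaned}"


# Illustrative Chinese, English, and mixed-language prompts.
prompts = ["机械狗", "a castle at sunset", "sai-v1 art, 城堡 with fireworks"]
tagged = [add_style_tag(p) for p in prompts]
```

Keeping the tag in a single helper means the same string is applied consistently, whether prompts are Chinese, English, or mixed.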

Quick Start & Requirements

  • Run through the Hugging Face diffusers library.
  • Requires a CUDA-enabled GPU (training used 16 A100 GPUs).
  • Example usage provided in the README.
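
For orientation, a minimal sketch of what running the model through diffusers might look like. It assumes the checkpoint loads with `StableDiffusionPipeline` like other Stable Diffusion 1.x models; the checkpoint path, the helper names, and the comma after the tag are placeholders and assumptions, so the README remains the reference for actual usage.

```python
from typing import Any, List

STYLE_PREFIX = "sai-v1 art, "  # tag from the summary; the separator is an assumption


def load_pipeline(model_path: str, device: str = "cuda") -> Any:
    """Load a Stable Diffusion 1.x style checkpoint with diffusers."""
    # Imported lazily so the helper below works without diffusers installed.
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(model_path)
    return pipe.to(device)


def generate_images(pipe: Any, prompts: List[str]) -> List[Any]:
    """Call the pipeline once per prompt and collect the first image of each result."""
    images = []
    for prompt in prompts:
        result = pipe(STYLE_PREFIX + prompt)
        images.append(result.images[0])
    return images


def main() -> None:
    # "path/to/skypaint-checkpoint" is a hypothetical placeholder path.
    pipe = load_pipeline("path/to/skypaint-checkpoint")
    for i, image in enumerate(generate_images(pipe, ["机械狗", "a city skyline"])):
        image.save(f"sample_{i}.png")
```

Calling `main()` needs a CUDA-capable GPU and a downloaded checkpoint; `generate_images` accepts any callable pipeline object, which keeps it easy to exercise without a GPU.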

Highlighted Details

  • Supports Chinese, English, and mixed-language prompts.
  • Generates images in modern art styles.
  • Compatible with official Stable Diffusion 1.x models and their fine-tuned variants.
  • SkyCLIP provides an efficient bilingual CLIP training method, with evaluation on Flickr30K-CN.

Maintenance & Community

  • Project is under continuous development and optimization.
  • WeChat QR code provided for joining a developer group.

Licensing & Compatibility

  • License: CreativeML Open RAIL-M.
  • The license permits commercial use, subject to its use-based restrictions.

Limitations & Caveats

The project describes the model as still being optimized, with more stable updates expected later. The repository has, however, seen no commits in the last 2 years.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 3 stars in the last 90 days
