Kolors by Kwai-Kolors

Text-to-image model for photorealistic synthesis, trained on billions of text-image pairs

created 1 year ago
4,502 stars

Top 11.1% on sourcepulse

Project Summary

Kolors is a large-scale, bilingual (Chinese/English) text-to-image diffusion model trained on billions of text-image pairs. It aims to generate photorealistic images with better visual quality, semantic accuracy, and text rendering than existing models. Its target audience is researchers and developers working in generative AI.

How It Works

Kolors is built on latent diffusion, a generative technique that runs the denoising process in a compressed latent space rather than directly on pixels. Its main advantages are the scale of its training data and its bilingual support, which let it interpret prompts and render text accurately in both Chinese and English, including for complex scenes. The model has been extended with control mechanisms (IP-Adapter, ControlNet with Canny, Depth, and Pose conditioning, and inpainting) that give fine-grained control over generation.

Quick Start & Requirements

  • Installation: Clone the repository, create a conda environment, and install requirements (pip install -r requirements.txt).
  • Dependencies: Python 3.8+, PyTorch 1.13.1+, Transformers 4.26.1+. CUDA 11.7+ recommended.
  • Weights: Download from Hugging Face (huggingface-cli download --resume-download Kwai-Kolors/Kolors --local-dir weights/Kolors).
  • Inference: Run python3 scripts/sample.py "your prompt".
  • Diffusers Integration: Available via Kwai-Kolors/Kolors-diffusers on Hugging Face.
  • Docs: Technical Report, Usage Examples.
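
The weight download and the Diffusers route listed above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the project's own script: it assumes the huggingface_hub package, a diffusers release that includes KolorsPipeline, a CUDA GPU, and that the Kolors-diffusers repository provides an fp16 variant. The helper names (download_native_weights, generation_kwargs, generate), the step count, the guidance scale, and the output file name are illustrative choices, not values taken from the project.

```python
from pathlib import Path

# Repository IDs as named in the Quick Start list.
NATIVE_REPO = "Kwai-Kolors/Kolors"
DIFFUSERS_REPO = "Kwai-Kolors/Kolors-diffusers"


def download_native_weights(local_dir: str = "weights/Kolors") -> Path:
    """Python counterpart of the huggingface-cli download step (hypothetical helper)."""
    from huggingface_hub import snapshot_download  # imported lazily

    return Path(snapshot_download(repo_id=NATIVE_REPO, local_dir=local_dir))


def generation_kwargs(prompt: str, steps: int = 50, guidance_scale: float = 5.0) -> dict:
    """Collect pipeline call arguments; the defaults are assumptions for illustration."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    return {
        "prompt": prompt,
        "negative_prompt": "",
        "num_inference_steps": steps,
        "guidance_scale": guidance_scale,
    }


def generate(prompt: str, out_path: str = "kolors_sample.png", seed: int = 0) -> str:
    """Run text-to-image through Diffusers and save the result (hypothetical helper).

    Needs a CUDA GPU and downloads the weights on first use.
    """
    import torch
    from diffusers import KolorsPipeline

    pipe = KolorsPipeline.from_pretrained(
        DIFFUSERS_REPO, torch_dtype=torch.float16, variant="fp16"
    ).to("cuda")
    # A seeded generator makes the sampled image reproducible.
    generator = torch.Generator(device="cuda").manual_seed(seed)
    image = pipe(generator=generator, **generation_kwargs(prompt)).images[0]
    image.save(out_path)
    return out_path
```

Nothing runs at import time. Calling generate("a red panda reading a book") would load the pipeline and write kolors_sample.png, while download_native_weights() is only needed for the repository's own scripts/sample.py path.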

Highlighted Details

  • Ranked at the top in both human evaluation and the automatic MPS metric when compared with models such as DALL-E 3 and Midjourney v6.
  • Supports advanced features including IP-Adapter for image conditioning, ControlNet for structural guidance (Canny, Depth, Pose), and inpainting.
  • Integrates with popular platforms like Hugging Face Diffusers and ComfyUI.
  • Offers Dreambooth-LoRA for custom model fine-tuning.

Maintenance & Community

The project is actively maintained by the Kuaishou Kolors team, with frequent updates and releases of new features and control modules. Community engagement is encouraged via WeChat groups and email contact.

Licensing & Compatibility

The code is licensed under Apache-2.0. Model weights are open for academic research. Commercial use requires registration and may require a separate license from the licensor, with terms that depend on monthly active users (the threshold is 300 million). Use for purposes that harm the country or society is prohibited.

Limitations & Caveats

The model's output is probabilistic, so neither its accuracy nor its safety can be guaranteed. The project warns against misuse and disclaims legal responsibility for any resulting issues.

Health Check

  • Last commit: 8 months ago
  • Responsiveness: 1 week
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 145 stars in the last 90 days
