Kolors by Kwai-Kolors

Text-to-image model for photorealistic synthesis, trained on billions of text-image pairs

Created 1 year ago

4,601 stars

Top 10.6% on SourcePulse

Project Summary

Kolors is a large-scale, bilingual (Chinese/English) text-to-image diffusion model trained on billions of text-image pairs. It aims to provide photorealistic image generation with superior visual quality, semantic accuracy, and text rendering capabilities compared to existing models, targeting researchers and developers in generative AI.

How It Works

Kolors is built on latent diffusion, which runs the denoising process in a compressed latent space rather than directly on pixels. Its advantage lies in its massive training dataset and bilingual support, which let it understand prompts and render complex scenes and text accurately in both Chinese and English. The model has been extended with control mechanisms such as IP-Adapter, ControlNet (Canny, Depth, Pose), and inpainting, which give fine-grained control over the generation process.

Quick Start & Requirements

Installation: Clone the repository, create a conda environment, and install requirements (pip install -r requirements.txt).
Dependencies: Python 3.8+, PyTorch 1.13.1+, Transformers 4.26.1+. CUDA 11.7+ recommended.
Weights: Download from Hugging Face (huggingface-cli download --resume-download Kwai-Kolors/Kolors --local-dir weights/Kolors).
Inference: Run python3 scripts/sample.py "your prompt".
Diffusers Integration: Available via Kwai-Kolors/Kolors-diffusers on Hugging Face.
Docs: Technical Report, Usage Examples.
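
The weight download and inference steps above can be scripted. The following is a minimal sketch, assuming the commands are run from the root of the cloned repository with the requirements already installed. The helper names (download_command, sample_command, run) are hypothetical and not part of the project; only the command lines themselves come from the steps listed above.

```python
import shlex
import subprocess

# Repo id and target directory as given in the "Weights" step above.
WEIGHTS_REPO = "Kwai-Kolors/Kolors"
WEIGHTS_DIR = "weights/Kolors"


def download_command(repo=WEIGHTS_REPO, local_dir=WEIGHTS_DIR):
    """Build the weight-download command from the Quick Start list."""
    return ["huggingface-cli", "download", "--resume-download", repo,
            "--local-dir", local_dir]


def sample_command(prompt):
    """Build the inference command; the prompt is passed as a single argument."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    return ["python3", "scripts/sample.py", prompt]


def run(cmd, dry_run=True):
    """Print the shell-quoted command, and execute it only when dry_run is False."""
    printable = shlex.join(cmd)
    print(printable)
    if not dry_run:
        # check=True raises CalledProcessError if the command exits non-zero.
        subprocess.run(cmd, check=True)
    return printable
```

Calling run(download_command()) followed by run(sample_command("your prompt")) only prints the two commands. Passing dry_run=False executes them, which downloads the full weights and needs a suitable GPU environment.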

Highlighted Details

Ranked at the top in both human evaluation and machine evaluation (MPS, Multi-dimensional Human Preference Score) against models such as DALL-E 3 and Midjourney v6.
Supports advanced features including IP-Adapter for image conditioning, ControlNet for structural guidance (Canny, Depth, Pose), and inpainting.
Integrates with popular platforms like Hugging Face Diffusers and ComfyUI.
Offers Dreambooth-LoRA for custom model fine-tuning.
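
For the Diffusers integration mentioned above, a text-to-image call might look like the sketch below. It assumes a diffusers release that ships KolorsPipeline, a CUDA GPU, and that the Kwai-Kolors/Kolors-diffusers repository provides an fp16 variant. The step count and guidance scale are illustrative defaults, not values prescribed by this page, and the helper names sampling_kwargs and generate are hypothetical.

```python
MODEL_ID = "Kwai-Kolors/Kolors-diffusers"  # repo id from the Quick Start list


def sampling_kwargs(prompt, steps=25, guidance_scale=6.5):
    """Collect pipeline call arguments; the defaults are illustrative assumptions."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    return {
        "prompt": prompt,
        "negative_prompt": "",
        "num_inference_steps": steps,
        "guidance_scale": guidance_scale,
    }


def generate(prompt, seed=0, device="cuda"):
    """Load the pipeline and return one PIL image for the given prompt."""
    # Heavy imports are deferred so sampling_kwargs works without a GPU stack.
    import torch
    from diffusers import KolorsPipeline

    pipe = KolorsPipeline.from_pretrained(
        MODEL_ID, torch_dtype=torch.float16, variant="fp16"
    )
    pipe = pipe.to(device)
    # A seeded generator makes repeated runs with the same prompt reproducible.
    generator = torch.Generator(device=device).manual_seed(seed)
    result = pipe(generator=generator, **sampling_kwargs(prompt))
    return result.images[0]
```

Calling generate("your prompt").save("kolors.png") would write the result to disk. The first call downloads the model weights, so it needs network access and sufficient disk space.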

Maintenance & Community

The project is actively maintained by the Kuaishou Kolors team, with frequent updates and releases of new features and control modules. Community engagement is encouraged via WeChat groups and email contact.

Licensing & Compatibility

The code is licensed under Apache-2.0. Model weights are open for academic research. Commercial use requires registering with the licensor and may require a separate license, with terms that depend on monthly active users (300 million threshold). Use for purposes harmful to the country or society is prohibited.

Limitations & Caveats

The model's output is probabilistic, so its accuracy and safety cannot be guaranteed. Users are cautioned against misuse, and the project disclaims legal responsibility for any resulting issues.

Health Check

Last Commit

1 year ago

Responsiveness

Inactive

Star History

10 stars in the last 30 days