karlo by kakaobrain

Text-to-image model based on unCLIP architecture

Created 3 years ago

698 stars

Top 48.9% on SourcePulse


6 Experts Love This Project

Bojan Tunguz

AI Scientist; Formerly at NVIDIA

Elvis Saravia

Founder of DAIR.AI

Chuan Li

Chief Scientific Officer at Lambda

Luis Capelo

Cofounder of Lightning AI

and 2 more!

Project Summary

Karlo is a text-conditional image generation model that aims to produce high-quality images from text prompts while recovering fine detail in fewer denoising steps. It is based on OpenAI's unCLIP architecture and is aimed at researchers and developers working with diffusion models.

How It Works

Karlo follows the unCLIP architecture, which comprises prior, decoder, and super-resolution (SR) modules. Its enhanced SR stage upscales images from 64px to 256px in just 7 reverse steps: an SR module trained with the DDPM objective performs the initial upscaling, and a second SR module fine-tuned with a VQ-GAN-style loss then recovers high-frequency details.
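To make the 7-step upscaling concrete, here is a minimal sketch of timestep respacing, one common way to sample a diffusion model in far fewer steps than it was trained with. It is an illustration only, not Karlo's actual code: the 1000-step training schedule, the even spacing, the handoff of only the final step to the fine-tuned module, and the function names respace_timesteps and split_sr_steps are all assumptions.

```python
def respace_timesteps(num_train_steps: int, num_sample_steps: int) -> list[int]:
    """Pick roughly evenly spaced timesteps, ordered from noisiest to cleanest."""
    if not 1 <= num_sample_steps <= num_train_steps:
        raise ValueError("num_sample_steps must be between 1 and num_train_steps")
    if num_sample_steps == 1:
        return [num_train_steps - 1]
    # Spread the sampled steps over the full training schedule.
    stride = (num_train_steps - 1) / (num_sample_steps - 1)
    ascending = [round(i * stride) for i in range(num_sample_steps)]
    return ascending[::-1]


def split_sr_steps(steps: list[int]) -> tuple[list[int], list[int]]:
    """Assumed handoff: all but the last step go to the DDPM-trained SR module,
    and the final step goes to the module fine-tuned with a VQ-GAN-style loss."""
    return steps[:-1], steps[-1:]


# Assumption: a 1000-step training schedule reduced to the 7 reverse steps
# mentioned in the text.
sr_steps = respace_timesteps(1000, 7)
ddpm_steps, finetuned_steps = split_sr_steps(sr_steps)
```

The point of the sketch is only that the reverse process visits a short, descending subset of the training timesteps, with the detail-recovery module taking over at the end.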

Quick Start & Requirements

Install: pip install diffusers transformers accelerate safetensors
Prerequisites: PyTorch >= 1.10, CUDA >= 11. A single V100 with 32GB VRAM is recommended for sampling.
Model Weights: Download required checkpoints via wget commands or setup.sh.
Demo: Launch a Gradio demo with python demo/product_demo.py.
Docs: Diffusers unCLIP Pipeline Docs
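
As a rough sketch of the diffusers route, the snippet below samples one image with UnCLIPPipeline. The model id kakaobrain/karlo-v1-alpha, the prompt, the use of float16, and the prior step count are assumptions that do not come from this summary; the 25 decoder steps and 7 super-resolution steps mirror the figures quoted on this page. Running it needs a CUDA GPU and downloads the checkpoints on first use.

```python
# Step counts for the three unCLIP stages. The decoder and super-resolution
# values follow this summary; the prior value is an assumption.
SAMPLING_STEPS = {
    "prior_num_inference_steps": 25,
    "decoder_num_inference_steps": 25,
    "super_res_num_inference_steps": 7,
}


def generate(prompt, model_id="kakaobrain/karlo-v1-alpha", device="cuda"):
    """Load the pipeline and return one PIL image for the prompt.

    model_id is an assumed Hugging Face Hub id, not taken from this summary.
    """
    # Imported lazily so the module can be loaded without the GPU stack.
    import torch
    from diffusers import UnCLIPPipeline

    pipe = UnCLIPPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
    pipe = pipe.to(device)
    return pipe(prompt, **SAMPLING_STEPS).images[0]


# Example usage (requires a GPU and network access):
# image = generate("a high-resolution photograph of a big red frog on a green leaf")
# image.save("frog.png")
```

Keeping the step counts in one dictionary makes it easy to trade quality for speed by changing a single value.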

Highlighted Details

Trained on 115M image-text pairs (COYO-100M, CC3M, CC12M).
Achieves a CLIP score of 0.3081 and an FID of 14.37 on the CC3M validation set with 25 decoder steps.
Uses ViT-L/14 from CLIP for prior and decoder, with a modified text encoder for efficiency.
Integrated into Hugging Face's diffusers library.

Maintenance & Community

Released as Karlo-v1.0.alpha on 2022-12-01.
Integrated into diffusers and Hugging Face Spaces.
Contact: contact@kakaobrain.com for collaboration or feedback.

Licensing & Compatibility

License: CreativeML Open RAIL-M.
Commercial use is permitted, but using a robust safety checker is recommended.

Limitations & Caveats

This is an alpha version.
The README notes that the second run in the Gradio demo can be unexpectedly slower due to CUDA kernel launch times, potentially up to 2 minutes.

Health Check

Last Commit

3 years ago

Responsiveness

Inactive

Star History

0 stars in the last 30 days