IF by deep-floyd

Text-to-image model for photorealistic synthesis and language understanding

Created 3 years ago

7,843 stars

Top 6.6% on SourcePulse

14 Experts Love This Project

Chip Huyen

Author of "AI Engineering", "Designing Machine Learning Systems"

Chaoyu Yang

Founder of Bento

Pawel Garbacki

Cofounder of Fireworks AI

Jaret Burkett

Founder of Ostris

and 10 more!

Project Summary

DeepFloyd IF is a modular, cascaded diffusion model for high-fidelity text-to-image generation, targeting researchers and developers seeking state-of-the-art photorealism and language understanding. It offers a flexible architecture for various creative applications, including image generation, style transfer, super-resolution, and inpainting.

How It Works

IF employs a three-stage cascaded diffusion process. A frozen T5 text encoder generates embeddings, which are fed into a UNet-based base model (IF-I) producing 64x64 images. Two subsequent super-resolution diffusion models (IF-II and Stable x4 upscaler) progressively increase the resolution to 256x256 and 1024x1024, respectively. This cascaded approach, particularly the use of larger UNet architectures in the initial stage, is key to achieving high photorealism and detailed outputs.
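The resolution progression described above can be sketched in a few lines of plain Python. This is only an illustration of the cascade's arithmetic, not code from the IF repository; the helper names (`CASCADE`, `upscale_factors`, `pixel_counts`) are hypothetical, and the stage names and sizes are copied from the paragraph above.

```python
# Stage names and square output sizes (in pixels per side), as described above.
# This models only the resolution bookkeeping, not the diffusion models themselves.
CASCADE = [
    ("IF-I (base model)", 64),
    ("IF-II (super-resolution)", 256),
    ("Stable x4 upscaler", 1024),
]


def upscale_factors(stages):
    """Return the side-length ratio between each stage and the one before it."""
    sizes = [size for _, size in stages]
    return [later // earlier for earlier, later in zip(sizes, sizes[1:])]


def pixel_counts(stages):
    """Return the number of pixels each stage outputs."""
    return [size * size for _, size in stages]


if __name__ == "__main__":
    for (name, size), pixels in zip(CASCADE, pixel_counts(CASCADE)):
        print(f"{name}: {size}x{size} ({pixels} pixels)")
    print("Per-stage upscale factors:", upscale_factors(CASCADE))
```

Each super-resolution stage multiplies the side length by 4, so the pixel count grows 16-fold per stage while the base model only has to generate a 64x64 image from the text embedding.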

Quick Start & Requirements

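Install (three separate commands, so that --no-deps applies only to the CLIP install): pip install deepfloyd_if==1.0.2rc0; pip install xformers==0.0.16; pip install git+https://github.com/openai/CLIP.git --no-deps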
Prerequisites: a Hugging Face account with login via huggingface_hub, plus accelerate, transformers, and safetensors. When using torch>=2.0.0, remove any enable_xformers_memory_efficient_attention() calls.
VRAM: Minimum 16GB for IF-I-XL and IF-II-L; 24GB for all three stages (IF-I-XL, IF-II-L, Stable x4).
Docs: IF blog post, Diffusers integration
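The VRAM figures above can be turned into a small lookup, shown here as a minimal sketch. The function name `stages_for_vram` is hypothetical, and the thresholds are the nominal 16GB and 24GB values from the list above, not measured requirements.

```python
from typing import List


def stages_for_vram(vram_gb: float) -> List[str]:
    """Map available GPU memory (in GB) to the stages the listed minimums allow.

    Thresholds are the nominal figures quoted above; real headroom depends on
    batch size, precision, and whether CPU offloading is enabled.
    """
    if vram_gb >= 24:
        return ["IF-I-XL", "IF-II-L", "Stable x4"]
    if vram_gb >= 16:
        return ["IF-I-XL", "IF-II-L"]
    # Below the stated minimum: no configuration from the list applies.
    return []


if __name__ == "__main__":
    for gb in (12, 16, 24):
        print(gb, "GB ->", stages_for_vram(gb) or "below the stated minimum")
```

With PyTorch installed, the total memory of the first GPU is available in bytes as `torch.cuda.get_device_properties(0).total_memory`. A card sold as "24GB" may report slightly less than that, so treat the thresholds as approximate.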

Highlighted Details

Achieves a zero-shot FID score of 6.66 on COCO.
Supports text-to-image, style transfer, super-resolution, and inpainting.
Integrates with Hugging Face Diffusers for customizable pipelines and CPU offloading for lower VRAM usage.
Parameter-efficient fine-tuning is supported for adding new concepts.
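As an illustration of the Diffusers integration and CPU offloading mentioned above, the following is a minimal sketch of running the first two stages. It assumes that diffusers, transformers, accelerate, and torch are installed, that you are logged in to Hugging Face and have accepted the model license, and that the Hub model IDs below are the ones you want (the IDs are an assumption, not stated in this summary). The function name `generate_256` is hypothetical.

```python
# Hub model IDs: assumed here for illustration, not taken from the summary above.
STAGE_1_ID = "DeepFloyd/IF-I-XL-v1.0"
STAGE_2_ID = "DeepFloyd/IF-II-L-v1.0"


def generate_256(prompt: str, seed: int = 0):
    """Run IF-I then IF-II with CPU offloading and return the 256x256 outputs."""
    # Imported lazily so this file can be loaded without the heavy dependencies.
    import torch
    from diffusers import DiffusionPipeline

    # Stage 1: text-conditioned base model producing 64x64 images.
    stage_1 = DiffusionPipeline.from_pretrained(
        STAGE_1_ID, variant="fp16", torch_dtype=torch.float16
    )
    stage_1.enable_model_cpu_offload()  # keeps idle submodules on the CPU

    # Stage 2: super-resolution to 256x256; it reuses stage 1's text
    # embeddings, so its own text encoder is not loaded.
    stage_2 = DiffusionPipeline.from_pretrained(
        STAGE_2_ID, text_encoder=None, variant="fp16", torch_dtype=torch.float16
    )
    stage_2.enable_model_cpu_offload()

    # Encode the prompt once with the T5 encoder and share it across stages.
    prompt_embeds, negative_embeds = stage_1.encode_prompt(prompt)
    generator = torch.manual_seed(seed)

    low_res = stage_1(
        prompt_embeds=prompt_embeds,
        negative_prompt_embeds=negative_embeds,
        generator=generator,
        output_type="pt",  # keep tensors so they can feed the next stage
    ).images

    return stage_2(
        image=low_res,
        prompt_embeds=prompt_embeds,
        negative_prompt_embeds=negative_embeds,
        generator=generator,
    ).images


if __name__ == "__main__":
    images = generate_256("a photo of a red fox in the snow")
    images[0].save("if_stage_2.png")
```

Offloading trades speed for memory: each submodule is moved to the GPU only while it runs. A third stage (the Stable x4 upscaler) would take the 256x256 result to 1024x1024; it is left out here to keep the sketch short.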

Maintenance & Community

Developed by DeepFloyd Lab at StabilityAI.
Key contributors include Alex Shonenkov, Misha Konstantinov, Daria Bakshandaeva, Christoph Schuhmann, Ksenia Ivanova, and Nadiia Klokova.
Significant contributions from external community members like @Apolinário and @patrickvonplaten are acknowledged.

Licensing & Compatibility

The code is released under a bespoke license. The model weights were initially released under a restricted, research-purposes-only license, with a fully open-source release planned.
Commercial use is not explicitly addressed for the initial release.

Limitations & Caveats

The initial release of IF model weights is under a restricted research-purposes-only license. The model has known limitations and biases, which are detailed in the model card.

Health Check

Last Commit: 1 year ago
Responsiveness: Inactive

Star History

1 star in the last 30 days