dalle-flow by jina-ai

Text-to-image generation with human-in-the-loop refinement

Created 3 years ago

2,834 stars

Top 16.7% on SourcePulse

2 Experts Love This Project

Jeff Hammerbacher (hammer): Cofounder of Cloudera
casper-hansen: Author of AutoAWQ

Project Summary

This project provides a human-in-the-loop workflow for generating high-definition images from text prompts, aimed at creative professionals and developers. It combines several text-to-image models with CLIP-based ranking, so users can pick and refine candidates iteratively instead of accepting a single output.

How It Works

The workflow chains several models. DALL·E-Mega, GLID-3 XL, and Stable Diffusion generate the initial image candidates, and CLIP-as-service ranks them by relevance to the text prompt. The user then picks a preferred candidate, which GLID-3 XL refines to enrich texture and background; the result is finally upscaled to 1024x1024 with SwinIR. The pipeline is built on the Jina framework, which provides scalability and client-server communication over gRPC, WebSocket, or HTTP.
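The ranking step can be illustrated with a minimal sketch. It assumes the prompt and each candidate image have already been embedded into a shared vector space (the role CLIP plays in the real workflow) and orders the candidates by cosine similarity to the prompt. The toy vectors, the candidate names, and the `rank_candidates` helper are hypothetical; none of them come from dalle-flow or CLIP-as-service.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_a * norm_b)


def rank_candidates(prompt_embedding, candidates):
    """Return candidate names sorted from most to least similar to the prompt.

    `candidates` maps a (hypothetical) candidate name to its embedding.
    """
    scored = [
        (cosine_similarity(prompt_embedding, embedding), name)
        for name, embedding in candidates.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored]


# Toy 3-dimensional embeddings; real CLIP embeddings have hundreds of dimensions.
prompt_embedding = [1.0, 0.0, 1.0]
candidates = {
    "dalle_mega_0": [0.9, 0.1, 0.8],
    "glid3_xl_0": [0.0, 1.0, 0.0],
    "stable_diffusion_0": [0.5, 0.5, 0.5],
}
ranking = rank_candidates(prompt_embedding, candidates)
```

Here `ranking` lists the candidate closest to the prompt first. In the actual workflow a person reviews the ranked candidates and chooses which one to refine, so the ranking only orders the options and does not make the final choice.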

Quick Start & Requirements

Install: pip install "docarray[common]>=0.13.5" jina
Prerequisites: Python 3.x. Running the full workflow yourself needs a GPU with at least 21GB of VRAM. Stable Diffusion additionally requires accepting its terms of service and downloading the weights.
Demo Server: server_url = 'grpcs://dalle-flow.dev.jina.ai'
Docs: Client Usage
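A minimal client sketch for the three stages (generate, refine, upscale) might look like the following. It assumes a `docarray` release earlier than 0.30, where `Document.post` is available, and that the demo server is still reachable, which is not guaranteed given the project's age. The helper names, the `document_cls` injection point, and the default values are illustrative. The `target_executor="diffusion"` argument and the `/upscale` endpoint are assumptions based on the project's README and should be checked against the Client Usage docs.

```python
SERVER_URL = "grpcs://dalle-flow.dev.jina.ai"  # demo server listed on this page


def generate_candidates(prompt, num_images=8, server_url=SERVER_URL, document_cls=None):
    """Send the prompt to the server; return the request doc and its candidates."""
    if document_cls is None:
        # Imported lazily so these helpers can be defined without docarray.
        # Assumption: docarray<0.30, which still provides Document.post.
        from docarray import Document
        document_cls = Document
    doc = document_cls(text=prompt).post(
        server_url, parameters={"num_images": num_images}
    )
    return doc, doc.matches


def refine(doc, favourite, num_images=36, skip_rate=0.5, server_url=SERVER_URL):
    """Ask the diffusion step for variations of the chosen candidate."""
    favourite.embedding = doc.embedding
    result = favourite.post(
        server_url,
        parameters={"skip_rate": skip_rate, "num_images": num_images},
        target_executor="diffusion",  # assumed executor name, verify in the docs
    )
    return result.matches


def upscale(favourite, server_url=SERVER_URL):
    """Request the upscaled version of the chosen image."""
    return favourite.post(f"{server_url}/upscale")  # assumed endpoint path
```

In an interactive session, one would call `generate_candidates(...)`, inspect the returned candidates, pass the chosen one to `refine(...)`, choose again, and finish with `upscale(...)`. The selection between calls is the human-in-the-loop part and is deliberately left out of the helpers.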

Highlighted Details

Supports DALL·E-Mega, GLID-3 XL, and Stable Diffusion for initial image generation.
Utilizes CLIP-as-service for prompt-based image ranking and selection.
Employs SwinIR for 1024x1024 upscaling.
Built with Jina for a scalable client-server architecture.
Offers a human-in-the-loop approach for iterative creative refinement.

Maintenance & Community

Developed by Jina AI; with the last commit 2 years ago, the project no longer appears to be actively developed.
Community support via Discord.
Past updates added features such as RealESRGAN and CLIPSeg.

Licensing & Compatibility

Licensed under Apache-2.0.
Compatible with commercial use and closed-source linking.

Limitations & Caveats

Running the full workflow requires significant GPU VRAM (21GB+).
CPU-only operation is not supported.
The demo server may experience delays due to high demand.
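Before attempting a self-hosted deployment, the VRAM caveat can be checked with a short script. This sketch assumes an NVIDIA GPU with `nvidia-smi` on the PATH and reads the 21GB figure as 21 GiB. It compares total memory rather than currently free memory, so a GPU shared with other processes may still fall short. The helper names are hypothetical.

```python
import subprocess

REQUIRED_MIB = 21 * 1024  # 21GB from the caveat above, read as GiB (assumption)


def parse_total_memory_mib(nvidia_smi_output):
    """Parse one integer (MiB) per GPU from nvidia-smi CSV output."""
    return [
        int(line.strip())
        for line in nvidia_smi_output.splitlines()
        if line.strip()
    ]


def gpus_with_enough_vram(memory_mib, required_mib=REQUIRED_MIB):
    """Return the indices of GPUs whose total memory meets the requirement."""
    return [i for i, mib in enumerate(memory_mib) if mib >= required_mib]


def query_total_memory_mib():
    """Ask nvidia-smi for each GPU's total memory; needs an NVIDIA driver."""
    completed = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_total_memory_mib(completed.stdout)
```

On a machine with a GPU, `gpus_with_enough_vram(query_total_memory_mib())` returns the indices of the cards that meet the threshold; an empty list means none does.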

Health Check

Last Commit: 2 years ago
Responsiveness: 1 day
Pull Requests (30d): 0
Issues (30d): 0

Star History

2 stars in the last 30 days

