Text-to-image generation with human-in-the-loop refinement
This project provides a human-in-the-loop workflow for generating high-definition images from text prompts, aimed at creative professionals and developers. It combines several text-to-image models with CLIP-based ranking, so images are produced iteratively (generate, rank, refine, upscale) rather than in a single pass, which gives the user more control over the result.
How It Works
The workflow chains several models together. DALL·E-Mega, GLID-3 XL, and Stable Diffusion generate the initial image candidates. CLIP-as-service then ranks these candidates by their relevance to the text prompt. The top-ranked image is refined by GLID-3 XL to improve texture and background, and finally upscaled to 1024×1024 pixels with SwinIR. The pipeline is built on the Jina framework, which allows it to scale and lets clients talk to the server over gRPC, WebSocket, or HTTP.
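The stage order described above can be sketched in plain Python. This is only an illustration of the control flow, not the project's code: `Candidate`, `fake_generator`, `run_pipeline`, the `pick` parameter, and all scores are hypothetical stand-ins, and no real model is called.

```python
from dataclasses import dataclass, replace
from typing import Callable, List


@dataclass(frozen=True)
class Candidate:
    source: str            # which generator produced the image (hypothetical label)
    score: float           # stand-in for a CLIP relevance score
    size: int              # edge length in pixels
    refined: bool = False  # set once the refinement stage has run


Generator = Callable[[str], List[Candidate]]


def fake_generator(name: str, scores: List[float]) -> Generator:
    """Build a stub generator that returns fixed-score candidates (assumption)."""
    def generate(prompt: str) -> List[Candidate]:
        return [Candidate(source=name, score=s, size=256) for s in scores]
    return generate


def rank(candidates: List[Candidate]) -> List[Candidate]:
    """Order candidates from most to least relevant, as the CLIP stage would."""
    return sorted(candidates, key=lambda c: c.score, reverse=True)


def refine(candidate: Candidate) -> Candidate:
    """Stand-in for the GLID-3 XL refinement pass."""
    return replace(candidate, refined=True)


def upscale(candidate: Candidate, size: int = 1024) -> Candidate:
    """Stand-in for the SwinIR upscaling pass."""
    return replace(candidate, size=size)


def run_pipeline(prompt: str, generators: List[Generator], pick: int = 0) -> Candidate:
    """Generate, rank, refine, and upscale; `pick` lets a human override the top rank."""
    candidates = [c for g in generators for c in g(prompt)]
    chosen = rank(candidates)[pick]
    return upscale(refine(chosen))


generators = [
    fake_generator("dalle-mega", [0.21, 0.34]),
    fake_generator("glid3-xl", [0.29]),
    fake_generator("stable-diffusion", [0.31]),
]
result = run_pipeline("an oil painting of a llama", generators)
print(result)
```

Passing a different `pick` index models the human-in-the-loop step, where the user chooses a candidate other than the top-ranked one before refinement.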
Quick Start & Requirements
Install the client dependencies:
pip install "docarray[common]>=0.13.5" jina
Then, in Python, set the server address:
server_url = 'grpcs://dalle-flow.dev.jina.ai'
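With the server address set, a request for candidates might look like the sketch below. It assumes a pre-0.30 docarray release, where `Document.post` and `Document.matches` are available. The helper name `request_candidates` and the `num_images` parameter are assumptions for illustration, and the server may no longer be reachable.

```python
def request_candidates(prompt, server_url, num_images=8):
    """Send a text prompt to the server and return the generated candidates.

    Assumes docarray < 0.30; `num_images` is an assumed server-side parameter.
    """
    # Imported lazily so this module loads even when docarray is not installed.
    from docarray import Document

    doc = Document(text=prompt).post(server_url, parameters={"num_images": num_images})
    # In this sketch the candidates are expected back as the document's matches.
    return doc.matches


# Example call (needs network access and a live server):
# candidates = request_candidates("an oil painting of a llama",
#                                 "grpcs://dalle-flow.dev.jina.ai")
```

The import sits inside the function so the snippet can be read and loaded without the client library; the actual request only happens when the function is called.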
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
2 years ago
Inactive