dalle-flow  by jina-ai

Text-to-image generation with human-in-the-loop refinement

created 3 years ago
2,834 stars

Top 17.1% on sourcepulse

Project Summary

This project provides a human-in-the-loop workflow for generating high-definition images from text prompts, targeting creative professionals and developers. It leverages multiple text-to-image models and a CLIP-based ranking system to offer an iterative image creation process, enhancing creative control and output quality.

How It Works

The workflow chains several AI models. DALL·E-Mega, GLID-3 XL, and Stable Diffusion generate initial image candidates, and CLIP-as-service ranks them by relevance to the text prompt. The top-ranked image is then refined by GLID-3 XL for richer texture and background, and finally upscaled to 1024x1024 with SwinIR. Because the stages are built on the Jina framework, the workflow scales as a client-server application reachable over gRPC, WebSocket, or HTTP.
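The stage order described above can be sketched in plain Python. This is only an illustration of the control flow, not the project's code: the `Candidate` class, the function names, the placeholder 256-pixel starting size, and the `choose` hook for the human-in-the-loop step are all hypothetical, and no real model is called.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Candidate:
    source: str                 # which generator produced the image
    size: int                   # edge length in pixels (stand-in for image data)
    score: float = 0.0          # relevance score assigned during ranking
    history: List[str] = field(default_factory=list)


def generate(prompt: str, models: List[str], per_model: int = 2) -> List[Candidate]:
    # Stand-in for the generation stage: each model yields a few candidates.
    # The 256-pixel size is a placeholder assumption, not a documented value.
    return [
        Candidate(source=m, size=256, history=[f"generated:{m}"])
        for m in models
        for _ in range(per_model)
    ]


def rank(prompt: str, candidates: List[Candidate],
         score_fn: Callable[[str, Candidate], float]) -> List[Candidate]:
    # Stand-in for CLIP-based ranking: score every candidate, best first.
    for c in candidates:
        c.score = score_fn(prompt, c)
    return sorted(candidates, key=lambda c: c.score, reverse=True)


def refine(c: Candidate) -> Candidate:
    # Stand-in for the GLID-3 XL refinement stage.
    c.history.append("refined:GLID-3 XL")
    return c


def upscale(c: Candidate, target: int = 1024) -> Candidate:
    # Stand-in for SwinIR upscaling to 1024x1024.
    c.size = target
    c.history.append("upscaled:SwinIR")
    return c


def run_pipeline(prompt: str,
                 score_fn: Callable[[str, Candidate], float],
                 choose: Callable[[List[Candidate]], Candidate] = lambda ranked: ranked[0]) -> Candidate:
    models = ["DALL·E-Mega", "GLID-3 XL", "Stable Diffusion"]
    ranked = rank(prompt, generate(prompt, models), score_fn)
    picked = choose(ranked)     # hypothetical hook where a person could pick instead
    return upscale(refine(picked))
```

Passing a different `choose` function is where a person's selection would replace the default top-ranked pick.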

Quick Start & Requirements

  • Install: pip install "docarray[common]>=0.13.5" jina
  • Prerequisites: Python 3.x; a GPU with at least 21GB of VRAM is recommended for running the full workflow. Using Stable Diffusion requires accepting its terms of service and downloading the weights.
  • Demo Server: server_url = 'grpcs://dalle-flow.dev.jina.ai'
  • Docs: Client Usage
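A minimal client sketch for the demo server listed above. It assumes the legacy docarray API in which `Document` exposes `.post()` and results come back in `.matches`; newer docarray releases changed this API, so an older pinned version may be needed. The function name `fetch_candidates` and the `num_images` default are illustrative choices, and the demo server may no longer be reachable.

```python
SERVER_URL = 'grpcs://dalle-flow.dev.jina.ai'


def fetch_candidates(prompt, num_images=8, server_url=SERVER_URL):
    # Imported lazily so this file loads even where docarray is not installed.
    # Assumption: a legacy docarray version that still provides Document.post().
    from docarray import Document

    # Send the prompt to the server and ask for a number of candidates
    # from each generation model.
    doc = Document(text=prompt).post(server_url, parameters={'num_images': num_images})

    # The generated candidates are returned as matches of the prompt document.
    return doc.matches
```

Calling `fetch_candidates('an oil painting of a lighthouse')` would return the candidate images for inspection, provided the server responds.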

Highlighted Details

  • Supports DALL·E-Mega, GLID-3 XL, and Stable Diffusion for initial image generation.
  • Utilizes CLIP-as-service for prompt-based image ranking and selection.
  • Employs SwinIR for 1024x1024 upscaling.
  • Built with Jina for a scalable client-server architecture.
  • Offers a human-in-the-loop approach for iterative creative refinement.
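To illustrate prompt-based ranking, the sketch below orders images by cosine similarity between a text embedding and image embeddings, which is the usual way CLIP-style models compare text and images. The vectors are made-up toy values and the helper names are hypothetical; a real setup would obtain embeddings from a CLIP model rather than hard-code them.

```python
import math


def cosine(a, b):
    # Cosine similarity between two equal-length vectors; 0.0 if either is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_by_similarity(text_vec, image_vecs):
    # image_vecs maps an image name to its embedding.
    # Returns (name, score) pairs, most similar to the text first.
    scored = [(name, cosine(text_vec, vec)) for name, vec in image_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy embeddings, chosen only to make the ordering easy to follow.
text_embedding = [1.0, 0.0]
image_embeddings = {
    "img_a": [0.0, 1.0],   # orthogonal to the prompt
    "img_b": [1.0, 0.0],   # same direction as the prompt
    "img_c": [1.0, 1.0],   # partially aligned
}
ranking = rank_by_similarity(text_embedding, image_embeddings)
```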

Maintenance & Community

  • Developed by Jina AI; the Health Check section lists the last commit as 2 years ago.
  • Community support via Discord.
  • Past updates added features such as RealESRGAN and CLIPseg.

Licensing & Compatibility

  • Licensed under Apache-2.0.
  • Compatible with commercial use and closed-source linking.

Limitations & Caveats

  • Running the full workflow requires significant GPU VRAM (21GB+).
  • CPU-only operation is not supported.
  • The demo server may experience delays due to high demand.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

  • 10 stars in the last 90 days
