ru-dalle by ai-forever

Text-to-image generation in Russian

created 3 years ago
1,648 stars

Top 26.1% on sourcepulse

Project Summary

This repository provides ru-dalle, a Python library that generates images from Russian-language text prompts. It targets researchers and developers working on text-to-image synthesis, and offers image generation, cherry-picking of results via CLIP, and super-resolution.

How It Works

The library uses an autoregressive transformer modeled on DALL-E: given a text prompt, the transformer predicts a sequence of discrete image tokens. A VAE (variational autoencoder) then decodes those tokens into an image, with an optional DWT (discrete wavelet transform) mode in the decoder for potentially higher-quality output. ruCLIP scores each generated image against the prompt, so the most relevant candidates can be selected.
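The selection step can be illustrated with a toy sketch. It assumes the prompt and each candidate image have already been embedded into a shared vector space (the role ruCLIP plays) and simply ranks candidates by cosine similarity. The function names and the three-dimensional vectors are hypothetical and are not part of the rudalle or ruclip APIs.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction; treat it as unrelated
    return dot / (norm_a * norm_b)


def cherry_pick(text_embedding, image_embeddings, count):
    """Return indices of the `count` images most similar to the text."""
    scored = [
        (cosine_similarity(text_embedding, emb), idx)
        for idx, emb in enumerate(image_embeddings)
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:count]]


# Hypothetical embeddings standing in for real CLIP-style outputs.
text_vec = [1.0, 0.0, 0.5]
candidates = [
    [0.9, 0.1, 0.4],   # close to the prompt
    [-1.0, 0.2, 0.0],  # points away from the prompt
    [0.5, 0.5, 0.5],   # partially related
]
best = cherry_pick(text_vec, candidates, count=2)
print(best)
```

In the real library the embeddings come from a trained model and the ranking runs in batches on the GPU; the sketch only shows the ranking idea.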

Quick Start & Requirements

  • Install: pip install rudalle==1.1.3
  • Requirements: Python and a CUDA-enabled GPU (at least 3.5 GB of VRAM for Malevich XL).
  • Links: Hugging Face Models
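A minimal usage sketch follows, based on the usage example in the project README for the 1.1.x releases as best recalled. Treat the function names, arguments, and model identifiers as assumptions to check against the installed version. The wrapper function `generate_and_rank` is hypothetical; it exists only to keep the heavy imports and model downloads out of module import time. Running it needs a CUDA GPU and the `rudalle` and `ruclip` packages.

```python
def generate_and_rank(prompt, images_num=6, keep=3, device="cuda", seed=42):
    """Generate images for a Russian prompt, keep the best ones by ruCLIP
    score, and upscale them. Hypothetical wrapper around README-style calls."""
    # Imports live inside the function so that defining it needs no GPU
    # and does not trigger any model download.
    import ruclip
    from rudalle import get_rudalle_model, get_tokenizer, get_vae, get_realesrgan
    from rudalle.pipelines import (
        generate_images,
        cherry_pick_by_ruclip,
        super_resolution,
    )
    from rudalle.utils import seed_everything

    # Load the generator, tokenizer, and VAE (dwt=True enables the DWT decoder).
    dalle = get_rudalle_model("Malevich", pretrained=True, fp16=True, device=device)
    tokenizer = get_tokenizer()
    vae = get_vae(dwt=True).to(device)

    # Load the helpers for ranking and upscaling.
    realesrgan = get_realesrgan("x2", device=device)
    clip, processor = ruclip.load("ruclip-vit-base-patch32-384", device=device)
    clip_predictor = ruclip.Predictor(clip, processor, device, bs=8)

    seed_everything(seed)
    # Sampling settings (top_k, top_p) are taken from the README example.
    pil_images, _scores = generate_images(
        prompt, tokenizer, dalle, vae,
        top_k=2048, top_p=0.995, images_num=images_num, bs=8,
    )

    # Keep the candidates that ruCLIP rates as most relevant to the prompt.
    top_images, _clip_scores = cherry_pick_by_ruclip(
        pil_images, prompt, clip_predictor, count=keep
    )

    # Upscale the selected images with Real-ESRGAN.
    return super_resolution(top_images, realesrgan)


if __name__ == "__main__":
    # Prompt from the README: "rainbow against the background of a night city".
    images = generate_and_rank("радуга на фоне ночного города")
    for i, image in enumerate(images):
        image.save(f"result_{i}.png")
```

The first call downloads several model checkpoints, so expect a long start-up.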

Highlighted Details

  • Supports multiple model variants: Malevich (XL), Emojich (XL), Surrealist (XL), and Kandinsky (XXL) (upcoming).
  • Includes pipelines for image generation, cherry-picking with ruCLIP, and super-resolution using Real-ESRGAN.
  • Demonstrates finetuning capabilities and image prompt usage.
  • Reports an FID of 15.4 on the COCO validation set; the README's example prompt is "robots in watercolor in the style of van Gogh".
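For context on the FID figure: the Fréchet Inception Distance compares the statistics of Inception-network features for a set of real images and a set of generated images, so it describes a whole dataset rather than a single prompt. This is the standard definition, not anything specific to this repository; lower values are better.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```

Here $\mu_r, \Sigma_r$ are the mean and covariance of the features of the real images, and $\mu_g, \Sigma_g$ are the same statistics for the generated images.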

Maintenance & Community

  • The README credits contributors for work on decoding, inference speed, super-resolution, image prompts, and Colab integration.
  • Integrated with Huggingface Spaces via Gradio.

Licensing & Compatibility

  • License information is not explicitly stated in the README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

Because the README states no license, suitability for commercial use is unclear. Kandinsky XXL is listed as "soon" and is not yet available.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 7 stars in the last 90 days
