ru-dalle by ai-forever

Text-to-image generation in Russian

Created 3 years ago
1,649 stars

Top 25.6% on SourcePulse

Project Summary

ru-dalle is a Python library that generates images from Russian-language text prompts. It targets researchers and developers working on text-to-image synthesis in Russian, and it covers three tasks: image generation, cherry-picking of the best candidates with ruCLIP, and super-resolution.

How It Works

The library follows the DALL-E approach: an autoregressive transformer (not a diffusion model) predicts image tokens from the text prompt, and a VAE (Variational Autoencoder) decodes those tokens into an image, with an optional DWT (Discrete Wavelet Transform) decoding mode for potentially higher-quality outputs. ruCLIP scores how well each generated image matches the prompt, which enables image selection based on relevance.
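The relevance-based selection step can be illustrated without any model weights. The following is a minimal sketch, assuming that a CLIP-style model has already mapped the prompt and each candidate image to embedding vectors; the function names and the toy vectors are hypothetical and are not part of the ru-dalle or ruCLIP APIs.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def cherry_pick(text_embedding, image_embeddings, count):
    """Return the indices of the `count` images most similar to the text, best first."""
    scored = [
        (cosine_similarity(text_embedding, emb), idx)
        for idx, emb in enumerate(image_embeddings)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:count]]


# Toy embeddings (hypothetical): candidate 1 points almost the same way as the text.
text_vec = [1.0, 0.0, 0.0]
candidates = [[0.0, 1.0, 0.0], [0.9, 0.1, 0.0], [0.5, 0.5, 0.0]]
best = cherry_pick(text_vec, candidates, count=2)
print(best)
```

In a real pipeline the embeddings would come from the text and image encoders of the scoring model, and only the ranking logic would look like this.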

Quick Start & Requirements

  • Install: pip install rudalle==1.1.3
  • Requirements: Python and a CUDA-enabled GPU (at least 3.5 GB of VRAM for Malevich XL).
  • Links: Hugging Face Models
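After installation, an end-to-end run might look like the sketch below. The function `generate_and_rank` and its defaults are hypothetical. The `rudalle` and `ruclip` calls follow the project README for the 1.1.x line as best recalled, so treat every name and argument as an assumption to verify against the installed version. Running it requires a CUDA GPU and downloads model weights on first use.

```python
def generate_and_rank(prompt, images_num=8, keep=4, device="cuda"):
    """Generate candidates for a Russian prompt, keep the best by ruCLIP, then upscale them."""
    # Imports live inside the function so the module loads without the heavy dependencies.
    # Assumption: these names match the installed rudalle / ruclip versions.
    import ruclip
    from rudalle import get_rudalle_model, get_tokenizer, get_vae, get_realesrgan
    from rudalle.pipelines import generate_images, cherry_pick_by_ruclip, super_resolution
    from rudalle.utils import seed_everything

    seed_everything(42)  # fixed seed so repeated runs sample the same candidates

    # Generation models: transformer, tokenizer, and the VAE decoder (DWT mode enabled).
    dalle = get_rudalle_model("Malevich", pretrained=True, fp16=True, device=device)
    tokenizer = get_tokenizer()
    vae = get_vae(dwt=True).to(device)

    # Post-processing models: Real-ESRGAN upscaler and ruCLIP scorer.
    realesrgan = get_realesrgan("x2", device=device)
    clip, processor = ruclip.load("ruclip-vit-base-patch32-384", device=device)
    clip_predictor = ruclip.Predictor(clip, processor, device, bs=8)

    pil_images, _scores = generate_images(
        prompt, tokenizer, dalle, vae,
        top_k=2048, top_p=0.995, images_num=images_num, bs=8,
    )
    top_images, _clip_scores = cherry_pick_by_ruclip(
        pil_images, prompt, clip_predictor, count=keep
    )
    return super_resolution(top_images, realesrgan)
```

A call such as `generate_and_rank("радуга на фоне ночного города")` would be expected to return a list of upscaled PIL images. Lowering `images_num` is the simplest way to reduce run time on a small GPU.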

Highlighted Details

  • Supports multiple model variants: Malevich (XL), Emojich (XL), Surrealist (XL), and Kandinsky (XXL) (upcoming).
  • Includes pipelines for image generation, cherry-picking with ruCLIP, and super-resolution using Real-ESRGAN.
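  • Includes examples of fine-tuning and of generation from image prompts.
  • Claims an FID of 15.4 on the COCO validation set. FID is computed over a whole set of images, so the prompt "robots in watercolor in the style of van Gogh" is best read as a showcased sample rather than the basis of that score.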
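To make the FID figure easier to interpret: FID is the Fréchet distance between two Gaussians fitted to feature vectors of real and generated images, so it describes a whole image set rather than one picture. The sketch below shows only the one-dimensional special case, where the distance reduces to (mu_r - mu_g)^2 + (sigma_r - sigma_g)^2. The function name and the sample values are hypothetical, and real FID uses high-dimensional Inception features with full covariance matrices.

```python
import statistics


def frechet_distance_1d(real, generated):
    """Squared Fréchet distance between 1-D Gaussians fitted to two samples."""
    mu_r, mu_g = statistics.fmean(real), statistics.fmean(generated)
    # Population standard deviation; library FID implementations may use the sample estimate.
    sigma_r, sigma_g = statistics.pstdev(real), statistics.pstdev(generated)
    return (mu_r - mu_g) ** 2 + (sigma_r - sigma_g) ** 2


# Hypothetical scalar "features": the generated set is the real set shifted by 1.
real_features = [0.0, 2.0, 4.0]
generated_features = [1.0, 3.0, 5.0]

shifted = frechet_distance_1d(real_features, generated_features)
identical = frechet_distance_1d(real_features, real_features)
print(shifted, identical)
```

Lower is better: identical distributions give 0, and the score grows as the mean or spread of the generated features drifts away from the real ones.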

Maintenance & Community

  • The README credits contributors for work on decoding, inference speed, super-resolution, image prompts, and Colab integration.
  • A demo is available on Hugging Face Spaces, built with Gradio.

Licensing & Compatibility

  • The README does not state a license, so compatibility with commercial use or closed-source linking is unspecified.

Limitations & Caveats

Because the README names no license, suitability for commercial use is unclear. Kandinsky XXL is listed as "soon," so it is not yet available.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 0 stars in the last 30 days

Explore Similar Projects

Starred by Max Howell (Author of Homebrew), Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), and 1 more.

big-sleep by lucidrains

0%
3k
CLI tool for text-to-image generation
Created 4 years ago
Updated 3 years ago
Starred by Shengjia Zhao (Chief Scientist at Meta Superintelligence Lab), Edward Sun (Research Scientist at Meta Superintelligence Lab), and 7 more.

glide-text2im by openai

0.1%
4k
Text-conditional image synthesis model from research paper
Created 3 years ago
Updated 1 year ago