ru-dalle by ai-forever

Text-to-image generation in Russian

Created 4 years ago

1,646 stars

Top 25.2% on SourcePulse


Project Summary

This repository provides ru-dalle, a Python library that generates images from Russian-language text prompts. It targets researchers and developers working on text-to-image synthesis for Russian, and covers image generation, cherry-picking of results with CLIP, and super-resolution.

How It Works

The library follows the DALL-E design: an autoregressive transformer, not a diffusion model, generates a sequence of discrete image tokens conditioned on the text prompt. A VAE (Variational Autoencoder) decodes those tokens into pixels, with an optional DWT (Discrete Wavelet Transform) variant of the decoder intended to improve output quality. ruCLIP scores each generated image against the prompt, which is what makes cherry-picking the most relevant images possible.
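The cherry-picking step can be illustrated with a minimal sketch. It assumes that the prompt and each candidate image have already been embedded into a shared vector space (the role a CLIP-style model such as ruCLIP plays) and ranks candidates by cosine similarity. The function names and the toy three-dimensional vectors are hypothetical and are not part of the rudalle or ruCLIP APIs.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("a zero vector has no direction")
    return dot / (norm_a * norm_b)


def cherry_pick(
    text_embedding: Sequence[float],
    image_embeddings: Sequence[Sequence[float]],
    count: int,
) -> List[Tuple[int, float]]:
    """Return (index, score) for the `count` images closest to the prompt."""
    if count < 0:
        raise ValueError("count must be non-negative")
    scored = [
        (index, cosine_similarity(text_embedding, embedding))
        for index, embedding in enumerate(image_embeddings)
    ]
    # Highest similarity first; Python's sort is stable, so ties keep input order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:count]


# Toy stand-ins for real embeddings (assumption: a real model would produce
# vectors with hundreds of dimensions).
prompt_vec = [1.0, 0.0, 0.0]
candidate_vecs = [
    [0.0, 1.0, 0.0],   # unrelated to the prompt
    [1.0, 0.1, 0.0],   # nearly aligned with the prompt
    [-1.0, 0.0, 0.0],  # opposite direction
    [0.7, 0.7, 0.0],   # partially aligned
]

best = cherry_pick(prompt_vec, candidate_vecs, count=2)
for index, score in best:
    print(f"candidate {index}: similarity {score:.3f}")
```

In practice the usual approach is to generate many candidates and keep only the top few, trading extra generation time for better prompt relevance.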

Quick Start & Requirements

Install: pip install rudalle==1.1.3
Requirements: CUDA-enabled GPU (3.5GB VRAM minimum for Malevich XL), Python.
Links: Hugging Face Models

Highlighted Details

Supports multiple model variants: Malevich (XL), Emojich (XL), Surrealist (XL), and Kandinsky (XXL) (upcoming).
Includes pipelines for image generation, cherry-picking with ruCLIP, and super-resolution using Real-ESRGAN.
Demonstrates finetuning capabilities and image prompt usage.
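Reports an FID of 15.4 on the COCO validation set. The prompt "robots in watercolor in the style of van Gogh" appears alongside it as a sample generation; FID is computed over a set of images, not for a single prompt.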

Maintenance & Community

The README acknowledges contributors for work on decoding, inference speed, super-resolution, image prompts, and Colab integration.
Integrated with Hugging Face Spaces via Gradio.

Licensing & Compatibility

License information is not explicitly stated in the README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

Because the README states no license, commercial use carries legal uncertainty. Kandinsky XXL is listed as "soon", meaning it had not been released when the README was written.

Health Check

Last Commit: 3 years ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 0
Star History: 1 star in the last 30 days
