X-Omni by X-Omni-Team

Unified discrete autoregressive model for image and language generation

Created 4 months ago

392 stars

Top 73.2% on SourcePulse

Project Summary

The X-Omni repository provides the official inference code and the LongText-Bench benchmark for a unified discrete autoregressive model that generates images from English and Chinese text prompts. It targets researchers and practitioners working on multimodal generative AI, and it emphasizes instruction following and accurate text rendering in generated images.

How It Works

X-Omni uses discrete autoregressive modeling to handle image and language generation within a single framework. This approach allows precise control over text rendered within images and supports arbitrary output resolutions. Reinforcement learning is used to improve the model, particularly for following complex instructions and producing aesthetically pleasing outputs.
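
The idea of generating an image as a sequence of discrete tokens, one token at a time, can be illustrated with a toy sketch. This is not X-Omni's code: the vocabulary sizes, the BOI/EOI marker tokens, and the scoring function toy_image_token_weights are all hypothetical stand-ins for a trained model. The sketch only shows how text and image tokens can share one sequence and how the requested resolution determines the number of image tokens sampled.

```python
import random

# Hypothetical toy vocabulary: text and image tokens share one ID space.
TEXT_VOCAB = 8                    # IDs 0..7 stand in for text tokens
IMAGE_VOCAB = 16                  # IDs 8..23 stand in for discrete image codes
BOI = TEXT_VOCAB + IMAGE_VOCAB    # assumed begin-of-image marker
EOI = BOI + 1                     # assumed end-of-image marker


def toy_image_token_weights(sequence):
    """Stand-in for a trained model: unnormalized weights over image codes.

    A real model would compute these from the whole sequence with a neural
    network; here they depend on the sequence only through a simple sum.
    """
    context = sum(sequence)
    return [1 + (context + code) % 5 for code in range(IMAGE_VOCAB)]


def generate_image_tokens(prompt_tokens, height, width, seed=0):
    """Sample height * width image tokens autoregressively after the prompt."""
    rng = random.Random(seed)
    sequence = list(prompt_tokens) + [BOI]
    for _ in range(height * width):
        # Each new token is conditioned on everything generated so far.
        weights = toy_image_token_weights(sequence)
        code = rng.choices(range(IMAGE_VOCAB), weights=weights, k=1)[0]
        sequence.append(TEXT_VOCAB + code)
    sequence.append(EOI)

    # Reshape the flat run of image tokens into a height x width grid.
    image_ids = sequence[len(prompt_tokens) + 1:-1]
    grid = [image_ids[r * width:(r + 1) * width] for r in range(height)]
    return sequence, grid


prompt = [3, 1, 4]  # pretend these are tokenized prompt words
sequence, grid = generate_image_tokens(prompt, height=2, width=3, seed=0)
```

Calling generate_image_tokens with a different height and width changes only the number of sampling steps, which is one way a token-based generator can support different output sizes. Turning the token grid into pixels would require a separate decoder that is outside the scope of this sketch.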

Quick Start & Requirements

Installation: Requires Python 3.12 and uses Conda for environment management (conda create -n xomni python==3.12, conda activate xomni). Install dependencies via pip install -r requirements.txt and pip install flash-attn --no-build-isolation.
Prerequisites: CUDA 12 is recommended for flash-attn.
Inference: Examples are provided for English and Chinese image generation and for multimodal chat. The FLUX.1-dev model weights must be downloaded.
LongText-Bench: Requires transformers==4.52.0 and qwen_vl_utils. Evaluation uses a distributed script.
Links: Project Page, Paper, Model, Space, LongText-Bench.
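
The version requirements above can be checked before running anything. The following is a minimal sketch, not part of the repository: the helper names (python_matches, installed_version, check_pins) are hypothetical, and it checks only the two versions stated above, Python 3.12 and transformers 4.52.0 for LongText-Bench. It uses the standard-library importlib.metadata module to look up installed package versions.

```python
import sys
from importlib import metadata

REQUIRED_PYTHON = (3, 12)                        # from the installation notes
LONGTEXT_BENCH_PINS = {"transformers": "4.52.0"}  # from the LongText-Bench notes


def python_matches(version_info=sys.version_info, required=REQUIRED_PYTHON):
    """Return True if the major.minor Python version equals the required one."""
    return tuple(version_info[:2]) == required


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_pins(pins, lookup=installed_version):
    """Return a list of human-readable problems; an empty list means all pins match."""
    problems = []
    for package, wanted in pins.items():
        found = lookup(package)
        if found != wanted:
            problems.append(
                f"{package}: wanted {wanted}, found {found or 'not installed'}"
            )
    return problems
```

Running check_pins(LONGTEXT_BENCH_PINS) inside the activated xomni environment returns an empty list when the pinned transformers version is installed, and python_matches() reports whether the interpreter is Python 3.12. The lookup parameter exists only so the check can be exercised without installing anything.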

Highlighted Details

Unified discrete autoregressive model for image and language.
Superior instruction following and text rendering (English/Chinese).
Generates images at arbitrary resolutions.
Includes the LongText-Bench benchmark for evaluation.

Maintenance & Community

The project is associated with Tencent Hunyuan X Team. Contact information for Yibing Wang and Xiaosong Zhang is provided for inquiries and collaboration.

Licensing & Compatibility

The repository does not explicitly state a license. The model weights are available on Hugging Face. Compatibility for commercial use or closed-source linking is not specified.
