CogView4 is a suite of text-to-image generation models from THUDM, comprising CogView4 (6B parameters), CogView3-Plus (3B parameters), and CogView3, aimed at researchers and developers working in multimodal AI. The suite offers high-resolution image generation with native Chinese language support and competitive performance on standard benchmarks.
How It Works
CogView4 uses a Diffusion Transformer (DiT) architecture, while CogView3 employs a cascading relay-diffusion framework. Both support Chinese and English prompts and generate flexibly at resolutions up to 2048x2048. For prompt understanding, CogView4 relies on a GLM-4-9B text encoder, whereas the earlier CogView3 models use T5-XXL.
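The Diffusers integration exposes these components directly. A minimal inspection sketch, assuming the `THUDM/CogView4-6B` Hugging Face checkpoint and a diffusers release that ships `CogView4Pipeline` (the printed class names are whatever the installed version defines):

```python
import torch
from diffusers import CogView4Pipeline

# Load the DiT-based CogView4 pipeline in BF16.
pipe = CogView4Pipeline.from_pretrained(
    "THUDM/CogView4-6B", torch_dtype=torch.bfloat16
)

# The pipeline components mirror the architecture described above.
print(type(pipe.transformer).__name__)   # Diffusion Transformer backbone
print(type(pipe.text_encoder).__name__)  # GLM-based prompt encoder
print(type(pipe.vae).__name__)           # VAE mapping latents to pixels
```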
Quick Start & Requirements
- Install: `pip install diffusers transformers accelerate`
- Prerequisites: PyTorch with CUDA support, Python 3.8+. BF16 precision is recommended for inference (see the minimal sketch after this list).
- Memory: Minimum 13GB VRAM with CPU offloading and 4-bit text encoder, up to 39GB VRAM without offloading for higher resolutions. 32GB RAM recommended.
- Links: HuggingFace, ModelScope, Diffusers Example
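A minimal end-to-end generation sketch, assuming the `THUDM/CogView4-6B` checkpoint, a CUDA GPU, and a diffusers release that includes `CogView4Pipeline` (the sampler settings here are illustrative, not tuned values):

```python
import torch
from diffusers import CogView4Pipeline

# BF16 is the recommended inference precision.
pipe = CogView4Pipeline.from_pretrained(
    "THUDM/CogView4-6B", torch_dtype=torch.bfloat16
)
pipe.to("cuda")

image = pipe(
    prompt="A red panda reading a book under a maple tree, watercolor style",
    width=1024,
    height=1024,
    num_inference_steps=50,
    guidance_scale=3.5,
).images[0]
image.save("cogview4_sample.png")
```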
Highlighted Details
- CogView4-6B achieves 85.13 on DPG-Bench Overall and 0.73 on GenEval Overall.
- Supports resolutions from 512x512 up to 2048x2048, with width and height each divisible by 32.
- Native Chinese prompt support and generation capabilities.
- Offers CPU offloading and VAE tiling for reduced GPU memory usage; a sketch follows this list.
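A memory-saving sketch using the generic diffusers offloading and VAE tiling hooks (the same `THUDM/CogView4-6B` checkpoint is assumed; exact VRAM savings depend on resolution and hardware):

```python
import torch
from diffusers import CogView4Pipeline

pipe = CogView4Pipeline.from_pretrained(
    "THUDM/CogView4-6B", torch_dtype=torch.bfloat16
)

# Stream submodules between CPU and GPU instead of keeping all weights
# resident; no explicit pipe.to("cuda") is needed with offloading enabled.
pipe.enable_model_cpu_offload()

# Decode latents in slices/tiles to cap peak VAE memory at high resolutions.
pipe.vae.enable_slicing()
pipe.vae.enable_tiling()

image = pipe(
    prompt="A panoramic view of terraced rice fields at dawn",
    width=2048,   # width and height must each be divisible by 32
    height=2048,  # supported range: 512x512 up to 2048x2048
).images[0]
image.save("cogview4_2048.png")
```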
Maintenance & Community
- Actively developed by THUDM. Recent updates include the Diffusers integration and the upcoming CogKit fine-tuning toolkit.
- Community contributions are welcomed, with existing wrappers for ComfyUI.
- WeChat Community
Licensing & Compatibility
- Code and CogView3 models are licensed under Apache 2.0.
- CogView4 model weights are available for research and commercial use, subject to THUDM's terms.
Limitations & Caveats
- Fine-tuning code is not included in the main repository but is available via CogKit or finetrainers.
- Prompt optimization using an LLM is strongly recommended for best generation quality; a hypothetical sketch follows.
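A hypothetical prompt-rewriting sketch against any OpenAI-compatible endpoint; the endpoint URL, model name, and system prompt below are illustrative assumptions, not the repository's own script:

```python
from openai import OpenAI

# Illustrative setup: point at any OpenAI-compatible server
# (e.g. a locally hosted GLM-4). Neither URL nor model name is prescribed.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

SYSTEM = (
    "You rewrite short image prompts into rich, concrete descriptions "
    "covering subject, setting, lighting, composition, and style. "
    "Reply with the rewritten prompt only."
)

def optimize_prompt(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="glm-4",  # substitute whatever the endpoint actually serves
        messages=[
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": prompt},
        ],
    )
    return resp.choices[0].message.content.strip()

print(optimize_prompt("a cat in a spacesuit"))
```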