CogView4 by zai-org

Text-to-image generation models using diffusion transformer and cascading diffusion architectures

Created 1 year ago
1,088 stars

Top 35.0% on SourcePulse

Project Summary

CogView4 is a suite of advanced text-to-image generation models, including CogView4 (6B parameters), CogView3-Plus (3B parameters), and CogView3, targeting researchers and developers in multimodal AI. It offers high-resolution image generation with native Chinese language support and competitive performance on various benchmarks.

How It Works

CogView4 uses a Diffusion Transformer (DiT) architecture, while CogView3 employs a cascading, relay-diffusion framework. Generation is flexible across resolutions up to 2048x2048, and both Chinese and English prompts are supported. For prompt understanding, CogView4 relies on a GLM-4-9B text encoder, while CogView3 uses T5-XXL.
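
A minimal sketch of how these pieces fit together, assuming a recent diffusers release that ships CogView4Pipeline and the THUDM/CogView4-6B checkpoint on Hugging Face (adjust the model id to your source):

    import torch
    from diffusers import CogView4Pipeline

    # Load the full pipeline in BF16; this pulls the GLM-based text encoder,
    # the diffusion transformer, the VAE, and the scheduler as separate components.
    pipe = CogView4Pipeline.from_pretrained(
        "THUDM/CogView4-6B", torch_dtype=torch.bfloat16
    )

    # Listing the components makes the text-encoder + DiT + VAE composition visible.
    for name, component in pipe.components.items():
        print(f"{name}: {type(component).__name__}")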

Quick Start & Requirements

  • Install: pip install diffusers transformers accelerate (see the example after this list)
  • Prerequisites: PyTorch with CUDA support, Python 3.8+. BF16 precision is recommended for inference.
  • Memory: as little as 13GB VRAM with CPU offloading and a 4-bit quantized text encoder; up to 39GB VRAM without offloading at higher resolutions. 32GB system RAM recommended.
  • Links: HuggingFace, ModelScope, Diffusers Example
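
A minimal end-to-end example with the diffusers integration, assuming CogView4Pipeline is available and a CUDA GPU with enough VRAM; the prompt, resolution, step count, and guidance scale below are illustrative choices, not fixed requirements:

    import torch
    from diffusers import CogView4Pipeline

    pipe = CogView4Pipeline.from_pretrained(
        "THUDM/CogView4-6B", torch_dtype=torch.bfloat16
    )
    pipe.to("cuda")

    prompt = "A watercolor painting of a red panda reading a book under a maple tree"
    image = pipe(
        prompt=prompt,
        width=1024,            # width and height should be multiples of 32
        height=1024,
        num_inference_steps=50,
        guidance_scale=3.5,
    ).images[0]
    image.save("cogview4_sample.png")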

Highlighted Details

  • CogView4-6B achieves 85.13 on DPG-Bench Overall and 0.73 on GenEval Overall.
  • Supports resolutions from 512x512 up to 2048x2048, with width and height each divisible by 32.
  • Native Chinese prompt support and generation capabilities.
  • Offers CPU offloading and VAE tiling to reduce GPU memory usage (see the sketch after this list).
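
The memory-saving switches from the list above map to standard diffusers calls; a sketch, assuming CogView4Pipeline and the same checkpoint as before (the Chinese prompt and 2048x2048 resolution are illustrative):

    import torch
    from diffusers import CogView4Pipeline

    pipe = CogView4Pipeline.from_pretrained(
        "THUDM/CogView4-6B", torch_dtype=torch.bfloat16
    )

    # Stream each submodule to the GPU only while it runs, instead of pipe.to("cuda").
    pipe.enable_model_cpu_offload()

    # Decode latents in slices/tiles to cap peak VRAM at large resolutions.
    pipe.vae.enable_slicing()
    pipe.vae.enable_tiling()

    # Native Chinese prompts work directly; 2048x2048 is the upper end of the range.
    image = pipe("一只在竹林里喝茶的熊猫，水墨画风格", width=2048, height=2048).images[0]
    image.save("panda_2048.png")

The 13GB figure quoted under Quick Start additionally assumes a 4-bit quantized text encoder, which is not shown here.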

Maintenance & Community

  • Developed by THUDM (now zai-org). Recent updates include the diffusers integration and the upcoming CogKit fine-tuning toolkit.
  • Community contributions are welcomed, with existing wrappers for ComfyUI.
  • WeChat Community

Licensing & Compatibility

  • Code and CogView3 models are licensed under Apache 2.0.
  • CogView4 model weights are available for research and commercial use, subject to THUDM's terms.

Limitations & Caveats

  • Fine-tuning code is not included in the main repository but is available via CogKit or finetrainers.
  • Rewriting prompts with an LLM is strongly recommended for best generation quality (see the sketch below).
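
A hedged sketch of that prompt-rewriting step, using the OpenAI-style chat client purely for illustration; the client, model name, and the optimize_prompt helper are hypothetical stand-ins for whatever LLM you have access to, not part of this repository:

    from openai import OpenAI

    client = OpenAI()  # point api_key/base_url at any OpenAI-compatible LLM service

    def optimize_prompt(user_prompt: str) -> str:
        """Expand a terse prompt into a detailed image description (hypothetical helper)."""
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder; any capable chat model works
            messages=[
                {"role": "system",
                 "content": ("Rewrite the user's text-to-image prompt into one detailed "
                             "English or Chinese description. Return only the prompt.")},
                {"role": "user", "content": user_prompt},
            ],
        )
        return response.choices[0].message.content.strip()

    detailed_prompt = optimize_prompt("a cozy cabin in winter")
    # detailed_prompt is then passed to the CogView4 pipeline as the generation prompt.
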
Health Check

  • Last Commit: 5 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 6 stars in the last 30 days

Explore Similar Projects

Starred by Shengjia Zhao (Chief Scientist at Meta Superintelligence Lab), Edward Sun (Research Scientist at Meta Superintelligence Lab), and 7 more.

glide-text2im by openai

  • Top 0.1% on SourcePulse · 4k stars
  • Text-conditional image synthesis model from research paper
  • Created 3 years ago · Updated 1 year ago
Starred by Deepak Pathak (Cofounder of Skild AI; Professor at CMU), Travis Fischer (Founder of Agentic), and 8 more.

sygil-webui by Sygil-Dev

  • Top 0.0% on SourcePulse · 8k stars
  • Web UI for Stable Diffusion
  • Created 3 years ago · Updated 2 months ago