DiffSensei by jianzongwu

CVPR 2025 paper implementation for customized manga generation

Created 1 year ago

888 stars

Top 40.6% on SourcePulse

Project Summary

DiffSensei generates customized black-and-white manga by bridging multi-modal large language models (MLLMs) and diffusion models. From a single input image per character, it produces manga panels at varied resolutions with flexible character adaptation. It targets researchers and artists interested in controllable AI-powered comic creation.

How It Works

DiffSensei pairs a diffusion model with an IP-Adapter, which keeps characters consistent, and a multi-modal LLM (MLLM), which conditions generation on the text prompt. Together these control character appearance across panels and let text prompts vary the scene while the specified character traits are preserved.
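To make the IP-Adapter idea concrete, the sketch below shows decoupled cross-attention in general form: the usual text cross-attention output is summed with a second attention branch over image (character) tokens, weighted by a scale. It is an illustration of the general technique only, not DiffSensei's code. The function names and `image_scale` are hypothetical, and the learned key/value projections of a real adapter are replaced by identity projections to keep the example short.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend(queries, keys, values):
    """Scaled dot-product attention over plain Python lists."""
    dim = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

def decoupled_cross_attention(queries, text_tokens, image_tokens, image_scale=1.0):
    """Sum a text branch and an image branch of cross-attention.

    Assumption: tokens serve as both keys and values (identity projections);
    a real adapter learns separate projections for the image branch.
    """
    text_out = attend(queries, text_tokens, text_tokens)
    image_out = attend(queries, image_tokens, image_tokens)
    return [[t + image_scale * i for t, i in zip(t_row, i_row)]
            for t_row, i_row in zip(text_out, image_out)]
```

Setting `image_scale` to 0 reduces the result to plain text cross-attention, which is why a scale of this kind is a common knob for trading prompt adherence against reference fidelity.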

Quick Start & Requirements

Installation: Requires Python 3.11, PyTorch with CUDA 12.1, diffusers, transformers, accelerate, and xformers.
Prerequisites: A CUDA-enabled GPU is recommended (e.g., a 24 GB RTX 4090 for the reduced-memory version).
Setup: Download checkpoints from Hugging Face.
Inference: Run via Gradio demo scripts (gradio or gradio_wo_mllm).
Resources: Official project page: https://jianzongwu.github.io/projects/diffsensei, arXiv: https://arxiv.org/abs/2412.07589.
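Before downloading checkpoints, it can help to confirm that the environment matches the requirements listed above. The following is a minimal sketch using only the standard library; the helper names are hypothetical, and the package names are assumed to match their import names (in particular, PyTorch is imported as `torch`).

```python
import sys
from importlib.util import find_spec

# Import names assumed from the requirements listed in the Quick Start.
REQUIRED_PACKAGES = ("torch", "diffusers", "transformers", "accelerate", "xformers")

def missing_packages(names=REQUIRED_PACKAGES):
    """Return the packages that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]

def python_version_ok(version_info=sys.version_info, expected=(3, 11)):
    """Check that the interpreter's major.minor version matches `expected`."""
    return tuple(version_info[:2]) == expected

if __name__ == "__main__":
    if not python_version_ok():
        print(f"Expected Python 3.11, found {sys.version_info[0]}.{sys.version_info[1]}")
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        import torch
        # torch.version.cuda is None for CPU-only builds.
        print("CUDA build:", torch.version.cuda)
        print("GPU available:", torch.cuda.is_available())
```

The check only confirms that packages are importable; it does not verify specific versions, which should follow the repository's own installation instructions.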

Highlighted Details

Supports manga panel generation from 64x64 to 2048x2048 resolution.
Enables character consistency across multiple generated panels from a single input image.
Offers a version without the MLLM for reduced memory usage, suitable for a single 24 GB GPU.
Provides reference training code for t2i, condition, and MLLM stages.
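The supported resolution range above can be enforced before a panel request reaches the model. This is a small sketch of such a guard; `check_panel_size` is a hypothetical helper, and it assumes that width and height may each vary independently within the 64 to 2048 range, which the summary does not state explicitly.

```python
MIN_SIDE = 64     # smallest supported panel side, per the range above
MAX_SIDE = 2048   # largest supported panel side, per the range above

def check_panel_size(width, height):
    """Return (width, height) if both sides are in range, else raise ValueError."""
    for name, side in (("width", width), ("height", height)):
        if not MIN_SIDE <= side <= MAX_SIDE:
            raise ValueError(f"{name}={side} is outside {MIN_SIDE}..{MAX_SIDE}")
    return width, height
```

Failing early with a clear message is cheaper than discovering an unsupported size after a model has been loaded onto the GPU.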

Maintenance & Community

The project is associated with CVPR 2025 and has released checkpoints, datasets, and inference code. Further community engagement details are not explicitly provided in the README.

Licensing & Compatibility

The project's license is not explicitly stated in the README. The MangaZero dataset is provided via URLs and annotations due to potential licensing issues with direct image sharing.

Limitations & Caveats

The MangaZero dataset is a partial release (3/4 of the full dataset) because some image URLs are unavailable. The reference training code is still in a testing phase and may need adjusting for specific datasets and requirements.
