DiffSensei by jianzongwu

CVPR 2025 paper implementation for customized manga generation

Created 9 months ago
847 stars

Top 42.1% on SourcePulse

Project Summary

DiffSensei generates customized black-and-white manga by bridging multimodal large language models (MLLMs) and diffusion models. From a single input image per character, it produces manga panels at varied resolutions with flexible character adaptation. It targets researchers and artists interested in controllable AI-powered comic creation.

How It Works

DiffSensei uses a diffusion model enhanced with an IP-Adapter for character consistency and an MLLM for text-to-image conditioning. This combination gives control over character appearance across panels and supports flexible text prompts, so that diverse manga scenes can be generated with specific character traits.
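The character-conditioning idea above can be illustrated with a minimal sketch of IP-Adapter-style decoupled cross-attention, in which a query attends to text tokens and image tokens separately and the two results are summed. This is a toy illustration in pure Python, not DiffSensei's implementation: the function names and the `image_scale` parameter are hypothetical, and the learned query/key/value projections of a real model are omitted, so each token serves as both key and value.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    # Scaled dot-product attention for a single query vector.
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]

def decoupled_cross_attention(query, text_tokens, image_tokens, image_scale=1.0):
    # Assumption: tokens act as their own keys and values (no learned projections).
    text_out = attend(query, text_tokens, text_tokens)
    image_out = attend(query, image_tokens, image_tokens)
    # The image branch is added to the text branch, weighted by image_scale.
    return [t + image_scale * i for t, i in zip(text_out, image_out)]

# One text token and one image token: each branch returns its token unchanged.
print(decoupled_cross_attention([1.0, 0.0], [[1.0, 2.0]], [[0.5, 0.5]], image_scale=2.0))
# -> [2.0, 3.0]
```

Setting `image_scale` to 0 reduces the output to plain text cross-attention, which is why a scale of this kind is a common knob for trading prompt adherence against reference-image fidelity.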

Quick Start & Requirements

  • Installation: Requires Python 3.11, PyTorch with CUDA 12.1, diffusers, transformers, accelerate, and xformers.
  • Prerequisites: A CUDA-enabled GPU is recommended (e.g., a 24GB RTX 4090 for the reduced-memory version).
  • Setup: Download the checkpoints from Hugging Face.
  • Inference: Run one of the Gradio demo scripts (gradio, or gradio_wo_mllm for the version without the MLLM).
  • Resources: Official project page: https://jianzongwu.github.io/projects/diffsensei, arXiv: https://arxiv.org/abs/2412.07589.
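Before downloading checkpoints, it can help to confirm that the environment matches the requirements listed above. The following is a minimal sketch, not a script from the repository: the helper names are hypothetical, and the package list is taken from the installation bullet. It only checks that the packages are importable and does not verify their versions.

```python
import importlib.util
import sys

# Package names from the installation requirements above.
REQUIRED_PACKAGES = ["torch", "diffusers", "transformers", "accelerate", "xformers"]

def missing_packages(names=REQUIRED_PACKAGES):
    # Return the packages that cannot be found on the import path.
    return [n for n in names if importlib.util.find_spec(n) is None]

def python_matches(required=(3, 11), version=None):
    # Compare the (major, minor) part of the interpreter version.
    version = sys.version_info if version is None else version
    return tuple(version[:2]) == tuple(required)

def report():
    # Print a short summary of the environment.
    print("Python 3.11:", "ok" if python_matches() else "mismatch")
    missing = missing_packages()
    print("Missing packages:", ", ".join(missing) if missing else "none")
    if "torch" not in missing:
        import torch
        print("CUDA available:", torch.cuda.is_available())
```

Calling `report()` prints the summary. If CUDA is reported as unavailable, the Gradio demos are unlikely to run at a practical speed.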

Highlighted Details

  • Supports manga panel generation from 64x64 to 2048x2048 resolution.
  • Enables character consistency across multiple generated panels from a single input image.
  • Offers a version without the MLLM for reduced memory usage, suitable for a single 24GB GPU.
  • Provides reference training code for t2i, condition, and MLLM stages.

Maintenance & Community

The project is associated with CVPR 2025 and has released checkpoints, datasets, and inference code. Further community engagement details are not explicitly provided in the README.

Licensing & Compatibility

The project's license is not explicitly stated in the README. The MangaZero dataset is provided via URLs and annotations due to potential licensing issues with direct image sharing.

Limitations & Caveats

The released MangaZero dataset is partial (3/4 of the full dataset) because some image URLs are unavailable. The reference training code is still in a testing phase and may need adjustment for specific datasets and requirements.

Health Check

  • Last commit: 7 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 2
  • Issues (30d): 3
  • Star history: 8 stars in the last 30 days
