DiffSensei by jianzongwu

CVPR 2025 paper implementation for customized manga generation

created 8 months ago
834 stars

Top 43.6% on sourcepulse

Project Summary

DiffSensei generates customized black-and-white manga by bridging multi-modal large language models (MLLMs) and diffusion models. From a single input image, it produces manga panels at varied resolutions with flexible character adaptation. It targets researchers and artists interested in controllable AI-powered comic creation.

How It Works

DiffSensei pairs a diffusion model with two conditioning components: an IP-Adapter, which keeps a character's appearance consistent, and a multi-modal LLM (MLLM), which handles text-to-image conditioning. Together they give precise control over character appearance across panels while still accepting flexible text prompts, so diverse manga scenes can be generated with specific character traits.
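
To make the IP-Adapter idea concrete, the sketch below shows the general "decoupled cross-attention" pattern that IP-Adapter-style conditioning uses: the latent queries attend to text features and to image (character) features separately, and the two results are summed with a scale on the image branch. This is an illustration of the general technique only, not DiffSensei's actual code; the function names, shapes, and the `image_scale` parameter are assumptions, and the keys and values are passed in already projected.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Single-head scaled dot-product attention.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def decoupled_cross_attention(q, text_kv, image_kv, image_scale=1.0):
    # Hypothetical helper: text and image features get separate
    # attention branches; the image branch is scaled and added.
    text_k, text_v = text_kv
    image_k, image_v = image_kv
    text_out = attention(q, text_k, text_v)
    image_out = attention(q, image_k, image_v)
    return text_out + image_scale * image_out

rng = np.random.default_rng(0)
d = 16
q = rng.normal(size=(64, d))        # 64 latent positions (assumed)
text_k = rng.normal(size=(77, d))   # 77 text tokens (assumed)
text_v = rng.normal(size=(77, d))
img_k = rng.normal(size=(4, d))     # 4 character-image tokens (assumed)
img_v = rng.normal(size=(4, d))

out = decoupled_cross_attention(q, (text_k, text_v), (img_k, img_v), image_scale=0.5)
print(out.shape)
```

Setting `image_scale` to 0 reduces the result to plain text cross-attention, which is why a scale of this kind is a convenient knob for how strongly the reference character influences the output.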

Quick Start & Requirements

  • Installation: Requires Python 3.11, PyTorch with CUDA 12.1, diffusers, transformers, accelerate, and xformers.
  • Prerequisites: A CUDA-enabled GPU is recommended (e.g., a 24 GB RTX 4090 for the reduced-memory version).
  • Setup: Download checkpoints from Hugging Face.
  • Inference: Run via Gradio demo scripts (gradio or gradio_wo_mllm).
  • Resources: Official project page: https://jianzongwu.github.io/projects/diffsensei, arXiv: https://arxiv.org/abs/2412.07589.
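
Before downloading checkpoints, it can help to confirm that the listed requirements are present. The following is a minimal sketch using only the standard library; it assumes each package's import name matches the name listed above (for example, `torch` for PyTorch), and `check_environment` is a hypothetical helper, not part of the DiffSensei repository. It does not check the CUDA version or GPU memory.

```python
import sys
from importlib.util import find_spec

# Import names assumed to match the requirements listed above.
REQUIRED_PACKAGES = ("torch", "diffusers", "transformers", "accelerate", "xformers")

def check_environment(packages=REQUIRED_PACKAGES, python_version=(3, 11)):
    """Return a dict mapping each requirement to True (present) or False."""
    report = {"python": sys.version_info[:2] == python_version}
    for name in packages:
        # find_spec returns None when a top-level package is not installed.
        report[name] = find_spec(name) is not None
    return report

if __name__ == "__main__":
    for item, ok in check_environment().items():
        print(f"{item}: {'ok' if ok else 'MISSING'}")
```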

Highlighted Details

  • Supports manga panel generation from 64x64 to 2048x2048 resolution.
  • Enables character consistency across multiple generated panels from a single input image.
  • Offers a version without MLLM for reduced memory usage, suitable for single 24GB GPUs.
  • Provides reference training code for t2i, condition, and MLLM stages.
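
The stated resolution range can be enforced up front with a small guard. This sketch assumes the 64 to 2048 limits apply to width and height independently, and the `multiple=8` default is an assumption borrowed from common latent diffusion setups rather than something the README states; `check_panel_size` is a hypothetical helper.

```python
MIN_SIDE = 64     # smallest side from the stated range
MAX_SIDE = 2048   # largest side from the stated range

def check_panel_size(width, height, multiple=8):
    """Validate a panel size; raise ValueError if it is out of range.

    `multiple` is an assumed divisibility constraint, not a documented one.
    Pass multiple=1 to disable it.
    """
    for name, side in (("width", width), ("height", height)):
        if not MIN_SIDE <= side <= MAX_SIDE:
            raise ValueError(f"{name}={side} is outside {MIN_SIDE}..{MAX_SIDE}")
        if side % multiple != 0:
            raise ValueError(f"{name}={side} is not a multiple of {multiple}")
    return width, height

print(check_panel_size(512, 768))
```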

Maintenance & Community

The project is associated with CVPR 2025 and has released checkpoints, datasets, and inference code. Further community engagement details are not explicitly provided in the README.

Licensing & Compatibility

The project's license is not explicitly stated in the README. The MangaZero dataset is provided via URLs and annotations due to potential licensing issues with direct image sharing.

Limitations & Caveats

The released MangaZero dataset is partial: it covers about three quarters of the full dataset because some image URLs are no longer available. The reference training code is still in a testing phase and may need adjustment for specific datasets and requirements.

Health Check

  • Last commit: 5 months ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 54 stars in the last 90 days
