Identity-preserving text-to-video generation (research paper and code)
ConsisID addresses the challenge of maintaining a consistent human identity in text-to-video generation. It targets researchers and developers in AI video synthesis, offering a DiT-based controllable model that leverages frequency decomposition to preserve identity without per-identity fine-tuning.
How It Works
ConsisID employs a frequency-decomposition approach motivated by how vision and diffusion transformers process visual information. Identity signals are split into low-frequency features (global facial structure, derived from a reference image and facial keypoints) and high-frequency features (fine-grained intrinsic identity detail), which are injected into the network separately. This disentangles identity-related features from other visual information, enabling a consistent identity across generated video frames. The architecture is built on a diffusion transformer (DiT) backbone, providing a strong foundation for high-quality video synthesis.
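Purely to illustrate what a low/high-frequency split means, here is a minimal FFT-based sketch. The model's actual extractors are learned networks operating on face features; the plain image FFT and the cutoff value below are illustrative assumptions, not the paper's method.

```python
# Conceptual sketch: split an image into low- and high-frequency components
# with an FFT low-pass filter. Illustrates the decomposition idea only;
# ConsisID's real global/local extractors are learned networks.
import torch

def frequency_split(image: torch.Tensor, cutoff: float = 0.1):
    """image: (C, H, W) tensor; cutoff: fraction of the spectrum kept as 'low'."""
    c, h, w = image.shape
    spectrum = torch.fft.fftshift(torch.fft.fft2(image), dim=(-2, -1))

    # Circular low-pass mask centered on the zero-frequency component.
    yy, xx = torch.meshgrid(
        torch.arange(h) - h // 2, torch.arange(w) - w // 2, indexing="ij"
    )
    radius = (yy**2 + xx**2).float().sqrt()
    mask = (radius <= cutoff * min(h, w)).to(spectrum.dtype)

    low = torch.fft.ifft2(torch.fft.ifftshift(spectrum * mask, dim=(-2, -1))).real
    high = image - low  # residual carries edges and fine detail
    return low, high

low, high = frequency_split(torch.rand(3, 480, 720))
```

In this framing, `low` plays the role of coarse global structure and `high` the fine identity-bearing detail that the model routes through separate injection paths.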
Quick Start & Requirements
Install the development version of diffusers:

```bash
pip install git+https://github.com/huggingface/diffusers.git
```

or follow the environment setup instructions in `requirements.txt`. Memory usage can be reduced with the provided optimizations (`pipe.enable_model_cpu_offload()` and `vae.enable_tiling()`).
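A minimal inference sketch, assuming the `ConsisIDPipeline` integration in diffusers and its face-processing utilities; the checkpoint ID, reference image path, prompt, and sampler settings below are illustrative placeholders, and argument names should be verified against the current diffusers docs:

```python
# Sketch of identity-preserving generation via the diffusers ConsisID
# integration. Checkpoint ID, image path, prompt, and sampler settings
# are placeholders; consult the ConsisID docs for the authoritative example.
import torch
from diffusers import ConsisIDPipeline
from diffusers.pipelines.consisid.consisid_utils import (
    prepare_face_models,
    process_face_embeddings_infer,
)
from diffusers.utils import export_to_video

# Load the face-feature extractors and the pipeline (bf16 to save memory).
face_helper_1, face_helper_2, face_clip_model, face_main_model, eva_mean, eva_std = (
    prepare_face_models("BestWishYsh/ConsisID-preview", device="cuda", dtype=torch.bfloat16)
)
pipe = ConsisIDPipeline.from_pretrained(
    "BestWishYsh/ConsisID-preview", torch_dtype=torch.bfloat16
)

# The memory optimizations mentioned above: offload idle submodules to CPU
# and decode video latents in tiles.
pipe.enable_model_cpu_offload()
pipe.vae.enable_tiling()

# Extract low/high-frequency identity conditioning from a reference face
# ("face.png" is a placeholder path).
id_cond, id_vit_hidden, image, face_kps = process_face_embeddings_infer(
    face_helper_1, face_clip_model, face_helper_2, eva_mean, eva_std,
    face_main_model, "cuda", torch.bfloat16, "face.png", is_align_face=True,
)

video = pipe(
    image=image,
    prompt="A person smiles and waves at the camera in a sunlit park",
    num_inference_steps=50,
    guidance_scale=6.0,
    id_vit_hidden=id_vit_hidden,
    id_cond=id_cond,
    kps_cond=face_kps,
    generator=torch.Generator("cpu").manual_seed(42),
)
export_to_video(video.frames[0], "output.mp4", fps=8)
```

With CPU offload enabled, the pipeline manages device placement itself, which is why no explicit `pipe.to("cuda")` call appears.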
Highlighted Details
Maintenance & Community
The project is actively maintained by PKU-YuanGroup, with contributions noted from Hugging Face developers and community members. Links to community extensions (ComfyUI, Windows Docker) and active development discussions are provided.
Licensing & Compatibility
Licensed under Apache 2.0, with a specific license for the CogVideoX-5B model component. Generally permissive for research and commercial use, but users should verify the CogVideoX license terms.
Limitations & Caveats
High GPU memory requirements can be a barrier for users without high-end hardware, though optimizations are provided. The README notes that results can vary between machines even with identical seeds and prompts.