ConsisID  by PKU-YuanGroup

Identity-preserving text-to-video generation (research paper and code)

created 8 months ago
740 stars

Top 47.8% on sourcepulse

Project Summary

ConsisID addresses the challenge of keeping a person's identity consistent in text-to-video generation. It targets researchers and developers in AI video synthesis, offering a controllable DiT-based model that uses frequency decomposition to preserve identity without per-identity fine-tuning.

How It Works

ConsisID uses a frequency decomposition approach motivated by frequency analyses of vision and diffusion transformers. The decomposition lets the model disentangle identity-related features from other visual information, so the same identity can be represented consistently across generated video frames. The architecture is built on a diffusion transformer (DiT) backbone.
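
To make the idea of frequency decomposition concrete, here is a minimal, generic sketch on a 1D signal using only the Python standard library. It is not ConsisID's implementation, which works on identity features inside a DiT backbone. The function names (dft, idft, split_frequencies) and the cutoff parameter are hypothetical and chosen purely for illustration.

```python
import cmath
import math


def dft(signal):
    """Naive O(n^2) discrete Fourier transform of a sequence."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse of dft(); returns a list of complex samples."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def split_frequencies(signal, cutoff):
    """Split a real signal into low- and high-frequency parts.

    Bins whose distance from DC is <= cutoff go to the low band,
    all other bins go to the high band (hypothetical helper).
    """
    n = len(signal)
    low_spec, high_spec = [], []
    for k, coeff in enumerate(dft(signal)):
        freq = min(k, n - k)  # bins above n/2 mirror the negative frequencies
        if freq <= cutoff:
            low_spec.append(coeff)
            high_spec.append(0j)
        else:
            low_spec.append(0j)
            high_spec.append(coeff)
    # The mask is symmetric, so both parts are real up to rounding error.
    low = [v.real for v in idft(low_spec)]
    high = [v.real for v in idft(high_spec)]
    return low, high


# Toy signal: a slow wave (coarse structure) plus a fast wave (fine detail).
n = 32
slow = [math.sin(2 * math.pi * 1 * t / n) for t in range(n)]
fast = [0.3 * math.sin(2 * math.pi * 10 * t / n) for t in range(n)]
signal = [s + f for s, f in zip(slow, fast)]

low, high = split_frequencies(signal, cutoff=3)
```

In this toy case the low band recovers the slow wave, the high band recovers the fast wave, and the two parts sum back to the original signal.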

Quick Start & Requirements

  • Install: pip install git+https://github.com/huggingface/diffusers.git (the development version of diffusers), or follow the repository's environment setup instructions.
  • Prerequisites: Python 3.11.0, PyTorch (CUDA 11.8 or 12.1 recommended), requirements.txt.
  • Resources: About 44 GB of GPU memory at full resolution, or roughly 22 GB with optimizations such as enable_model_cpu_offload and vae.enable_tiling.
  • Links: Diffusers API Demo, Jupyter Notebook, Project Page.
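
The memory optimizations named above can be sketched as follows. This is a hedged example, not the project's official script: it assumes the ConsisIDPipeline class shipped in the development version of diffusers, and the checkpoint ID "BestWishYsh/ConsisID-preview" is an assumption that should be checked against the repository. The load_consisid helper is hypothetical. Imports are deferred so that defining the function does not require a GPU or the heavy dependencies.

```python
def load_consisid(model_id="BestWishYsh/ConsisID-preview", low_memory=True):
    """Load the ConsisID pipeline, optionally with memory-saving options.

    model_id is an assumed checkpoint name; verify it in the repository.
    """
    # Deferred imports: torch and diffusers are only needed when called.
    import torch
    from diffusers import ConsisIDPipeline

    pipe = ConsisIDPipeline.from_pretrained(model_id, torch_dtype=torch.bfloat16)
    if low_memory:
        # Keep submodules on the CPU and move each to the GPU only while it runs.
        pipe.enable_model_cpu_offload()
        # Decode the video latents tile by tile to lower peak VAE memory.
        pipe.vae.enable_tiling()
    else:
        # Full-speed path: everything stays resident on the GPU.
        pipe.to("cuda")
    return pipe
```

Generating a video additionally requires the face-embedding preprocessing described in the linked Diffusers API demo, which is not reproduced here.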

Highlighted Details

  • CVPR 2025 Highlight.
  • Tuning-free identity-preserving text-to-video generation.
  • Supports parallel inference via xDiT and cache inference via TeaCache.
  • Offers GPU memory optimization techniques for broader accessibility.

Maintenance & Community

The project is actively maintained by PKU-YuanGroup, with contributions noted from Hugging Face developers and community members. Links to community extensions (ComfyUI, Windows Docker) and active development discussions are provided.

Licensing & Compatibility

The project is licensed under Apache 2.0, while the CogVideoX-5B model component carries its own license. Apache 2.0 is generally permissive for research and commercial use, but users should verify the CogVideoX license terms.

Limitations & Caveats

High GPU memory requirements can be a barrier for users without high-end hardware, though optimizations are provided. The README notes that results can vary between machines even with identical seeds and prompts.

Health Check

  • Last commit: 1 month ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 1
  • Star history: 59 stars in the last 90 days
