FlashPortrait by Francis-Rings

Faster infinite portrait animation

Created 3 weeks ago

409 stars

Top 71.3% on SourcePulse

Summary

FlashPortrait addresses the challenge of generating high-fidelity, identity-preserving, and infinitely long portrait animations efficiently. It targets researchers and engineers in video synthesis and AI animation, offering a significant speedup (up to 6x) over existing methods without compromising visual quality or identity consistency.

How It Works

FlashPortrait is an end-to-end video diffusion transformer. It first extracts identity-agnostic facial expression features, then aligns them with the diffusion latents through a novel Normalized Facial Expression Block to improve identity stability. For long videos, a dynamic sliding-window scheme with weighted blending smooths the transitions between windows. For speed, Adaptive Latent Prediction uses higher-order latent derivatives to skip denoising steps, which yields the substantial inference acceleration.
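
The sliding-window blending step can be illustrated with a minimal Python sketch. It assumes a fixed overlap and a linear cross-fade, and it treats each frame as a single number. The names linear_ramp and blend_windows are hypothetical. The actual FlashPortrait scheme is described as dynamic and operates on latent tensors, so its window sizes and weights may differ.

```python
def linear_ramp(n):
    """Weights rising from just above 0 to just below 1 over n overlap frames."""
    return [(i + 1) / (n + 1) for i in range(n)]


def blend_windows(windows, overlap):
    """Stitch windows of per-frame values into one sequence.

    Consecutive windows share `overlap` frames. In the shared region the
    earlier window fades out while the later window fades in.
    (Fixed overlap and linear weights are assumptions for illustration.)
    """
    if not windows:
        return []
    out = list(windows[0])
    ramp = linear_ramp(overlap)
    for win in windows[1:]:
        if len(win) < overlap or len(out) < overlap:
            raise ValueError("window shorter than overlap")
        tail_start = len(out) - overlap
        # Cross-fade the shared frames.
        for i, w in enumerate(ramp):
            out[tail_start + i] = (1.0 - w) * out[tail_start + i] + w * win[i]
        # Append the frames that only the later window covers.
        out.extend(win[overlap:])
    return out


if __name__ == "__main__":
    first = [0.0, 0.0, 0.0, 0.0]
    second = [1.0, 1.0, 1.0, 1.0]
    print(blend_windows([first, second], overlap=2))
```

With two four-frame windows and an overlap of 2, the result has six frames, and the two shared frames move gradually from the first window's values toward the second's instead of jumping at the seam.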

Quick Start & Requirements

Installation requires PyTorch (v2.6.0 with CUDA 12.4 recommended) plus the dependencies in requirements.txt. Installing flash_attn is optional and provides additional acceleration. Model weights must be downloaded manually from Hugging Face. Run inference with python infer.py or python fast_infer.py. Links to the project page, code, and technical report have been available since the December 15, 2025 release.

Highlighted Details

  • Achieves up to 6x faster inference speeds for portrait animation.
  • Synthesizes ID-preserving, infinite-length videos without post-processing.
  • Supports a range of output resolutions, including 720p and 1280p formats.
  • Demonstrates superior performance over state-of-the-art models in qualitative and quantitative benchmarks.
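
The speedup in the first bullet is attributed to predicting latents instead of running every denoising step. One way to picture this is extrapolation from recent latents, sketched here in Python with backward finite differences standing in for the higher-order derivatives. This is an assumption for illustration, not the formula from the technical report. The names backward_differences, predict_latent, and should_skip, as well as the tolerance rule, are hypothetical, and latents are reduced to scalars.

```python
from math import comb


def backward_differences(history):
    """Backward differences of order 0, 1, 2, ... at the most recent step.

    `history` holds latent values at evenly spaced past steps, oldest first.
    """
    diffs = []
    level = list(history)
    while level:
        diffs.append(level[-1])
        level = [b - a for a, b in zip(level, level[1:])]
    return diffs


def predict_latent(history, steps_ahead=1):
    """Extrapolate the latent `steps_ahead` steps past the last sample.

    Uses Newton's backward-difference formula, which is exact when the
    samples lie on a polynomial of degree len(history) - 1.
    """
    if steps_ahead < 1:
        raise ValueError("steps_ahead must be at least 1")
    diffs = backward_differences(history)
    return sum(comb(steps_ahead + j - 1, j) * d for j, d in enumerate(diffs))


def should_skip(history, tol):
    """Hypothetical adaptive rule: skip the denoiser when the highest-order
    difference is small relative to the current latent magnitude."""
    diffs = backward_differences(history)
    if len(diffs) < 2:
        return False
    return abs(diffs[-1]) <= tol * max(abs(diffs[0]), 1e-12)


if __name__ == "__main__":
    smooth = [0.0, 1.0, 2.0]   # latents changing at a constant rate
    if should_skip(smooth, tol=0.1):
        print(predict_latent(smooth, steps_ahead=1))
```

In this sketch, a skipped step costs a few additions instead of a full transformer forward pass. The trade-off is that extrapolation error grows when the latent trajectory bends sharply, which is why a skip criterion of some kind is needed.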

Maintenance & Community

The project saw significant releases in December 2025, including code, checkpoints, and a ComfyUI implementation. Development is ongoing, with a "To-Do List" indicating planned features like multi-GPU inference. Key contributors are affiliated with major research institutions and tech companies. No direct community channels (like Discord/Slack) are listed.

Licensing & Compatibility

The README does not specify a software license. This absence requires clarification for any adoption decision, particularly concerning commercial use or derivative works.

Limitations & Caveats

Training FlashPortrait demands substantial VRAM (40-50GB), while inference VRAM can be reduced from ~60GB to ~10GB using CPU offloading techniques. The 3D VAE decoder can be memory-intensive for very long videos, though CPU decoding is an option. Training requires meticulously organized datasets with specific mask types and static backgrounds. Multi-GPU inference support is still under development.

Health Check

Last Commit: 1 day ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 6
Star History: 416 stars in the last 25 days
