FlashPortrait by Francis-Rings

Faster infinite portrait animation

Created 6 months ago

480 stars

Top 63.1% on SourcePulse

Project Summary

Summary

FlashPortrait generates high-fidelity, identity-preserving portrait animations of unbounded length. It targets researchers and engineers working on video synthesis and AI animation, and claims up to 6x faster inference than existing methods without loss of visual quality or identity consistency.

How It Works

FlashPortrait uses an end-to-end video diffusion transformer. It first extracts identity-agnostic facial expression features, then aligns them with the diffusion latents through a Normalized Facial Expression Block to improve identity stability. For long videos, a dynamic sliding-window scheme with weighted blending smooths the transitions between windows. To accelerate inference, Adaptive Latent Prediction uses higher-order latent derivatives to predict latents and skip denoising steps.
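The two inference-side ideas in this section, weighted blending of overlapping windows and derivative-based prediction of skipped steps, can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the project's code: latents are plain floats, the window size is fixed rather than dynamic, the linear ramp weights are an assumed choice, and backward finite differences stand in for the higher-order latent derivatives. All function names are hypothetical.

```python
def window_starts(total, window, overlap):
    """Start indices of fixed-size windows that overlap by `overlap` frames."""
    if not 0 <= overlap < window:
        raise ValueError("overlap must satisfy 0 <= overlap < window")
    if total <= window:
        return [0]
    stride = window - overlap
    starts = list(range(0, total - window, stride))
    starts.append(total - window)  # last window ends exactly at the final frame
    return starts


def ramp_weights(window, overlap):
    """Weights that rise linearly at the window start and fall at its end (assumed shape)."""
    weights = []
    for i in range(window):
        rise = (i + 1) / (overlap + 1)
        fall = (window - i) / (overlap + 1)
        weights.append(min(1.0, rise, fall))
    return weights


def blend_windows(windows, starts, total, overlap):
    """Normalized weighted average of per-window outputs on a shared timeline."""
    acc = [0.0] * total
    norm = [0.0] * total
    for frames, start in zip(windows, starts):
        weights = ramp_weights(len(frames), overlap)
        for i, value in enumerate(frames):
            acc[start + i] += weights[i] * value
            norm[start + i] += weights[i]
    return [a / n for a, n in zip(acc, norm)]


def predict_next(history, order=2):
    """Extrapolate the next value from backward differences up to `order`.

    A stand-in for derivative-based latent prediction: the result is exact
    when the history follows a polynomial of degree <= order.
    """
    if len(history) < order + 1:
        raise ValueError("need at least order + 1 past values")
    diffs = list(history[-(order + 1):])
    prediction = diffs[-1]
    for _ in range(order):
        diffs = [b - a for a, b in zip(diffs, diffs[1:])]
        prediction += diffs[-1]
    return prediction
```

Because the blend is normalized, frames on which overlapping windows agree pass through unchanged, and disagreements are averaged with more weight given to the window whose center is closer. In a real sampler, a prediction like `predict_next` would replace a full denoising step only when some error estimate stays small; that acceptance rule is not described in this summary and is left out here.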

Quick Start & Requirements

Install PyTorch (v2.6.0 with CUDA 12.4 recommended) and the dependencies listed in requirements.txt. Installing flash_attn is optional and provides additional acceleration. Model weights must be downloaded manually from Hugging Face. Run inference with python infer.py or python fast_infer.py. Links to the project page, code, and technical report were published with the December 15, 2025 release.
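Since flash_attn is optional, a script can check for it before choosing an attention path. The snippet below is a minimal sketch of that check using only the standard library; the function name and the returned labels are hypothetical, and how the repository itself selects its attention backend is not stated in this summary.

```python
import importlib.util


def pick_attention_backend():
    """Return a label for the attention path to use (labels are illustrative)."""
    # find_spec returns None when the package is not installed,
    # without importing it or raising ImportError.
    if importlib.util.find_spec("flash_attn") is not None:
        return "flash_attn"
    return "default"


if __name__ == "__main__":
    print("attention backend:", pick_attention_backend())
```

Checking with `find_spec` avoids importing a heavy CUDA extension just to learn whether it is present.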

Highlighted Details

Achieves up to 6x faster inference for portrait animation.
Synthesizes ID-preserving, infinite-length videos without post-processing.
Supports a range of output resolutions, including 720p and 1280p formats.
Reports better results than state-of-the-art models in both qualitative and quantitative benchmarks.

Maintenance & Community

The project saw significant releases in December 2025, including code, checkpoints, and a ComfyUI implementation. Development is ongoing, with a "To-Do List" indicating planned features like multi-GPU inference. Key contributors are affiliated with major research institutions and tech companies. No direct community channels (like Discord/Slack) are listed.

Licensing & Compatibility

The README does not specify a software license. The terms should be clarified with the authors before adoption, particularly for commercial use or derivative works.

Limitations & Caveats

Training FlashPortrait requires 40-50GB of VRAM. Inference uses ~60GB without offloading, which CPU offloading can reduce to ~10GB. The 3D VAE decoder can be memory-intensive for very long videos, though decoding on the CPU is an option. Training requires carefully organized datasets with specific mask types and static backgrounds. Multi-GPU inference is still under development.

Health Check

Last Commit

4 months ago

Responsiveness

Inactive

Star History

8 stars in the last 30 days