Stable-Video-Infinity by vita-epfl

Infinite-length video generation with error recycling

Created 3 weeks ago

473 stars

Top 64.5% on SourcePulse

Summary

Stable Video Infinity (SVI) enables the generation of arbitrarily long, temporally consistent videos with controllable storylines. It targets creative professionals and researchers, overcoming limitations of existing methods by achieving high-fidelity, infinite-length video synthesis across diverse domains.

How It Works

SVI uses "Error-Recycling Fine-Tuning," a training strategy that simulates the error accumulation seen in autoregressive generation. Past errors are banked and injected back into the training loop, so the model learns to correct them instead of compounding them, which improves temporal consistency and limits drift in long-form video. The project compares this to a director's workflow: iterative review within each clip, and causal connections from one clip to the next.
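To make the idea of banking and re-injecting errors concrete, here is a toy sketch in plain Python. It is not SVI's training code: `ErrorBank`, `recycle_step`, and the list-of-floats "model" are hypothetical names invented for illustration, and the real method operates on video latents inside a diffusion model.

```python
import random
from collections import deque


class ErrorBank:
    """Fixed-size store of past prediction errors (hypothetical helper)."""

    def __init__(self, capacity, seed=0):
        self._errors = deque(maxlen=capacity)  # oldest errors are evicted first
        self._rng = random.Random(seed)

    def add(self, error):
        self._errors.append(list(error))

    def sample(self, size):
        # With an empty bank there is nothing to inject yet.
        if not self._errors:
            return [0.0] * size
        return self._rng.choice(self._errors)

    def __len__(self):
        return len(self._errors)


def recycle_step(model, clean_input, target, bank):
    """Corrupt the input with a banked error, predict, then bank the new error."""
    injected = bank.sample(len(clean_input))
    corrupted = [x + e for x, e in zip(clean_input, injected)]
    prediction = model(corrupted)
    new_error = [p - t for p, t in zip(prediction, target)]
    bank.add(new_error)
    # Mean squared error against the clean target; a real trainer would
    # backpropagate this loss so the model learns to undo the injected error.
    return sum(e * e for e in new_error) / len(new_error)


def toy_model(values):
    # Stand-in for a generator that is slightly off (assumption for the demo).
    return [0.9 * v for v in values]


bank = ErrorBank(capacity=4)
losses = [recycle_step(toy_model, [1.0, 2.0], [1.0, 2.0], bank) for _ in range(6)]
```

The point of the sketch is the data flow: every step sees an input already contaminated by an earlier mistake, and its own mistake is saved for later steps, so training conditions resemble a long autoregressive rollout.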

Quick Start & Requirements

Tested on A100 80 GB GPUs with CUDA 12.0 and PyTorch 2.8.0. Setup requires a Conda environment (python=3.10), the core installs (pip install -e . and flash_attn==2.8.0.post2), and system packages (ffmpeg, librosa, libiconv). Large model downloads from Hugging Face are required, including the base Wan 2.1 I2V 14B model and the SVI family models. Official inference scripts and a Gradio demo (SVI-Shot/Film) are available.
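Before starting the large model downloads, it can help to confirm that the local environment resembles the tested one. The following is a minimal sketch, not part of the SVI repository: `check_environment` is a hypothetical helper, and the Python 3.10 default simply reflects the tested configuration above, not a hard requirement.

```python
import shutil
import sys


def check_environment(expected_python=(3, 10)):
    """Report basic facts about the local setup (hypothetical helper)."""
    report = {
        # True only when major.minor matches the tested interpreter version.
        "python_ok": sys.version_info[:2] == expected_python,
        # ffmpeg is listed as a required system package.
        "ffmpeg_on_path": shutil.which("ffmpeg") is not None,
    }
    try:
        import torch
    except ImportError:
        report["torch_version"] = None
        report["cuda_available"] = False
    else:
        report["torch_version"] = torch.__version__
        report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

A mismatch here does not mean SVI will fail; it only flags a difference from the configuration the authors report testing.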

Highlighted Details

  • Infinite-Length Video: Generates videos of arbitrary duration, demonstrated with up to 10-minute outputs.
  • Task Versatility: Supports multi-scene films, single-scene animations, skeleton/audio-conditioned generation, cartoons, and talking heads.
  • Efficient Fine-tuning: Requires only LoRA adapter tuning for custom SVI creation with minimal data.
  • Hybrid Causality: Combines clip-by-clip causality with bidirectional attention within clips for enhanced control and coherence.
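The dependency pattern described under Hybrid Causality can be pictured as a block-structured mask: frames see every frame in their own clip (bidirectional) and every frame in earlier clips (causal), but nothing from later clips. The sketch below only illustrates that pattern. `hybrid_mask` is a hypothetical function, and the summary does not say whether SVI realizes this as an attention mask or through clip-by-clip generation.

```python
def hybrid_mask(num_clips, frames_per_clip):
    """Return mask[q][k] == True when frame q may depend on frame k."""
    total = num_clips * frames_per_clip
    return [
        # Compare clip indices, not frame indices: same or earlier clip is visible.
        [(k // frames_per_clip) <= (q // frames_per_clip) for k in range(total)]
        for q in range(total)
    ]


mask = hybrid_mask(num_clips=3, frames_per_clip=2)
for row in mask:
    print(" ".join("1" if allowed else "0" for allowed in row))
# Prints:
# 1 1 0 0 0 0
# 1 1 0 0 0 0
# 1 1 1 1 0 0
# 1 1 1 1 0 0
# 1 1 1 1 1 1
# 1 1 1 1 1 1
```

Each 2x2 block on the diagonal is fully set, which is the within-clip bidirectional part; the lower-left blocks are the causal links to earlier clips.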

Maintenance & Community

The repository is stated to be continuously maintained. Contact is available via email (wuyang.li@epfl.ch). No specific community channels or contributor details are provided.

Licensing & Compatibility

The specific open-source license is not stated in the provided README text. Users should verify licensing terms for commercial or closed-source integration.

Limitations & Caveats

ComfyUI integration is noted as "nearly fixed" and pending release. Tested hardware (A100 80G) suggests high-end GPU requirements, and large model sizes are implied. The absence of a stated license is a significant adoption caveat.

Health Check

  • Last commit: 3 days ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 22
  • Star history: 480 stars in the last 26 days
