SkyReels-V2 by SkyworkAI

Film generation model for infinite-length videos using diffusion forcing

Created 5 months ago
4,494 stars

Top 11.0% on SourcePulse

Project Summary

SkyReels-V2 is an open-source video generation model designed for creating infinite-length films, addressing limitations in prompt adherence, visual quality, motion dynamics, and duration. It targets researchers and developers in AI video synthesis, offering a novel AutoRegressive Diffusion-Forcing architecture for state-of-the-art performance.

How It Works

SkyReels-V2 employs an AutoRegressive Diffusion-Forcing architecture, which supports video generation of indefinite length by assigning each token an independent noise level. This amounts to a form of partial masking: the model learns to "unmask" tokens noised to varying degrees, using the cleaner tokens as conditioning information. The method builds on full-sequence diffusion models and allows a video to be extended seamlessly from previously generated segments.
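
The per-token noise idea can be illustrated with a toy sketch. This is not the SkyReels-V2 implementation: the function names, the scalar "tokens", and the variance-preserving blend are all assumptions chosen for illustration. It shows only how each token can carry its own noise level, and how keeping history tokens clean while noising new ones corresponds to extending a sequence.

```python
import math
import random


def sample_noise_levels(num_tokens, rng):
    """Draw one independent noise level in [0, 1) per token (hypothetical helper)."""
    return [rng.random() for _ in range(num_tokens)]


def add_noise(tokens, levels, rng):
    """Blend each token with Gaussian noise according to its own level.

    Level 0.0 keeps the token clean; level 1.0 replaces it with pure noise.
    The square-root weighting is an assumed variance-preserving blend.
    """
    noisy = []
    for x, t in zip(tokens, levels):
        eps = rng.gauss(0.0, 1.0)
        noisy.append(math.sqrt(1.0 - t) * x + math.sqrt(t) * eps)
    return noisy


def extension_levels(num_context, num_new, t):
    """Noise levels for extending a sequence: clean history, noisy new tokens."""
    return [0.0] * num_context + [t] * num_new


rng = random.Random(0)
tokens = [1.0, 2.0, 3.0, 4.0]  # stand-ins for video latent tokens

# Training-style corruption: every token gets its own noise level.
train_levels = sample_noise_levels(len(tokens), rng)
train_noisy = add_noise(tokens, train_levels, rng)

# Extension-style corruption: the first two tokens act as clean conditioning.
ext_levels = extension_levels(num_context=2, num_new=2, t=1.0)
ext_noisy = add_noise(tokens, ext_levels, rng)
```

In the extension case the first two tokens pass through unchanged, so a denoiser would see them as clean context while it reconstructs the remaining tokens.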

Quick Start & Requirements

  • Installation: Clone the repository and install dependencies via pip install -r requirements.txt.
  • Prerequisites: Python 3.10.12 (the version used for testing).
  • Model Download: Models are available on Hugging Face and ModelScope.
  • Hardware: Generating 540P video with the 1.3B model requires ~14.7GB VRAM; the 14B model requires ~51.2GB VRAM. Multi-GPU inference is supported via torchrun --nproc_per_node=N.
  • Links: Technical Report, Playground, Discord, Hugging Face, ModelScope.
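
As a quick way to read the hardware figures above, the following sketch checks which model sizes would fit in a given amount of GPU memory. It is an illustrative helper only: the function name is hypothetical, the figures are the approximate 540P numbers quoted above, and it assumes single-GPU inference with no extra headroom or offloading.

```python
# Approximate VRAM needed for 540P generation, in GB, as quoted above.
VRAM_REQUIRED_GB = {"1.3B": 14.7, "14B": 51.2}


def models_that_fit(available_gb):
    """Return the model sizes whose quoted 540P VRAM need fits in available_gb."""
    return [name for name, need in VRAM_REQUIRED_GB.items() if need <= available_gb]


print(models_that_fit(24.0))  # ['1.3B']
print(models_that_fit(80.0))  # ['1.3B', '14B']
print(models_that_fit(8.0))   # []
```

Under these assumptions, a 24GB card could run the 1.3B model but not the 14B model at 540P.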

Highlighted Details

  • Achieves state-of-the-art performance in instruction adherence and visual quality among open-source models, as per human evaluation and V-Bench.
  • Supports both Text-to-Video (T2V) and Image-to-Video (I2V) generation.
  • Features a novel Diffusion Forcing Transformer for infinite-length video synthesis.
  • Includes a SkyCaptioner-V1 model for enhanced video captioning, outperforming other models in shot-related fields.

Maintenance & Community

The project is actively developed with recent releases in April 2025. Community support is available via Discord.

Licensing & Compatibility

The repository does not explicitly state a license in the README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

Some model variants (e.g., 5B models, Camera Director models) are listed as "Coming Soon." The prompt enhancer, while useful, may produce over-saturated output when given long prompts, and it requires significant VRAM (64GB+).

Health Check

  • Last Commit: 1 month ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 1
  • Issues (30d): 9
  • Star History: 286 stars in the last 30 days
