SkyReels-V2  by SkyworkAI

Film generation model for infinite-length videos using diffusion forcing

created 3 months ago
3,709 stars

Top 13.3% on sourcepulse

Project Summary

SkyReels-V2 is an open-source video generation model designed for creating infinite-length films. It addresses limitations of existing video generation in prompt adherence, visual quality, motion dynamics, and duration. It targets researchers and developers in AI video synthesis and is built on a novel AutoRegressive Diffusion-Forcing architecture aimed at state-of-the-art performance.

How It Works

SkyReels-V2 employs an AutoRegressive Diffusion-Forcing architecture, which supports video generation of unbounded length by assigning each token an independent noise level. This acts as a form of partial masking: the model learns to "unmask" variably noised tokens, using the cleaner ones as conditioning information. The method builds on full-sequence diffusion models and allows a video to be extended seamlessly from previously generated segments.
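The per-token noise idea can be illustrated with a small toy sketch. This is not the SkyReels-V2 implementation: tokens are plain floats, the variance-preserving mixing formula is an assumed parametrization, and the function names (sample_noise_levels, add_noise, extension_levels) are hypothetical, chosen only for illustration.

```python
import math
import random


def sample_noise_levels(num_tokens, rng):
    """Draw one independent noise level in [0, 1) per token (0 = clean, 1 = pure noise)."""
    return [rng.random() for _ in range(num_tokens)]


def add_noise(tokens, levels, rng):
    """Noise each scalar token by its own level.

    Assumption: a variance-preserving mix sqrt(1 - t) * x + sqrt(t) * eps;
    the actual model may use a different schedule.
    """
    noisy = []
    for x, t in zip(tokens, levels):
        eps = rng.gauss(0.0, 1.0)
        noisy.append(math.sqrt(1.0 - t) * x + math.sqrt(t) * eps)
    return noisy


def extension_levels(num_context, num_new):
    """Noise levels for extending a video: context tokens stay clean (0.0),
    new tokens start as pure noise (1.0) and would then be denoised."""
    return [0.0] * num_context + [1.0] * num_new


rng = random.Random(0)
clean = [0.5, -1.0, 2.0, 0.25]          # stand-ins for latent video tokens
levels = sample_noise_levels(len(clean), rng)
noisy = add_noise(clean, levels, rng)   # training-style input: mixed noise levels
ext = extension_levels(num_context=2, num_new=3)
```

In this toy view, training sees sequences with mixed noise levels, while extension corresponds to the special case where earlier tokens are fully clean and only the new ones are noised.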

Quick Start & Requirements

  • Installation: Clone the repository and install dependencies via pip install -r requirements.txt.
  • Prerequisites: Tested with Python 3.10.12.
  • Model Download: Models are available on Hugging Face and ModelScope.
  • Hardware: Generating 540P video with the 1.3B model requires ~14.7GB VRAM; the 14B model requires ~51.2GB VRAM. Multi-GPU inference is supported via torchrun --nproc_per_node=N.
  • Links: Technical Report, Playground, Discord, Hugging Face, ModelScope.
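As a rough planning aid, the VRAM figures above can be turned into a small check of which model size fits a given GPU. This is only a sketch: the helper name models_that_fit and the headroom_gb parameter are hypothetical, and the numbers are the approximate 540P figures quoted above, not measured values.

```python
# Approximate VRAM (GB) needed for 540P generation, from the figures above.
VRAM_540P_GB = {"1.3B": 14.7, "14B": 51.2}


def models_that_fit(available_gb, headroom_gb=0.0):
    """Return model sizes whose stated 540P VRAM need (plus optional headroom)
    fits in available_gb, ordered from smallest to largest requirement."""
    fits = [
        name
        for name, need in VRAM_540P_GB.items()
        if need + headroom_gb <= available_gb
    ]
    return sorted(fits, key=lambda name: VRAM_540P_GB[name])


print(models_that_fit(24))   # a 24 GB card
print(models_that_fit(80))   # an 80 GB card
```

Actual usage depends on resolution, frame count, and offloading options, so treat the result as a first estimate only.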

Highlighted Details

  • Achieves state-of-the-art performance in instruction adherence and visual quality among open-source models, according to human evaluation and V-Bench.
  • Supports both Text-to-Video (T2V) and Image-to-Video (I2V) generation.
  • Features a novel Diffusion Forcing Transformer for infinite-length video synthesis.
  • Includes a SkyCaptioner-V1 model for enhanced video captioning, outperforming other models in shot-related fields.

Maintenance & Community

The project is actively developed with recent releases in April 2025. Community support is available via Discord.

Licensing & Compatibility

The repository does not explicitly state a license in the README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

Some model variants (e.g., 5B models, Camera Director models) are listed as "Coming Soon." The prompt enhancer, while useful, may lead to over-saturation with long prompts and requires significant VRAM (64GB+).

Health Check

  • Last commit: 1 month ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 7
  • Issues (30d): 7
  • Star History: 2,173 stars in the last 90 days
