SkyReels-V2 by SkyworkAI

Film generation model for infinite-length videos using diffusion forcing

Created 10 months ago

6,398 stars

Top 7.9% on SourcePulse

3 Experts Love This Project

Alex Yu

Research Scientist at OpenAI; Cofounder of Luma AI

Jiaming Song

Chief Scientist at Luma AI

Dror Weiss

Cofounder of Tabnine

Project Summary

SkyReels-V2 is an open-source video generation model for creating infinite-length films. It addresses limitations of existing video models in prompt adherence, visual quality, motion dynamics, and duration. It targets researchers and developers working on AI video synthesis and is built on an AutoRegressive Diffusion-Forcing architecture.

How It Works

SkyReels-V2 employs an AutoRegressive Diffusion-Forcing architecture, which supports video generation of indefinite length by assigning each token its own independent noise level. This acts as a form of partial masking: the model learns to "unmask" variably noised tokens, using the cleaner tokens as conditioning information. The method builds on full-sequence diffusion models, in which all tokens share a single noise level, and lets generation be extended seamlessly by conditioning on previously generated segments.
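The per-token noise idea can be illustrated with a small, standard-library-only sketch. It is not the SkyReels-V2 implementation: the function names, the variance-preserving mixing rule, and the use of plain float lists as stand-ins for frame latents are all assumptions made for illustration.

```python
import math
import random

def sample_levels(num_tokens, rng):
    """Draw one independent noise level in [0, 1) per token (training-style)."""
    return [rng.random() for _ in range(num_tokens)]

def apply_noise(tokens, levels, rng):
    """Noise each token with its own level (0 = clean, 1 = pure noise).

    tokens: list of equal-length float lists (hypothetical frame latents).
    The square-root mixing rule below is an assumed schedule, not the
    project's actual one.
    """
    noisy = []
    for token, level in zip(tokens, levels):
        keep = math.sqrt(1.0 - level)  # weight on the clean signal
        mix = math.sqrt(level)         # weight on the Gaussian noise
        noisy.append([keep * x + mix * rng.gauss(0.0, 1.0) for x in token])
    return noisy

def extension_levels(num_context, num_new):
    """Levels for extending a video: clean context tokens, fully noised new ones."""
    return [0.0] * num_context + [1.0] * num_new

rng = random.Random(0)
clean = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(6)]

# Training-style corruption: every token gets a different noise level.
training_levels = sample_levels(len(clean), rng)
noisy = apply_noise(clean, training_levels, rng)

# Extension-style corruption: the first four tokens stay clean and act as
# conditioning; the last two start from pure noise and must be "unmasked".
levels = extension_levels(num_context=4, num_new=2)
partially_masked = apply_noise(clean, levels, rng)
```

In this sketch, tokens at level 0 pass through unchanged, which is what lets previously generated segments serve as clean context when new frames are appended.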

Quick Start & Requirements

Installation: Clone the repository and install dependencies via pip install -r requirements.txt.
Prerequisites: Tested with Python 3.10.12.
Model Download: Models are available on Hugging Face and ModelScope.
Hardware: Generating 540P video with the 1.3B model requires ~14.7GB VRAM; the 14B model requires ~51.2GB VRAM. Multi-GPU inference is supported via torchrun --nproc_per_node=N.
Links: Technical Report, Playground, Discord, Hugging Face, ModelScope.
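As a rough planning aid, the VRAM figures quoted above can be turned into a quick fit check. This is a minimal sketch: the helper name `models_that_fit` and the 1 GB headroom default are hypothetical, and the numbers cover only 540P generation on a single GPU.

```python
# Approximate VRAM (GB) for 540P generation, as quoted in the hardware notes.
VRAM_REQUIRED_GB = {"1.3B": 14.7, "14B": 51.2}

def models_that_fit(available_gb, headroom_gb=1.0):
    """Return model sizes whose quoted VRAM need fits, smallest requirement first.

    headroom_gb is an assumed safety margin, not a figure from the project.
    """
    fits = [
        name
        for name, need in VRAM_REQUIRED_GB.items()
        if need + headroom_gb <= available_gb
    ]
    return sorted(fits, key=VRAM_REQUIRED_GB.get)

print(models_that_fit(24.0))  # ['1.3B']
print(models_that_fit(80.0))  # ['1.3B', '14B']
```

If PyTorch is installed, the available memory could be read with `torch.cuda.mem_get_info()`, which returns free and total bytes for the current device. The sketch does not model multi-GPU inference, where the per-device requirement may differ.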

Highlighted Details

Achieves state-of-the-art instruction adherence and visual quality among open-source models, according to human evaluation and V-Bench.
Supports both Text-to-Video (T2V) and Image-to-Video (I2V) generation.
Features a novel Diffusion Forcing Transformer for infinite-length video synthesis.
Includes SkyCaptioner-V1, a video captioning model that outperforms other models on shot-related fields.

Maintenance & Community

The project is actively developed, with releases in April 2025. Community support is available via Discord.

Licensing & Compatibility

The repository does not explicitly state a license in the README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

Some model variants (e.g., 5B models, Camera Director models) are listed as "Coming Soon." The prompt enhancer, while useful, may cause over-saturated output when prompts are long, and it requires 64GB+ of VRAM.

Health Check

Last Commit

3 weeks ago

Responsiveness

1 day

Star History

422 stars in the last 30 days