LongLive by NVlabs

Generate long videos interactively in real-time

Created 1 month ago
779 stars

Top 44.9% on SourcePulse

Project Summary

LongLive is a frame-level autoregressive (AR) framework for real-time, interactive long video generation. It targets researchers and developers building dynamic, user-guided video content, synthesizing minute-long videos on the fly as prompts are typed, with markedly better efficiency and quality than traditional diffusion models for long-form video.

How It Works

LongLive employs a causal, frame-level AR design optimized for long video generation. Key innovations include a KV-recache mechanism that refreshes cached states for smooth prompt transitions, and "streaming long tuning," which aligns the training and inference pipelines over extended durations. It also pairs short-window attention with a frame-level attention sink ("frame sink") to maintain long-range consistency while accelerating generation. This design sidesteps both the efficiency limits of bidirectional-attention models and the training-memory blowup that causal AR models face on long sequences.
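The sketch below illustrates the short-window-plus-frame-sink idea in PyTorch. It is a minimal illustration, not LongLive's actual implementation: the helper name `frame_sink_attention`, the tensor shapes, and the window/sink sizes are all assumptions chosen for clarity.

```python
# Minimal sketch (assumed, not the official code) of short-window attention
# with a frame-level attention sink: every query frame attends to the first
# `sink_frames` frames plus a sliding window of recent frames, causally.
import torch
import torch.nn.functional as F

def frame_sink_attention(q, k, v, tokens_per_frame, window_frames, sink_frames):
    """q, k, v: [batch, heads, seq, dim], where seq = num_frames * tokens_per_frame."""
    seq = q.shape[2]
    frame_idx = torch.arange(seq, device=q.device) // tokens_per_frame
    qf = frame_idx[:, None]                 # frame index of each query token
    kf = frame_idx[None, :]                 # frame index of each key token
    causal = kf <= qf                       # frame-level causality
    in_window = (qf - kf) < window_frames   # short local attention window
    in_sink = kf < sink_frames              # sink frames are always visible
    mask = causal & (in_window | in_sink)   # True = allowed to attend
    return F.scaled_dot_product_attention(q, k, v, attn_mask=mask)

# Toy usage: 8 frames of 4 tokens each, a 2-frame window, 1 sink frame.
q = k = v = torch.randn(1, 2, 32, 16)
out = frame_sink_attention(q, k, v, tokens_per_frame=4, window_frames=2, sink_frames=1)
print(out.shape)  # torch.Size([1, 2, 32, 16])
```

The point of the sink is that distant frames fall out of the short window, but the sink frames never do, which is how long-range consistency can survive an otherwise local attention pattern.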

Quick Start & Requirements

Installation involves cloning the repo, creating a conda environment (Python 3.10), and installing dependencies such as PyTorch 2.5.0 (cu124), CUDA 12.4.1, and flash-attn 2.7.4.post1. Requirements include an NVIDIA GPU with >= 40 GB of memory (A100/H100 tested), a Linux OS, and 64 GB of system RAM. A link to the demo page is provided in the README.
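As a quick way to verify a machine against these requirements, a sanity-check snippet like the following can help. The version strings simply mirror the requirements above; adjust them if the upstream README changes.

```python
# Hedged environment sanity check against the stated requirements
# (PyTorch 2.5.0 + cu124, flash-attn 2.7.4.post1, >= 40 GB GPU memory).
import torch

assert torch.__version__.startswith("2.5.0"), f"expected PyTorch 2.5.0, got {torch.__version__}"
assert torch.cuda.is_available(), "a CUDA-capable NVIDIA GPU is required"

gpu_mem_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
assert gpu_mem_gb >= 40, f"need >= 40 GB of GPU memory, found {gpu_mem_gb:.1f} GB"

import flash_attn  # 2.7.4.post1 per the requirements above

print(f"torch {torch.__version__}, CUDA {torch.version.cuda}, "
      f"GPU {torch.cuda.get_device_name(0)} ({gpu_mem_gb:.0f} GB), "
      f"flash-attn {flash_attn.__version__}")
```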

Highlighted Details

  • Supports video generation up to 240 seconds while maintaining visual consistency.
  • Achieves real-time inference speeds of 20.7 FPS on a single NVIDIA H100 GPU, increasing to 24.8 FPS with FP8 quantization.
  • Enables efficient fine-tuning, extending a 1.3B-parameter short-clip model to minute-long generation in ~32 H100 GPU-days.
  • Supports INT8-quantized inference with only marginal quality degradation.

Maintenance & Community

The project builds upon the "Self-Forcing" codebase and the "Wan" base model. No specific community channels (e.g., Discord, Slack) or detailed contributor information beyond the listed authors are provided in the README.

Licensing & Compatibility

The LongLive code is licensed under CC-BY-NC-SA 4.0 and the model weights under CC-BY-NC 4.0. The "NC" (NonCommercial) clause restricts both to non-commercial use, which rules out commercial applications under the current licenses.

Limitations & Caveats

Camera motion cannot be explicitly controlled during significant scene transitions. The framework excels at cinematic long takes but is less suited to rapid shot-by-shot edits or fast cutscenes. Deployment requires substantial hardware: high-end NVIDIA GPUs and significant system RAM.

Health Check

  • Last commit: 1 day ago
  • Responsiveness: Inactive
  • Pull requests (30d): 2
  • Issues (30d): 14
  • Star history: 212 stars in the last 30 days

Explore Similar Projects

Starred by Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), Yaowei Zheng (author of LLaMA-Factory), and 1 more.

FastVideo by hao-ai-lab
Framework for accelerated video generation
3k stars, top 1.2% on SourcePulse
Created 1 year ago; updated 2 days ago