NVlabs: Generate long videos interactively in real time
Top 44.9% on SourcePulse
LongLive is a frame-level autoregressive (AR) framework for real-time, interactive long video generation. Aimed at researchers and developers building dynamic, user-guided video content, it can synthesize minute-long videos on the fly as prompts are typed, improving both efficiency and quality over traditional diffusion models for long-form video.
How It Works
LongLive employs a causal, frame-level AR design optimized for long video generation. Key innovations include a KV-recache mechanism for smooth prompt transitions by refreshing cached states, and "streaming long tuning" to align training and inference pipelines for extended durations. It also uses short window attention with a "frame sink" (frame-level attention sink) to maintain long-range consistency while accelerating generation. This architecture overcomes efficiency limits of bidirectional attention models and training memory challenges of causal AR models for long sequences.
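To make the short-window-attention-plus-frame-sink idea concrete, here is a minimal NumPy sketch: each new frame attends to the cached keys/values of the first ("sink") frame plus a sliding window of the most recent frames, instead of the full history. All names, shapes, and the window size are illustrative assumptions, not LongLive's actual API.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

class SinkWindowKVCache:
    """Hypothetical KV cache: sink frame + sliding window of recent frames."""
    def __init__(self, window: int):
        self.window = window
        self.keys, self.values = [], []  # one (d,) vector per frame

    def append(self, k, v):
        self.keys.append(k)
        self.values.append(v)

    def visible(self):
        # The sink frame (index 0) is always attended to; the rest of the
        # history is truncated to the last `window` frames.
        if len(self.keys) <= self.window:
            ks, vs = self.keys, self.values
        else:
            ks = [self.keys[0]] + self.keys[-self.window:]
            vs = [self.values[0]] + self.values[-self.window:]
        return np.stack(ks), np.stack(vs)

def attend(q, cache):
    K, V = cache.visible()
    w = softmax(q @ K.T / np.sqrt(q.shape[-1]))
    return w @ V

rng = np.random.default_rng(0)
d, cache = 8, SinkWindowKVCache(window=4)
for _ in range(16):                      # simulate 16 generated frames
    cache.append(rng.normal(size=d), rng.normal(size=d))
out = attend(rng.normal(size=d), cache)  # attends over sink + last 4 frames
```

This is why generation cost stays roughly constant as the video grows: attention touches at most `window + 1` cached frames, while the persistent sink frame anchors long-range consistency.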
Quick Start & Requirements
Installation involves cloning the repository, creating a conda environment (Python 3.10), and installing dependencies including PyTorch 2.5.0 (cu124), CUDA 12.4.1, and flash-attn 2.7.4.post1. Hardware requirements: an NVIDIA GPU with at least 40 GB of memory (A100/H100 tested), Linux, and 64 GB of system RAM. A demo page is linked from the README.
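The setup described above might look like the following. The repository URL, environment name, and exact pip commands are assumptions based on the listed versions; consult the project's README for the authoritative steps.

```shell
# Illustrative setup only; verify against the official instructions.
git clone https://github.com/NVlabs/LongLive.git
cd LongLive
conda create -n longlive python=3.10 -y
conda activate longlive
# PyTorch 2.5.0 built for CUDA 12.4
pip install torch==2.5.0 --index-url https://download.pytorch.org/whl/cu124
# flash-attn typically needs --no-build-isolation so it builds against
# the already-installed torch
pip install flash-attn==2.7.4.post1 --no-build-isolation
pip install -r requirements.txt
```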
Maintenance & Community
The project builds upon the "Self-Forcing" codebase and the "Wan" base model. No specific community channels (e.g., Discord, Slack) or detailed contributor information beyond the listed authors are provided in the README.
Licensing & Compatibility
The LongLive code is licensed under CC-BY-NC-SA 4.0 and the model weights under CC-BY-NC 4.0. The NonCommercial ("NC") clause restricts both to non-commercial use, so neither can be incorporated into commercial applications without separate licensing.
Limitations & Caveats
Camera motion cannot be explicitly controlled during significant scene transitions. The framework excels at cinematic long takes but is less suited for rapid shot-by-shot edits or fast cutscenes. Deployment requires substantial hardware: high-end Nvidia GPUs and significant system RAM.