PKU-YuanGroup
Breakthrough in real-time long video generation
Helios is a 14B-parameter model for real-time, long-form video generation, offering superior efficiency and quality compared to smaller models. It targets researchers and developers who need high-quality, minute-scale video synthesis at high frame rates.
How It Works
Helios generates video autoregressively in 33-frame chunks, achieving high temporal coherence for minute-long videos without conventional anti-drifting strategies (e.g., self-forcing, keyframe sampling). It also bypasses standard acceleration techniques like KV-caching or quantization, yet delivers 19.5 FPS on a single H100 GPU. This design prioritizes end-to-end inference efficiency and reduced memory, enabling larger training batches and fitting multiple models within limited VRAM.
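To make the chunked, autoregressive scheme concrete, here is a minimal sketch that works through the arithmetic implied by the figures above (33-frame chunks, 19.5 FPS on one H100). The 24 fps playback rate, the function names, and the toy chunk generator are assumptions made for illustration; they are not taken from the Helios codebase, and the real model conditions on prior frames in ways this stub does not capture.

```python
import math
from typing import Callable, List

CHUNK_FRAMES = 33      # frames per autoregressive chunk (from the summary)
THROUGHPUT_FPS = 19.5  # reported generation speed on a single H100


def plan_generation(duration_s: float, playback_fps: float = 24.0) -> dict:
    """Estimate chunk count and generation time for a clip.

    playback_fps is an assumed value; the summary does not state
    the output frame rate.
    """
    total_frames = math.ceil(duration_s * playback_fps)
    num_chunks = math.ceil(total_frames / CHUNK_FRAMES)
    generated_frames = num_chunks * CHUNK_FRAMES
    return {
        "num_chunks": num_chunks,
        "generated_frames": generated_frames,
        "est_seconds": generated_frames / THROUGHPUT_FPS,
    }


def generate_video(num_chunks: int,
                   generate_chunk: Callable[[List[int]], List[int]]) -> List[int]:
    """Autoregressive loop: each chunk is produced given all earlier frames."""
    frames: List[int] = []
    for _ in range(num_chunks):
        frames.extend(generate_chunk(frames))
    return frames


def toy_chunk(history: List[int]) -> List[int]:
    """Hypothetical stand-in for the model: emits the next 33 frame indices."""
    start = len(history)
    return list(range(start, start + CHUNK_FRAMES))


if __name__ == "__main__":
    plan = plan_generation(60.0)
    print(plan)
    video = generate_video(plan["num_chunks"], toy_chunk)
    print(len(video), "frames generated")
```

Under the assumed 24 fps playback rate, a one-minute clip needs 44 chunks, and at the reported throughput it would take a little over 74 seconds to generate.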
Quick Start & Requirements
Installation requires cloning the repo, setting up a Python 3.11.2 conda environment, and installing a PyTorch build for one of the supported CUDA versions (11.8, 12.6, or 12.8). The remaining dependencies are installed by running bash install.sh. High-performance inference is demonstrated on a single NVIDIA H100 GPU. The Diffusers, vLLM-Omni, and SGLang-Diffusion integrations each require installing that project from its GitHub repository.
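Before running install.sh, it can help to confirm that the environment matches the requirements listed above. The following is a small sketch of such a check; the helper names are hypothetical and not part of the Helios repository. It relies only on sys.version_info and PyTorch's torch.version.cuda attribute, which is None on CPU-only builds.

```python
import sys
from typing import List, Optional

SUPPORTED_CUDA = {"11.8", "12.6", "12.8"}  # CUDA versions named in the summary
EXPECTED_PYTHON = (3, 11)                  # the summary specifies Python 3.11.2


def cuda_supported(cuda_version: Optional[str]) -> bool:
    """Return True if the PyTorch CUDA build is one of the listed versions."""
    return cuda_version in SUPPORTED_CUDA


def check_environment() -> List[str]:
    """Collect human-readable problems; an empty list means the checks passed."""
    problems: List[str] = []
    if sys.version_info[:2] != EXPECTED_PYTHON:
        found = ".".join(str(part) for part in sys.version_info[:3])
        problems.append(f"Python 3.11.x expected, found {found}")
    try:
        import torch  # imported lazily so the check also runs without PyTorch
    except ImportError:
        problems.append("PyTorch is not installed")
    else:
        if not cuda_supported(torch.version.cuda):
            problems.append(
                f"PyTorch CUDA build {torch.version.cuda!r} is not one of "
                f"{sorted(SUPPORTED_CUDA)}"
            )
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```

This only compares version strings; it does not verify that a GPU is present or that the driver supports the chosen CUDA build.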
Maintenance & Community
The project benefits from integration efforts by Ascend, Hugging Face (Diffusers), vLLM-Omni, and SGLang-Diffusion. The maintainers can be reached by email at shyuan-cs@hotmail.com. No community channels such as Discord or Slack are listed.
Licensing & Compatibility
Helios is released under the Apache 2.0 license, permitting commercial use and integration into closed-source projects.
Limitations & Caveats
The Helios-Mid model is described as an intermediate checkpoint that "may not meet expected quality." Image-to-Video and Video-to-Video may perform slightly worse than Text-to-Video. The performance claims depend on high-end hardware such as the H100.