Helios by PKU-YuanGroup

Breakthrough in real-time long video generation

Created 3 months ago

1,900 stars

Top 22.5% on SourcePulse

1 Expert Loves This Project

Jiaming Song

Chief Scientist at Luma AI

Project Summary

Helios is a 14B-parameter model for real-time, long-form video generation. The project claims better efficiency and quality than smaller models, and it targets researchers and developers who need high-quality, minute-scale video synthesis at high frame rates.

How It Works

Helios generates video autoregressively in 33-frame chunks and maintains temporal coherence across minute-long videos without conventional anti-drifting strategies (e.g., self-forcing, keyframe sampling). It also avoids standard acceleration techniques such as KV-caching and quantization, yet reaches 19.5 FPS on a single H100 GPU. The design prioritizes end-to-end inference efficiency and lower memory use, which allows larger training batches and lets multiple models fit within limited VRAM.
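The chunk size and throughput figures above lend themselves to a quick back-of-the-envelope estimate. The sketch below is only illustrative: the function name chunk_plan and the playback_fps parameter are hypothetical, the summary does not state the output frame rate, and the code assumes chunks do not overlap and that the last chunk is generated in full even if only part of it is used.

```python
import math

CHUNK_FRAMES = 33       # frames per autoregressive chunk (from the summary)
THROUGHPUT_FPS = 19.5   # reported generation speed on one H100 (from the summary)


def chunk_plan(duration_s: float, playback_fps: float) -> dict:
    """Estimate chunk count and generation time for a clip.

    playback_fps is an assumption supplied by the caller; the summary
    does not state the frame rate of the generated video.
    """
    total_frames = math.ceil(duration_s * playback_fps)
    # Assumption: non-overlapping chunks, last chunk generated in full.
    num_chunks = math.ceil(total_frames / CHUNK_FRAMES)
    generated_frames = num_chunks * CHUNK_FRAMES
    return {
        "total_frames": total_frames,
        "num_chunks": num_chunks,
        "generation_s": generated_frames / THROUGHPUT_FPS,
        # Generation keeps up with playback only if throughput >= playback rate.
        "real_time": THROUGHPUT_FPS >= playback_fps,
    }


# Hypothetical example: a one-minute clip played back at 16 frames per second.
plan = chunk_plan(duration_s=60, playback_fps=16)
print(plan)
```

Under these assumptions, whether generation counts as real time depends entirely on the playback frame rate chosen, which is why it is left as a caller-supplied parameter.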

Quick Start & Requirements

Installation involves cloning the repository, creating a conda environment with Python 3.11.2, and installing a PyTorch build for one of the listed CUDA versions (11.8, 12.6, or 12.8). The remaining dependencies are installed with bash install.sh. High-performance inference is demonstrated on a single NVIDIA H100 GPU. The Diffusers, vLLM-Omni, and SGLang-Diffusion integrations each require installing that library from its GitHub repository.

Project Page: https://pku-yuangroup.github.io/Helios-Page
Gradio Demo: https://huggingface.co/spaces/BestWishYsh/Helios-14B-RealTime
GitHub: https://github.com/PKU-YuanGroup/Helios
arXiv: https://arxiv.org/abs/2603.04379
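Before running the installer, it can help to confirm that the environment matches the versions listed above. The following is a minimal sketch, not part of the Helios repository: check_environment and detect_cuda_version are hypothetical helper names, and treating any Python 3.11.x as acceptable is an assumption (the summary names 3.11.2 specifically).

```python
import sys
from typing import List, Optional, Sequence

SUPPORTED_CUDA = {"11.8", "12.6", "12.8"}  # CUDA versions named in the summary
EXPECTED_PYTHON = (3, 11)                  # summary names 3.11.2; minor-version match is an assumption


def check_environment(python_version: Sequence[int], cuda_version: Optional[str]) -> List[str]:
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems = []
    if tuple(python_version[:2]) != EXPECTED_PYTHON:
        problems.append("Python 3.11 expected")
    if cuda_version is None:
        problems.append("no CUDA-enabled PyTorch build detected")
    elif cuda_version not in SUPPORTED_CUDA:
        problems.append("CUDA version not in the listed set")
    return problems


def detect_cuda_version() -> Optional[str]:
    """Return the CUDA version PyTorch was built with, or None if unavailable."""
    try:
        import torch
    except ImportError:
        return None
    # torch.version.cuda is None for CPU-only builds.
    return torch.version.cuda


if __name__ == "__main__":
    for problem in check_environment(sys.version_info, detect_cuda_version()):
        print("warning:", problem)
```

An empty result only means the versions line up with the list above; it does not verify that install.sh completed or that the GPU has enough memory.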

Highlighted Details

Achieves 19.5 FPS on a single H100 GPU for minute-scale, high-quality video generation.
Generates coherent long videos without common anti-drifting techniques.
Offers high inference speed without standard acceleration methods.
Supports Text-to-Video, Image-to-Video, Video-to-Video, and interactive generation.
Provides optimized integrations with Diffusers, vLLM-Omni, and SGLang-Diffusion.
Three model variants (Base, Mid, Distilled) offer quality/efficiency trade-offs.

Maintenance & Community

The project benefits from integration work by Ascend, Hugging Face (Diffusers), vLLM-Omni, and SGLang-Diffusion. The maintainers can be reached by email at shyuan-cs@hotmail.com. No community channels such as Discord or Slack are listed.

Licensing & Compatibility

Helios is released under the Apache 2.0 license, permitting commercial use and integration into closed-source projects.

Limitations & Caveats

The Helios-Mid model is described as an intermediate checkpoint that "may not meet expected quality." Image-to-Video and Video-to-Video generation might perform slightly worse than Text-to-Video. The reported throughput was obtained on high-end hardware (a single H100 GPU) and may not carry over to other GPUs.

Health Check

Last Commit: 2 days ago
Responsiveness: Inactive

Star History: 114 stars in the last 30 days