taehv by madebyollin

Faster, lighter video latent processing

Created 10 months ago
263 stars

Top 97.0% on SourcePulse

Project Summary

TAEHV is a Tiny AutoEncoder designed for efficient latent space manipulation in video generation models like Hunyuan Video. It targets researchers and developers needing faster, memory-light video encoding/decoding for applications like real-time previews or interactive video, offering significant performance gains at a slight quality cost.

How It Works

This project implements a compact autoencoder architecture optimized for speed and a low memory footprint. By processing video latents through a much smaller model, TAEHV decodes several times faster than traditional, full-scale video VAEs (roughly 0.5s versus 2-3s for 61 frames in the reported benchmark) and needs far less VRAM (under 0.5GB versus roughly 6-9GB at peak), making it suitable for resource-constrained environments or interactive workflows.

Quick Start & Requirements

  • Installation: Integrated into ComfyUI (via PRs and custom nodes), stable-diffusion.cpp, and SDNext.
  • Prerequisites: Specific model weights (.pth, .safetensors) are required for compatibility with base video models (e.g., Hunyuan Video 1.5, Wan 2.1/2.2, Qwen Image, CogVideoX, Hunyuan Video 1, Open-Sora 1.3). Example notebooks are available for usage and integration.
  • Resource Footprint: Designed for significantly lower memory usage (<0.5GB peak) and faster processing compared to full VAEs.

Highlighted Details

  • Achieves ~0.5s decoding for 61 frames (512x320 fp16) versus ~2-3s for full VAEs on GH200.
  • Reduces peak memory usage to <0.5GB (fp16) from ~6-9GB for full VAEs.
  • Supports a range of popular open-source video models with specific weight files.
  • Causal structure enables potential real-time frame output during decoding.
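
To put the figures above in perspective, the short sketch below derives the implied throughput, speedup, and memory ratio. It only does arithmetic on the numbers quoted in this list; the variable names are illustrative, and the memory ratio is a lower bound because the TAEHV figure is given as "less than 0.5GB".

```python
# Figures quoted in the list above (61 frames, 512x320, fp16, GH200).
frames = 61
taehv_seconds = 0.5              # approximate TAEHV decode time
full_vae_seconds = (2.0, 3.0)    # approximate full-VAE decode time range
taehv_peak_gb = 0.5              # upper bound on TAEHV peak memory
full_vae_peak_gb = (6.0, 9.0)    # approximate full-VAE peak memory range

# Implied decode throughput for TAEHV, in frames per second.
fps = frames / taehv_seconds

# Speedup range relative to the full VAE.
speedup = tuple(t / taehv_seconds for t in full_vae_seconds)

# Memory reduction range; a lower bound, since TAEHV uses less than 0.5GB.
memory_ratio = tuple(m / taehv_peak_gb for m in full_vae_peak_gb)

print(f"throughput: ~{fps:.0f} frames/s")
print(f"speedup: ~{speedup[0]:.0f}x to {speedup[1]:.0f}x")
print(f"memory reduction: at least ~{memory_ratio[0]:.0f}x to {memory_ratio[1]:.0f}x")
```

In other words, the quoted numbers correspond to a speedup of roughly 4-6x and a peak-memory reduction of at least 12-18x on this benchmark.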

Maintenance & Community

The project benefits from contributions enabling integrations into popular UIs and tools like ComfyUI, stable-diffusion.cpp, and SDNext. Specific contributors are credited for these integrations. No direct community channels (e.g., Discord, Slack) or a public roadmap are detailed in the README.

Licensing & Compatibility

The repository's license is not explicitly stated in the provided README text.

Limitations & Caveats

TAEHV prioritizes speed and efficiency, resulting in slightly lower video quality compared to full-size VAEs. Models like Mochi 1 and SVD are not directly supported and require separate repositories. Integration with libraries like Diffusers necessitates careful handling of dimension order (NTCHW vs NCTHW) and value ranges ([0, 1] vs [-1, 1]).
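
The layout and range mismatch can be handled with a pair of small conversion helpers. The following is a minimal sketch, not code from the repository: it assumes TAEHV is the NTCHW / [0, 1] side and Diffusers the NCTHW / [-1, 1] side, and it uses NumPy arrays as a stand-in for tensors. The function names `taehv_to_diffusers` and `diffusers_to_taehv` are hypothetical.

```python
import numpy as np

def taehv_to_diffusers(frames_ntchw: np.ndarray) -> np.ndarray:
    """Convert NTCHW frames in [0, 1] to NCTHW video in [-1, 1] (assumed conventions)."""
    if frames_ntchw.ndim != 5:
        raise ValueError("expected a 5-D array laid out as (N, T, C, H, W)")
    # Swap the time and channel axes, then rescale [0, 1] -> [-1, 1].
    return np.transpose(frames_ntchw, (0, 2, 1, 3, 4)) * 2.0 - 1.0

def diffusers_to_taehv(video_ncthw: np.ndarray) -> np.ndarray:
    """Convert NCTHW video in [-1, 1] to NTCHW frames in [0, 1] (assumed conventions)."""
    if video_ncthw.ndim != 5:
        raise ValueError("expected a 5-D array laid out as (N, C, T, H, W)")
    # Swap the channel and time axes back, then rescale [-1, 1] -> [0, 1].
    return (np.transpose(video_ncthw, (0, 2, 1, 3, 4)) + 1.0) / 2.0

# Example: 1 clip, 4 frames, 3 channels, 8x8 pixels, values in [0, 1).
frames = np.random.default_rng(0).random((1, 4, 3, 8, 8), dtype=np.float32)
video = taehv_to_diffusers(frames)
roundtrip = diffusers_to_taehv(video)
print(frames.shape, "->", video.shape)
```

With PyTorch tensors the same idea applies using `permute(0, 2, 1, 3, 4)` in place of `np.transpose`. Checking both the shape and the min/max of a converted tensor is a cheap way to catch a missed conversion, which otherwise tends to show up as washed-out or scrambled output.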

Health Check

  • Last Commit: 3 weeks ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 2
  • Star History: 17 stars in the last 30 days
