madebyollin/taehv: Faster, lighter video latent processing
TAEHV is a Tiny AutoEncoder designed for efficient latent space manipulation in video generation models like Hunyuan Video. It targets researchers and developers needing faster, memory-light video encoding/decoding for applications like real-time previews or interactive video, offering significant performance gains at a slight quality cost.
How It Works
This project implements a compact AutoEncoder architecture optimized for speed and low memory footprint. By processing video latents through a smaller model, TAEHV achieves decoding speeds orders of magnitude faster and requires substantially less VRAM than traditional, full-scale video VAEs, making it suitable for resource-constrained environments or interactive workflows.
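To see why operating on latents is so much cheaper than operating on pixels, consider the compression a causal 3D video VAE applies. The factors below (4x temporal, 8x spatial, 16 latent channels) are assumptions typical of Hunyuan-style VAEs, used here only for illustration, not figures from this project:

```python
# Rough size comparison between pixel space and latent space for one video.
# Assumed compression factors for a Hunyuan-style causal 3D VAE (illustrative):
T_DOWN, S_DOWN, LATENT_CH = 4, 8, 16

frames, height, width, rgb_ch = 125, 720, 1280, 3

# A causal video VAE keeps the first frame, then compresses groups of T_DOWN frames.
latent_frames = (frames - 1) // T_DOWN + 1
latent_h, latent_w = height // S_DOWN, width // S_DOWN

pixel_values = frames * height * width * rgb_ch
latent_values = latent_frames * latent_h * latent_w * LATENT_CH

print(pixel_values // latent_values)  # → 46
```

Even with a ~46x smaller latent tensor, the decoder is the step that maps latents back to full-resolution pixels, so a tiny decoder is where most of the wall-clock and VRAM savings come from.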
Quick Start & Requirements
Pretrained weights (.pth, .safetensors) are required for compatibility with base video models (e.g., Hunyuan Video 1.5, Wan 2.1/2.2, Qwen Image, CogVideoX, Hunyuan Video 1, Open-Sora 1.3). Example notebooks are available for usage and integration.
Highlighted Details
Maintenance & Community
The project benefits from contributions enabling integrations into popular UIs and tools like ComfyUI, stable-diffusion.cpp, and SDNext. Specific contributors are credited for these integrations. No direct community channels (e.g., Discord, Slack) or a public roadmap are detailed in the README.
Licensing & Compatibility
The repository's license is not explicitly stated in the provided README text.
Limitations & Caveats
TAEHV prioritizes speed and efficiency, resulting in slightly lower video quality than full-size VAEs. Models like Mochi 1 and SVD are not directly supported and require separate repositories. Integration with libraries like Diffusers necessitates careful handling of dimension order (NTCHW vs NCTHW) and value ranges ([0, 1] vs [-1, 1]).
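The dimension-order and value-range mismatch can be bridged with a small adapter. Below is a minimal NumPy sketch, assuming Diffusers-side tensors are NCTHW in [-1, 1] and TAEHV-side tensors are NTCHW in [0, 1]; in a real pipeline these would be torch tensors, but the axis and scaling logic is the same:

```python
import numpy as np

def diffusers_to_taehv(video):
    """NCTHW in [-1, 1] -> NTCHW in [0, 1] (assumed conventions)."""
    video = np.moveaxis(video, 1, 2)            # swap channel and time axes
    return np.clip(video * 0.5 + 0.5, 0.0, 1.0)

def taehv_to_diffusers(video):
    """NTCHW in [0, 1] -> NCTHW in [-1, 1] (assumed conventions)."""
    video = np.moveaxis(video, 1, 2)            # swap time and channel axes
    return video * 2.0 - 1.0

# Round-trip check on a random video batch: N=1, C=3, T=8, H=W=32.
x = np.random.uniform(-1.0, 1.0, size=(1, 3, 8, 32, 32)).astype(np.float32)
y = taehv_to_diffusers(diffusers_to_taehv(x))
print(np.abs(x - y).max())  # tiny; clipping only bites at exact endpoints
```

Getting either the axis order or the scaling wrong produces decodes that look scrambled or washed out rather than raising an error, so a round-trip check like the one above is a cheap sanity test.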