LiveAvatar by Alibaba-Quark

Real-time audio-driven avatar generation framework

Created 1 month ago
1,368 stars

Top 29.3% on SourcePulse

Project Summary

LiveAvatar is a framework for real-time, streaming, audio-driven avatar video generation that supports effectively unbounded video lengths. It targets researchers and developers working on interactive virtual agents, real-time animation, and AI-powered content creation, offering a path to seamless, continuous avatar experiences. The primary benefit is enabling highly responsive, long-running avatar interactions without the usual constraints of fixed-length generation.

How It Works

This project uses a 14B-parameter diffusion model combined with block-wise autoregressive processing. Rather than generating a fixed-length clip in one pass, it breaks long videos into blocks that are generated sequentially, each conditioned on previously produced content, which is what enables streaming, effectively infinite-length output. The algorithm-system co-design targets low latency for real-time interaction while maintaining high-fidelity video synthesis.
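The code is not yet public, so the exact interface is unknown; the following is a minimal illustrative sketch of a block-wise autoregressive streaming loop, with all function, parameter, and constant names hypothetical.

```python
# Illustrative sketch only: block-wise autoregressive streaming generation.
# All names (generate_block, BLOCK_FRAMES, etc.) are hypothetical;
# LiveAvatar's real API has not been released yet.
from collections import deque

BLOCK_FRAMES = 16     # frames generated per block (assumed)
CONTEXT_BLOCKS = 2    # how many previous blocks condition the next one (assumed)

def stream_avatar(model, audio_chunks, reference_image):
    """Yield video blocks one at a time so playback can start immediately."""
    context = deque(maxlen=CONTEXT_BLOCKS)    # rolling context keeps memory bounded
    for audio in audio_chunks:                # audio arrives as a live stream
        block = model.generate_block(         # hypothetical call: few-step diffusion
            audio=audio,                      # current audio chunk drives face motion
            reference=reference_image,        # identity/appearance reference
            context=list(context),            # condition on recent blocks (autoregression)
            num_frames=BLOCK_FRAMES,
            sampling_steps=4,                 # 4-step sampling, per the project's claims
        )
        context.append(block)                 # slide the context window forward
        yield block                           # stream the block out as soon as it is ready
```

Because only a bounded window of previous blocks is kept as context, generation time and memory stay constant per block, which is what makes continuous output over very long durations feasible.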

Quick Start & Requirements

The core code is scheduled for open-source release in early December. Key planned releases include inference code, Hugging Face checkpoints, and a Gradio demo. Experimental real-time streaming inference is targeted on H800 GPUs. Specific hardware requirements (e.g., 5x H800 GPUs for 20 FPS) and software dependencies (CUDA) will be detailed upon release. A demo video is available at https://www.youtube.com/watch?v=srbsGlLNpAc.
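A Gradio demo is on the release roadmap but is not yet published; the sketch below only illustrates how such a demo is typically wired up with Gradio, with the model call (generate_avatar_video) as a hypothetical placeholder rather than the project's actual inference entry point.

```python
# Hypothetical sketch of a Gradio demo for audio-driven avatar generation.
# The actual LiveAvatar demo has not been released; generate_avatar_video is a placeholder.
import gradio as gr

def generate_avatar_video(reference_image, driving_audio):
    # Placeholder: the released inference code would load the Hugging Face
    # checkpoint and run streaming generation here.
    raise NotImplementedError("LiveAvatar inference code is not yet public")

demo = gr.Interface(
    fn=generate_avatar_video,
    inputs=[
        gr.Image(type="filepath", label="Reference image"),
        gr.Audio(type="filepath", label="Driving audio"),
    ],
    outputs=gr.Video(label="Generated avatar video"),
    title="LiveAvatar (placeholder demo)",
)

if __name__ == "__main__":
    demo.launch()
```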

Highlighted Details

  • Achieves 20 FPS real-time streaming with low latency using 4-step diffusion sampling on the specified hardware (see the frame-budget arithmetic sketched after this list).
  • Supports 10,000+ second continuous video generation via autoregressive processing.
  • Demonstrates strong generalization across various scenarios, including cartoon characters and singing.
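
To put the 20 FPS figure in context, the sketch below works out the per-frame and per-block time budget; only the 20 FPS target comes from the project, and the block size is an assumed illustration value, not a published LiveAvatar parameter.

```python
# Back-of-the-envelope latency budget for 20 FPS streaming generation.
TARGET_FPS = 20       # published real-time target
BLOCK_FRAMES = 16     # assumed frames per autoregressive block (illustration only)

per_frame_budget_ms = 1000 / TARGET_FPS                    # 50 ms per frame
per_block_budget_ms = per_frame_budget_ms * BLOCK_FRAMES   # 800 ms per block

print(f"Per-frame budget: {per_frame_budget_ms:.0f} ms")
print(f"Per-block budget: {per_block_budget_ms:.0f} ms "
      f"(each {BLOCK_FRAMES}-frame block must finish within this to keep up)")
```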

Maintenance & Community

The project is affiliated with Alibaba Group and several universities. Neither specific community channels (e.g., Discord, Slack) nor active contributor information is detailed in the README. Planned updates include optimized inference on consumer/professional GPUs (RTX 4090/A100) and integration with tools such as ComfyUI.

Licensing & Compatibility

No specific open-source license is mentioned in the provided README excerpt. Potential users should verify licensing terms upon code release, especially concerning commercial use or integration into closed-source projects.

Limitations & Caveats

The project is in a pre-release phase, with core code and inference capabilities pending public release in early December. Current performance benchmarks are specific to high-end hardware (5x H800 GPUs), and optimization for more common GPUs (RTX 4090/A100) is listed as a future update. The full scope of supported features and potential limitations will be clearer post-release.

Health Check

  • Last Commit: 1 week ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 2
  • Issues (30d): 19
  • Star History: 457 stars in the last 30 days

