Alibaba-Quark
Real-time audio-driven avatar generation framework
Top 29.3% on SourcePulse
Live Avatar is a framework for real-time, streaming, audio-driven avatar video generation that supports infinite-length video. It targets researchers and developers working on interactive virtual agents, real-time animation, and AI-powered content creation. Its main benefit is responsive, long-running avatar interaction without the constraints of fixed-length generation.
How It Works
The project combines a 14B-parameter diffusion model with block-wise autoregressive processing. Long videos are split into blocks that are generated sequentially, which makes streaming output possible. The algorithm-system co-design aims for low latency in real-time interaction while maintaining high-fidelity video synthesis, enabling continuous, infinite-length output.
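The block-wise streaming idea can be illustrated with a minimal sketch. This is not the project's code, which has not been released: `generate_block`, `stream_blocks`, `block_size`, and `context_blocks` are hypothetical names, and the placeholder generator only labels frames instead of running a diffusion model. The sketch assumes each block is conditioned on a bounded window of earlier blocks, which is what keeps memory constant for an endless input.

```python
from collections import deque
from itertools import islice
from typing import Deque, Iterable, Iterator, List


def generate_block(audio_chunk: List[float], context: List[List[str]]) -> List[str]:
    """Hypothetical stand-in for the diffusion model.

    A real model would denoise a block of video frames conditioned on the
    audio chunk and on previously generated blocks. Here we only label one
    frame per audio sample so the control flow can be run and inspected.
    """
    return [f"frame({sample})" for sample in audio_chunk]


def stream_blocks(
    audio: Iterable[float], block_size: int, context_blocks: int = 1
) -> Iterator[List[str]]:
    """Yield video blocks one at a time from a possibly endless audio stream."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    samples = iter(audio)
    # Bounded history: old blocks are dropped automatically (assumption).
    context: Deque[List[str]] = deque(maxlen=context_blocks)
    while True:
        chunk = list(islice(samples, block_size))
        if not chunk:
            return  # audio stream ended
        frames = generate_block(chunk, list(context))
        context.append(frames)
        yield frames  # the caller can display this block immediately


if __name__ == "__main__":
    for block in stream_blocks(range(5), block_size=2):
        print(block)
```

Because `stream_blocks` is a generator, a caller can start displaying the first block while later blocks are still being produced, and the input iterable never needs to end.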
Quick Start & Requirements
The core code is scheduled for open-source release in early December. Planned releases include inference code, Hugging Face checkpoints, and a Gradio demo. Experimental real-time streaming inference targets H800 GPUs. Specific hardware requirements (e.g., 5x H800 GPUs for 20 FPS) and software dependencies (CUDA) will be detailed upon release. A demo video is available at https://www.youtube.com/watch?v=srbsGlLNpAc.
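To put the 20 FPS figure in perspective, the per-frame time budget follows from simple arithmetic. The sketch below is illustrative only: the function names and the example block size of 4 frames are assumptions, not values from the project.

```python
def frame_budget_ms(fps: float) -> float:
    """Time available per frame, in milliseconds, at a given frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


def block_budget_ms(fps: float, frames_per_block: int) -> float:
    """Time available to generate one block without falling behind playback."""
    return frames_per_block * frame_budget_ms(fps)


def keeps_up(block_latency_ms: float, fps: float, frames_per_block: int) -> bool:
    """True if a block is generated at least as fast as it is played back."""
    return block_latency_ms <= block_budget_ms(fps, frames_per_block)


if __name__ == "__main__":
    print(frame_budget_ms(20))       # 50.0 ms per frame at 20 FPS
    print(block_budget_ms(20, 4))    # 200.0 ms for a hypothetical 4-frame block
    print(keeps_up(180.0, 20, 4))    # True: 180 ms fits in the 200 ms budget
```

At 20 FPS the whole pipeline has 50 ms per frame on average, so a streaming system stays real-time only while each block finishes within its playback duration.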
Highlighted Details
Maintenance & Community
The project is affiliated with Alibaba Group and several universities. Specific community channels (e.g., Discord, Slack) or active contributor information are not detailed in the provided text. Further updates are planned for optimized inference on consumer/professional GPUs (RTX 4090/A100) and integration with other tools like ComfyUI.
Licensing & Compatibility
No specific open-source license is mentioned in the provided README excerpt. Potential users should verify licensing terms upon code release, especially concerning commercial use or integration into closed-source projects.
Limitations & Caveats
The project is in a pre-release phase, with core code and inference capabilities pending public release in early December. Current performance benchmarks are specific to high-end hardware (5x H800 GPUs), and optimization for more common GPUs (RTX 4090/A100) is listed as a future update. The full scope of supported features and potential limitations will be clearer post-release.