Audio2Face model for real-time 2D chat avatar generation
LiteAvatar is a real-time, audio-driven 2D chat avatar system that runs at 30 fps on CPU alone, with no GPU acceleration required. It targets developers and researchers building interactive virtual agents for applications such as video conferencing and virtual assistants, offering a lightweight, efficient solution for voice-synchronized facial animation.
How It Works
The system employs an efficient automatic speech recognition (ASR) model for audio feature extraction, a mouth-parameter prediction model that generates synchronized mouth movements from those features, and a lightweight 2D face generator that renders the final frames, enabling real-time inference even on mobile devices. The pipeline prioritizes efficiency and CPU-bound operation.
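To make the three-stage flow concrete, here is a minimal, runnable sketch in which stub models stand in for LiteAvatar's actual ASR, mouth-parameter, and face-generation models. Every class name, feature shape, and rendering detail below is an illustrative assumption, not the project's API.

```python
import numpy as np

# Stub stand-ins for the three stages described above; all details are
# assumptions for illustration, not LiteAvatar's actual models or API.

class StubASR:
    def extract_features(self, audio_chunk: np.ndarray) -> np.ndarray:
        # Real system: an efficient ASR model yields per-frame audio features.
        return np.array([np.abs(audio_chunk).mean()])

class StubMouthModel:
    def predict(self, features: np.ndarray) -> np.ndarray:
        # Real system: a small predictor maps audio features to mouth parameters.
        return np.tanh(features)

class StubFaceGenerator:
    def render(self, mouth_params: np.ndarray) -> np.ndarray:
        # Real system: a lightweight 2D generator renders a full face frame.
        h = w = 256
        frame = np.zeros((h, w, 3), dtype=np.uint8)
        opening = int(10 + 40 * abs(float(mouth_params[0])))  # mouth height, px
        frame[h // 2 : h // 2 + opening, w // 3 : 2 * w // 3] = 255
        return frame

def run_pipeline(audio: np.ndarray, sample_rate: int = 16000, fps: int = 30):
    asr, mouth, face = StubASR(), StubMouthModel(), StubFaceGenerator()
    samples_per_frame = sample_rate // fps  # audio samples per video frame
    frames = []
    for i in range(0, len(audio) - samples_per_frame + 1, samples_per_frame):
        chunk = audio[i : i + samples_per_frame]
        features = asr.extract_features(chunk)  # stage 1: ASR audio features
        params = mouth.predict(features)        # stage 2: mouth parameters
        frames.append(face.render(params))      # stage 3: rendered 2D frame
    return frames

if __name__ == "__main__":
    one_second = np.random.randn(16000).astype(np.float32)
    print(f"rendered {len(run_pipeline(one_second))} frames")  # ~30 frames per second
```

The chunked loop mirrors why the design stays real-time on CPU: each video frame depends only on a short window of audio, so work is bounded per frame rather than per utterance.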
Quick Start & Requirements
pip install -r requirements.txt
python lite_avatar.py --data_dir /path/to/sample_data --audio_file /path/to/audio.wav --result_dir /path/to/result
Sample data is provided at ./data/sample_data.zip
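For processing several clips, one option is to call the same CLI from Python in a loop. The directory paths below are placeholders; only the flags shown in the quick-start command above are assumed.

```python
import subprocess
from pathlib import Path

audio_dir = Path("/path/to/audio_clips")  # placeholder directory of .wav files
for wav in sorted(audio_dir.glob("*.wav")):
    subprocess.run(
        [
            "python", "lite_avatar.py",
            "--data_dir", "/path/to/sample_data",
            "--audio_file", str(wav),
            "--result_dir", f"/path/to/result/{wav.stem}",
        ],
        check=True,  # raise if avatar generation fails for a clip
    )
```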
Highlighted Details
Maintenance & Community
The project acknowledges contributions from Paraformer and FunASR. A related paper is available for citation.
Licensing & Compatibility
The license is not explicitly stated in the README. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
The README does not specify a license, which may hinder commercial adoption. The project also pins specific Python and CUDA versions, even though CPU-only operation is a headline feature.