SentiAvatar

Framework for expressive, interactive 3D digital humans

Created 2 weeks ago

253 stars

Top 99.3% on SourcePulse

Summary

SentiAvatar generates expressive, interactive 3D digital humans, producing semantically aligned and synchronized body motion and facial expressions from speech. Aimed at researchers and developers working on AI animation and virtual characters, it offers real-time, high-quality generation and reports state-of-the-art results on motion-generation benchmarks.

How It Works

It uses an audio-aware Plan-then-Infill architecture. A Large Language Model (LLM) plans sparse, semantically relevant keyframes; an Infill Transformer then interpolates dense, prosody-driven frames between them using HuBERT audio features. This decoupling keeps the motion semantically appropriate while remaining temporally synchronized with the speech. Finally, a Residual Vector Quantized Variational Autoencoder (RVQ-VAE) decodes the discrete motion tokens into continuous animation.
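The pipeline above can be sketched with toy stand-ins for each stage. None of these names reflect SentiAvatar's actual API: the real system uses an LLM planner, an Infill Transformer conditioned on HuBERT features, and a learned RVQ-VAE decoder, all of which are replaced here with trivial placeholder logic purely to show how the stages connect.

```python
# Illustrative sketch of the Plan-then-Infill flow; all names are hypothetical.
import numpy as np

def plan_keyframes(n_frames, n_keys, codebook_size, rng):
    """Stand-in for the LLM planner: choose sparse keyframe positions
    and a discrete motion token for each (semantic planning)."""
    positions = np.sort(rng.choice(n_frames, size=n_keys, replace=False))
    tokens = rng.integers(0, codebook_size, size=n_keys)
    return {"positions": positions, "tokens": tokens}

def infill_frames(plan, audio_feats, codebook_size):
    """Stand-in for the Infill Transformer: fill every frame between the
    planned keyframes, conditioned on per-frame audio (prosody) features."""
    n_frames = len(audio_feats)
    tokens = np.zeros(n_frames, dtype=int)
    tokens[plan["positions"]] = plan["tokens"]
    for t in range(n_frames):
        if t in plan["positions"]:
            continue  # keyframes are fixed by the planner
        nearest = int(np.argmin(np.abs(plan["positions"] - t)))
        base = plan["tokens"][nearest]
        shift = int(audio_feats[t] * 3)  # toy "prosody-driven" variation
        tokens[t] = (base + shift) % codebook_size
    return tokens

def decode_motion(tokens, codebook):
    """Stand-in for the RVQ-VAE decoder: map each discrete motion token
    to its continuous pose vector via a codebook lookup."""
    return codebook[tokens]

rng = np.random.default_rng(0)
n_frames, n_keys, K, pose_dim = 180, 8, 512, 165  # e.g. 6 s of motion at 30 fps
audio = rng.random(n_frames)                       # stand-in for HuBERT features
plan = plan_keyframes(n_frames, n_keys, K, rng)
tokens = infill_frames(plan, audio, K)
codebook = rng.standard_normal((K, pose_dim))      # stand-in learned codebook
motion = decode_motion(tokens, codebook)
print(motion.shape)  # → (180, 165)
```

The point of the decoupling is visible even in the toy version: the planner's keyframe tokens are placed first and never overwritten, while the infill stage only fills the frames in between.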

Quick Start & Requirements

Install:

  • Clone the repository, then: conda create -n sentiavatar python=3.10, conda activate sentiavatar, pip install -r requirements.txt.
  • Download model weights from HuggingFace (Chuhaojin/SentiAvatar) into checkpoints/.

Inference:

  • Batch mode requires data preprocessing (scripts/preprocess_data.py) and a running vLLM server.
  • Single-case inference uses scripts/run_single_infer.sh.

A CUDA-capable GPU is recommended.
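Assembled into one session, the steps above might look like the following; `<repo-url>` is a placeholder (the summary does not give the clone URL), and the script paths are as named above.

```shell
# Setup sketch based on the summarized steps; <repo-url> is a placeholder.
git clone <repo-url> SentiAvatar && cd SentiAvatar
conda create -n sentiavatar python=3.10
conda activate sentiavatar
pip install -r requirements.txt

# Place the model weights from HuggingFace (Chuhaojin/SentiAvatar) in checkpoints/

# Single-case inference:
bash scripts/run_single_infer.sh

# Batch inference: preprocess first, with a vLLM server running
python scripts/preprocess_data.py
```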

Highlighted Details

  • Features the SuSuInterActs dataset: 37 hours of synchronized optical motion capture (speech, full-body motion, facial expressions).
  • Achieves state-of-the-art performance on SuSuInterActs (R@1 43.64%) and BEATv2 benchmarks.
  • Enables real-time generation: 6 seconds of motion produced in 0.3 seconds, supporting multi-turn streaming.
  • Utilizes a pre-trained motion foundation model for rich action priors.

Maintenance & Community

Developed by the SentiPulse Team with acknowledgments to contributors. Model weights are distributed via HuggingFace. No explicit community channels or public roadmaps are detailed in the README.

Licensing & Compatibility

Released under the SentiPulse Non-Commercial Source License v1.0, which permits sharing, adapting, and building upon the work for non-commercial purposes. Commercial use is strictly prohibited; contact the authors for commercial licensing.

Limitations & Caveats

The non-commercial license rules out adoption in any commercial application. Generalization may be limited by training primarily on the SuSuInterActs dataset. Installation and inference assume familiarity with Conda, vLLM, and CUDA.

Health Check

  • Last commit: 1 week ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 14
  • Star history: 253 stars in the last 20 days
