Digital human model for mobile, real-time use
This repository provides an ultralight digital human model capable of real-time performance on mobile devices. It addresses the need for lightweight, efficient digital human generation and targets developers and researchers who want to integrate or build upon such technology. Its primary benefit is enabling realistic, audio-driven digital human animation on resource-constrained platforms.
How It Works
The model extracts audio features with either wenet or HuBERT: wenet is fast and well suited to real-time mobile deployment, while HuBERT offers higher quality at lower speed. The pipeline preprocesses the video data, extracts audio features, trains a syncnet to improve lip synchronization, and then trains the digital human model itself. Inference runs on the extracted audio features together with the trained checkpoints, as in the sketch below.
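As a concrete sketch of that flow, the shell commands below are purely illustrative: the script names (process.py, syncnet.py, train.py, inference.py), flags such as --asr and --use_syncnet, and all paths are assumptions about the repository layout rather than confirmed commands, so consult the upstream README for the authoritative invocations.

```bash
# Illustrative pipeline; script names, flags, and paths are assumptions,
# not verified against the repository.

# 1. Preprocess a talking-head video and extract audio features.
#    --asr selects the extractor: hubert (quality) or wenet (speed).
python process.py ./demo_video.mp4 --asr hubert

# 2. Train syncnet first to improve lip synchronization.
python syncnet.py --save_dir ./syncnet_ckpt/ --dataset_dir ./data_dir/ --asr hubert

# 3. Train the digital human model, supervised by the syncnet checkpoint.
python train.py --dataset_dir ./data_dir/ --save_dir ./checkpoint/ \
    --asr hubert --use_syncnet --syncnet_checkpoint ./syncnet_ckpt/

# 4. Inference: drive the trained model with features from new audio.
python inference.py --asr hubert --dataset ./data_dir/ \
    --audio_feat ./new_audio_hubert.npy \
    --checkpoint ./checkpoint/latest.pth --save_path ./result.mp4
```

Presumably the same --asr choice must be used consistently across preprocessing, training, and inference, since the two extractors produce differently shaped features.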
Quick Start & Requirements
The project uses conda for environment setup and pip for package installation, as in the sketch below.
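A minimal environment might be created as follows; the environment name, Python version, and package list are illustrative assumptions rather than the repository's pinned requirements.

```bash
# Illustrative only: names and versions are assumptions, not the
# repository's pinned requirements.
conda create -n digital-human python=3.10 -y
conda activate digital-human

# Typical dependencies for a PyTorch-based talking-head pipeline.
pip install torch torchvision opencv-python numpy soundfile transformers
```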
Highlighted Details
Maintenance & Community
The project has garnered significant attention (1607 stars at the time of writing). The author plans major code refactoring and the release of streaming inference capabilities. Community interaction is encouraged via issues and PRs.
Licensing & Compatibility
The repository does not explicitly state a license in the provided README, so further investigation is required before commercial use or integration into closed-source projects.
Limitations & Caveats
The author notes that code style and stability may not be optimal due to the project's rapid growth. Performance is highly dependent on audio quality; poor audio (noise, echo, unclear vocals) significantly degrades results. Streaming inference code is not yet fully released.