Ultralight-Digital-Human by anliyuan

Digital human model for mobile, real-time use

created 9 months ago
2,129 stars

Top 21.6% on sourcepulse

Project Summary

This repository provides an ultralight digital human model capable of real-time performance on mobile devices. It addresses the need for lightweight, efficient digital human generation, targeting developers and researchers looking to integrate or build upon such technology. The primary benefit is enabling realistic digital human animation driven by audio on resource-constrained platforms.

How It Works

The model leverages audio feature extraction using either wenet or HuBERT. wenet is noted for its speed and suitability for real-time mobile deployment, while HuBERT offers superior quality but is slower. The process involves preprocessing video data, extracting audio features, training a syncnet for improved synchronization, and finally training the digital human model. Inference can be performed using the extracted audio features and trained checkpoints.
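The ordering of these stages and the encoder trade-off can be sketched in a few lines of Python. This is only an illustration of the workflow described above, not the repository's code: `build_plan`, `Plan`, and the stage labels are hypothetical names, and treating the syncnet stage as optional is an assumption based on the summary's wording that syncnet training is "supported".

```python
from dataclasses import dataclass

# Hypothetical labels for illustration only; these are not the repository's
# actual scripts, functions, or command-line flags.
ENCODERS = {
    "wenet": "faster; suited to real-time mobile deployment",
    "hubert": "better quality; slower",
}


@dataclass(frozen=True)
class Plan:
    encoder: str
    stages: tuple


def build_plan(realtime_mobile: bool, use_syncnet: bool = True) -> Plan:
    """Pick an audio encoder and list the workflow stages in order."""
    # wenet when real-time mobile use matters, HuBERT when quality matters more.
    encoder = "wenet" if realtime_mobile else "hubert"
    stages = ["preprocess video", f"extract audio features ({encoder})"]
    if use_syncnet:  # assumption: the syncnet stage can be skipped
        stages.append("train syncnet")
    stages.append("train digital human model")
    stages.append("run inference from audio features and checkpoints")
    return Plan(encoder=encoder, stages=tuple(stages))


if __name__ == "__main__":
    plan = build_plan(realtime_mobile=True)
    print(f"encoder: {plan.encoder} ({ENCODERS[plan.encoder]})")
    for number, stage in enumerate(plan.stages, start=1):
        print(f"{number}. {stage}")
```

The same audio encoder appears in both the feature-extraction stage and the inference stage, which is why the choice between wenet and HuBERT is made once, up front.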

Quick Start & Requirements

Highlighted Details

  • Claims to be the first open-source ultralight digital human model for real-time mobile execution.
  • Offers a choice between wenet (faster, mobile-friendly) and HuBERT (better quality) for audio feature extraction.
  • Supports training a syncnet for enhanced results.
  • Mentions ongoing work to release streaming inference code for both Python and C++ (for mobile deployment).

Maintenance & Community

The project has drawn significant attention: the README summary recorded 1607 stars at the time of writing, and the listing above shows 2,129. The author plans a major code refactor and the release of streaming inference capabilities. Community interaction is encouraged via issues and PRs.

Licensing & Compatibility

The repository does not explicitly state a license in the provided README. This requires further investigation for commercial use or integration into closed-source projects.

Limitations & Caveats

The author notes that code style and stability may not be optimal due to the project's rapid growth. Performance is highly dependent on audio quality; poor audio (noise, echo, unclear vocals) significantly degrades results. Streaming inference code is not yet fully released.

Health Check

  • Last commit: 1 month ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 1
  • Issues (30d): 14

Star History

282 stars in the last 90 days
