GMTalker by feima09

Digital human system for Unreal Engine 5.3

Created 2 months ago
596 stars

Top 54.8% on SourcePulse

Project Summary

GMTalker is an immersive digital human system built for Unreal Engine 5.3, aimed at scientific research, education, and virtual human development. It integrates speech recognition, speech synthesis, natural language understanding, lip-sync animation, and 3D rendering into a complete, commercial-grade digital human pipeline that can be deployed locally.

How It Works

GMTalker uses a modular architecture with three layers: a UE5 client, a backend AI digital human system, and core AI services. The backend handles application logic and coordinates the GPT, TTS, ASR, and Player services, which in turn rely on external APIs and models such as OpenAI, FunASR, GPT-SoVITS, and Audio2Face. The system supports real-time interaction, including voice interruption and retrieval-augmented generation (RAG) for personalized Q&A, with an emphasis on natural language processing and animation synchronization.
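
The flow described above (speech in, text reply, synthesized speech out, with the option to interrupt playback) can be illustrated with a small Python sketch. This is not GMTalker's code: `DigitalHumanBackend`, `Turn`, `split_sentences`, and the four callables are hypothetical names, and the real ASR, GPT, TTS, and Player services are replaced by whatever functions the caller passes in. It assumes the reply is spoken sentence by sentence and that interruption is checked before each sentence.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    """Split a reply into sentences so playback can stop between them."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


@dataclass
class Turn:
    user_text: str          # what the ASR stage heard
    reply_text: str         # full reply from the language model stage
    spoken: List[str]       # sentences that were actually played
    interrupted: bool       # True if playback stopped early


@dataclass
class DigitalHumanBackend:
    # Hypothetical stand-ins for the ASR, GPT, TTS, and Player services.
    asr: Callable[[bytes], str]
    gpt: Callable[[str, List[str]], str]
    tts: Callable[[str], bytes]
    player: Callable[[bytes], None]
    history: List[str] = field(default_factory=list)

    def handle_utterance(
        self,
        audio_in: bytes,
        interrupted: Callable[[], bool] = lambda: False,
    ) -> Turn:
        user_text = self.asr(audio_in)               # speech -> text
        reply = self.gpt(user_text, self.history)    # text -> reply, with context
        self.history.extend([user_text, reply])

        spoken: List[str] = []
        was_interrupted = False
        for sentence in split_sentences(reply):
            if interrupted():                        # user started talking again
                was_interrupted = True
                break
            self.player(self.tts(sentence))          # text -> audio -> playback
            spoken.append(sentence)
        return Turn(user_text, reply, spoken, was_interrupted)
```

Passing the services in as plain callables keeps the sketch independent of any particular provider; in a real deployment each callable would wrap a network call to the corresponding service.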

Quick Start & Requirements

  • Installation: Clone the repository, then run the provided batch or PowerShell script (webui.bat or ./webui.ps1) for a one-click start.
  • Requirements: Python 3.11+, Windows 10/11, 8 GB+ RAM, Unreal Engine 5.3.2, Conda (recommended), and an NVIDIA GPU with CUDA support (RTX 3090 or higher recommended).
  • Access: Main service at http://127.0.0.1:5002, Web UI at http://127.0.0.1:7860.
  • Documentation: Full installation guide available at install.md, WebUI guide at webui.md.
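
After the start script has run, it can be useful to confirm that both addresses listed above are accepting connections. The following Python sketch only checks whether a TCP connection can be opened; it assumes nothing about the HTTP routes the services expose. The names `ENDPOINTS`, `host_port`, `is_listening`, and `report` are illustrative and not part of GMTalker.

```python
import socket
from urllib.parse import urlsplit

# Addresses taken from the Quick Start section; the labels are illustrative.
ENDPOINTS = {
    "main service": "http://127.0.0.1:5002",
    "web ui": "http://127.0.0.1:7860",
}


def host_port(url: str) -> tuple[str, int]:
    """Extract (host, port) from a URL, falling back to the scheme default."""
    parts = urlsplit(url)
    port = parts.port or (443 if parts.scheme == "https" else 80)
    return parts.hostname or "", port


def is_listening(url: str, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds."""
    try:
        with socket.create_connection(host_port(url), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def report() -> dict[str, bool]:
    """Check every endpoint and return a name -> reachable mapping."""
    return {name: is_listening(url) for name, url in ENDPOINTS.items()}


if __name__ == "__main__":
    for name, ok in report().items():
        print(f"{name}: {'up' if ok else 'not reachable'}")
```

A successful connection shows only that something is listening on the port, not that the service has finished loading its models.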

Highlighted Details

  • Supports voice input, real-time interruption, and context-aware conversations.
  • Features realistic lip-sync and emotional facial expressions driven by voice.
  • Offers local deployment and UE5 rendering for high-fidelity visuals.
  • Includes RAG capabilities for domain-specific Q&A.
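
To make the RAG bullet concrete, here is a toy Python sketch of the general idea: pick the stored passage most similar to the question and place it in the prompt. It is an assumption-laden illustration, not GMTalker's implementation. Similarity is plain bag-of-words cosine rather than learned embeddings, the prompt wording is invented, and `tokenize`, `cosine`, `retrieve`, and `build_prompt` are hypothetical names.

```python
import math
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and keep alphanumeric runs as tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question: str, documents: List[str], k: int = 1) -> List[str]:
    """Return the k documents most similar to the question."""
    q = Counter(tokenize(question))
    ranked = sorted(
        documents,
        key=lambda d: cosine(q, Counter(tokenize(d))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question: str, documents: List[str], k: int = 1) -> str:
    """Assemble a prompt that grounds the answer in retrieved passages."""
    context = "\n".join(f"- {d}" for d in retrieve(question, documents, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

The resulting prompt would then be sent to the language model in place of the bare question, which is what lets answers draw on domain-specific material.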

Maintenance & Community

The project is developed by the Media Intelligence Team of Light Intelligence Lab. For collaboration, the team can be reached by email (mafei@gml.ac.cn, xuhongbo@gml.ac.cn) or through the Guangming Laboratory Official Site.

Licensing & Compatibility

Licensed under the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). This permits use, modification, and sharing for non-commercial purposes with attribution. Commercial use is restricted.

Limitations & Caveats

Audio2Face must download its character models over a VPN, and the initial load may be slow; Audio2Face version 2023.1.1 is recommended. The CC BY-NC 4.0 license restricts commercial use.

Health Check

  • Last commit: 1 day ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 4
  • Star history: 393 stars in the last 30 days
