LiveTalking by lipku

Real-time interactive streaming digital humans

Created 2 years ago
7,372 stars

Top 7.0% on SourcePulse

Project Summary

A real-time interactive streaming digital human that achieves commercial-grade audio-video synchronization in conversations with customizable avatars. It targets engineers and power users seeking to integrate digital human technology into their applications, with flexible output options and multi-concurrency support.

How It Works

The system employs a modular, layered architecture. An API layer accepts text or audio input, managed by a session manager. The logic layer handles LLM-based chat responses, text-to-speech synthesis (supporting multiple engines), and audio feature extraction. The rendering layer utilizes deep learning models like Wav2Lip and MuseTalk for precise lip-sync generation. Finally, a streaming layer outputs real-time video and audio via WebRTC, RTMP, or virtual camera. A plugin system facilitates easy integration of new TTS, avatar, and output modules.
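The layered, plugin-driven design described above can be sketched in a few lines of Python. Everything here (the registry, the decorator, `Session`, `handle_text`) is illustrative and not part of the project's actual API; the LLM, lip-sync, and streaming stages are elided.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Plugin registry: maps engine names to synthesis callables, mirroring
# the pluggable TTS / avatar / output design described above.
TTS_ENGINES: Dict[str, Callable[[str], bytes]] = {}

def register_tts(name: str):
    """Decorator that registers a TTS engine under a given name."""
    def wrap(fn: Callable[[str], bytes]) -> Callable[[str], bytes]:
        TTS_ENGINES[name] = fn
        return fn
    return wrap

@register_tts("dummy")
def dummy_tts(text: str) -> bytes:
    # Stand-in synthesis: a real engine would return PCM audio frames.
    return text.encode("utf-8")

@dataclass
class Session:
    """Session-manager entry: one avatar conversation."""
    session_id: str
    tts_engine: str = "dummy"

def handle_text(session: Session, text: str) -> bytes:
    # Logic layer: (LLM response generation elided) -> TTS synthesis.
    audio = TTS_ENGINES[session.tts_engine](text)
    # The rendering layer would extract audio features and drive a
    # lip-sync model (Wav2Lip / MuseTalk); the streaming layer would
    # then push frames over WebRTC or RTMP. Both are elided here.
    return audio
```

The registry pattern is what makes the layers swappable: adding a new TTS engine means registering one callable rather than touching the session or rendering code.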

Quick Start & Requirements

Installation is tested on Ubuntu 24.04 with Python 3.10, PyTorch 2.5.0, and CUDA 12.4. Dependencies are managed via Conda and pip (`requirements.txt`). Users must download specific model weights (e.g., `wav2lip.pth`) and avatar assets. The core command is `python app.py --transport webrtc --model wav2lip --avatar_id wav2lip256_avatar1`. TCP port 8010 and UDP ports 1-65536 must be open on the server. A Docker image is available as an alternative for simplified deployment. Access to Hugging Face models can be routed through a mirror via `export HF_ENDPOINT=https://hf-mirror.com`. Official documentation is available at livetalking-doc.readthedocs.io.
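Assembled as a shell transcript, the steps above look roughly like this. The Conda environment name and the command ordering are assumptions; only the `app.py` invocation and the `HF_ENDPOINT` export appear verbatim in the source.

```shell
# Quick-start sketch; the env name "livetalking" is an assumption.
conda create -n livetalking python=3.10 -y
conda activate livetalking
pip install -r requirements.txt

# Optional: route Hugging Face model downloads through a mirror.
export HF_ENDPOINT=https://hf-mirror.com

# Run with WebRTC output, the Wav2Lip model, and a bundled avatar.
# Assumes wav2lip.pth and avatar assets were downloaded beforehand,
# and that TCP 8010 plus the UDP port range are open on the server.
python app.py --transport webrtc --model wav2lip --avatar_id wav2lip256_avatar1
```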

Highlighted Details

  • Supports multiple digital human models: ernerf, musetalk, wav2lip, Ultralight-Digital-Human.
  • Features voice cloning and allows for interruptions during speech.
  • Offers flexible streaming outputs: WebRTC, RTMP, and virtual camera.
  • Modular plugin system allows developers to extend TTS, avatar, and output capabilities.

Maintenance & Community

The project is actively developed, with community channels including Discord (https://discord.gg/n5jSPCT3Uf) and Telegram (https://t.me/livetalking). Commercial cooperation is welcomed.

Licensing & Compatibility

No standard open-source license is explicitly stated. However, the project requires published videos derived from it to include a "LiveTalking watermark and logo." This attribution requirement may impact commercial use and integration into closed-source products without further clarification or a commercial license.

Limitations & Caveats

Real-time performance is hardware-dependent: Wav2Lip requires an NVIDIA RTX 3060 or better, while MuseTalk needs an RTX 3080 Ti or higher to sustain frame rates above 25 FPS. CPU performance is also critical, since it handles video encoding. Advanced features such as interruption handling, real-time subtitles, and multi-person interaction are detailed in a separate "Commercial Version" section, suggesting they are not available in the open-source release. The lack of a clear, permissive license is a significant adoption barrier.

Health Check

  • Last Commit: 1 day ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 1
  • Issues (30d): 6
  • Star History: 172 stars in the last 30 days
