OpenAvatarChat by HumanAIGC-Engineering

Interactive digital human conversation on a single PC

Created 7 months ago
2,466 stars

Top 18.9% on SourcePulse

Project Summary

This project provides a modular, interactive digital human conversation system designed to run on a single PC, targeting developers and researchers in AI and virtual reality. It offers low-latency, multimodal conversations with customizable components, enabling flexible integration of various AI models for speech, language, and avatar rendering.

How It Works

The system uses a modular architecture in which the components for Automatic Speech Recognition (ASR), Large Language Models (LLM), Text-to-Speech (TTS), and avatar rendering can each be swapped. It supports a fully local mode using models like MiniCPM-o and a hybrid mode that uses cloud APIs for LLM and TTS. Offloading those stages to the cloud lowers local system requirements, and the swappable components allow for diverse conversational experiences.
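The swappable-component idea can be sketched as follows. This is a minimal illustration, not the project's actual code: every class and method name here (`Pipeline`, `transcribe`, `reply`, `synthesize`, `render`, and the stub components) is hypothetical, and the stubs stand in for real models.

```python
from dataclasses import dataclass, replace
from typing import Protocol


# Hypothetical stage interfaces; the real project defines its own handler API.
class ASR(Protocol):
    def transcribe(self, audio: bytes) -> str: ...


class LLM(Protocol):
    def reply(self, prompt: str) -> str: ...


class TTS(Protocol):
    def synthesize(self, text: str) -> bytes: ...


class Avatar(Protocol):
    def render(self, speech: bytes) -> str: ...


@dataclass(frozen=True)
class Pipeline:
    """One conversational turn: audio in, rendered avatar output out."""
    asr: ASR
    llm: LLM
    tts: TTS
    avatar: Avatar

    def turn(self, audio: bytes) -> str:
        text = self.asr.transcribe(audio)
        answer = self.llm.reply(text)
        speech = self.tts.synthesize(answer)
        return self.avatar.render(speech)


# Stub components (assumptions for illustration only).
class Utf8ASR:
    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")


class UpperLLM:  # stands in for a local model
    def reply(self, prompt: str) -> str:
        return prompt.upper()


class ReverseLLM:  # stands in for a cloud API client
    def reply(self, prompt: str) -> str:
        return prompt[::-1]


class Utf8TTS:
    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")


class TextAvatar:
    def render(self, speech: bytes) -> str:
        return f"avatar says: {speech.decode('utf-8')}"


local = Pipeline(Utf8ASR(), UpperLLM(), Utf8TTS(), TextAvatar())
# Swap only the LLM stage; the other stages are reused unchanged.
hybrid = replace(local, llm=ReverseLLM())

print(local.turn(b"hi"))
print(hybrid.turn(b"hi"))
```

Because each stage depends only on a small interface, moving from a local to a hybrid setup in this sketch means replacing one field rather than rewriting the pipeline.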

Quick Start & Requirements

  • Installation: uv is recommended for environment management. Install dependencies with uv sync --all-packages, or use a mode-specific install. Run with uv run src/demo.py --config <config_file.yaml>. Docker is also supported via ./build_and_run.sh --config <config_file.yaml>.
  • Prerequisites: Python >=3.10, <3.12. CUDA-enabled GPU with NVIDIA driver supporting CUDA >= 12.4. Unquantized MiniCPM-o requires >20GB VRAM; int4 quantized version reduces VRAM needs. Git LFS is required for submodules.
  • Resources: Local MiniCPM-o inference can achieve ~2.2s average response delay on an i9-13900KF with RTX 4090. CPU inference can reach up to 30 FPS.
  • Links: Demo, LiteAvatarGallery, LAM
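The prerequisites above can be checked before installing. The sketch below is an assumption-laden helper, not part of the project: the function names are hypothetical, the CUDA version and VRAM figure must be supplied by the caller (for example, read from the NVIDIA driver tools), and the 20 GB threshold for picking the int4 variant is a heuristic derived from the ">20GB" figure quoted above.

```python
import sys


def python_supported(version: tuple = sys.version_info[:2]) -> bool:
    """True if the interpreter satisfies Python >=3.10, <3.12."""
    return (3, 10) <= tuple(version) < (3, 12)


def cuda_supported(cuda_version: tuple) -> bool:
    """True if the driver-supported CUDA version is at least 12.4."""
    return tuple(cuda_version) >= (12, 4)


def pick_minicpm_variant(vram_gb: float) -> str:
    """Heuristic: unquantized MiniCPM-o needs more than 20 GB of VRAM.

    The returned labels are illustrative, not real config names.
    """
    return "unquantized" if vram_gb > 20 else "int4"


if __name__ == "__main__":
    print("Python OK:", python_supported())
    # Example values; replace with what your own system reports.
    print("CUDA OK:", cuda_supported((12, 4)))
    print("Suggested variant:", pick_minicpm_variant(24))
```

Falling back to the int4 variant does not guarantee the model fits on a given GPU; the source only states that quantization reduces VRAM needs.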

Highlighted Details

  • Low-latency (avg. 2.2s) real-time digital human conversation.
  • Supports multimodal LLMs (text, audio, video).
  • Modular design for flexible component replacement.
  • Integrates LiteAvatar for 2D avatars and LAM for ultra-realistic 3D digital humans.
  • Offers multiple pre-set configurations for different model combinations.

Maintenance & Community

  • Actively maintained by HumanAIGC-Engineering, with recent releases (v0.3.0 on 2025.04.18).
  • Community contributions acknowledged, with links to deployment tutorials.

Licensing & Compatibility

  • The repository itself appears to be under a permissive license, but specific component licenses (e.g., for models like MiniCPM-o, CosyVoice) should be reviewed individually for commercial use restrictions.

Limitations & Caveats

  • CosyVoice local TTS on Windows requires a specific Conda installation workaround due to pynini compilation issues.
  • Using video input with MiniCPM-o can significantly increase VRAM consumption, potentially leading to OOM errors on lower-spec GPUs.
  • LAM avatar generation pipeline is noted as "not ready yet."

Health Check

  • Last commit: 2 days ago
  • Responsiveness: 1 week
  • Pull requests (30d): 16
  • Issues (30d): 34

Star History

  • 544 stars in the last 30 days
