OpenAvatarChat by HumanAIGC-Engineering

Interactive digital human conversation on a single PC

created 5 months ago
1,611 stars

Top 26.6% on sourcepulse

Project Summary

This project provides a modular, interactive digital human conversation system designed to run on a single PC, targeting developers and researchers in AI and virtual reality. It offers low-latency, multimodal conversations with customizable components, enabling flexible integration of various AI models for speech, language, and avatar rendering.

How It Works

The system employs a modular architecture, allowing users to swap components for Automatic Speech Recognition (ASR), Large Language Models (LLM), Text-to-Speech (TTS), and avatar rendering. It supports both a fully local mode using models like MiniCPM-o and a hybrid mode leveraging cloud APIs for LLM and TTS. This flexibility reduces system requirements and allows for diverse conversational experiences.
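Because every component is selected via a YAML config file passed to the demo, swapping models is a matter of editing one file. The sketch below is illustrative only — the section and field names are assumptions, not the project's actual schema — but it conveys how a hybrid setup (local ASR/TTS/avatar, cloud LLM) might be declared:

```yaml
# Hypothetical config sketch: field names are illustrative, not the
# project's real schema. Each slot can be swapped independently.
asr:
  handler: sensevoice        # local speech recognition
llm:
  handler: openai_api        # hybrid mode: cloud LLM over an API
  api_key: <your-api-key>
  model: <model-name>
tts:
  handler: cosyvoice         # local TTS; a cloud TTS service could be used instead
avatar:
  handler: liteavatar        # 2D avatar; swap for LAM for 3D rendering
```

Fully local mode would instead point the `llm` slot at a local multimodal model such as MiniCPM-o, trading higher VRAM requirements for independence from cloud APIs.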

Quick Start & Requirements

  • Installation: uv is recommended for environment management. Install dependencies with uv sync --all-packages (or a mode-specific install), then run with uv run src/demo.py --config <config_file.yaml>. Docker execution is also supported via ./build_and_run.sh --config <config_file.yaml>.
  • Prerequisites: Python >=3.10, <3.12. CUDA-enabled GPU with NVIDIA driver supporting CUDA >= 12.4. Unquantized MiniCPM-o requires >20GB VRAM; int4 quantized version reduces VRAM needs. Git LFS is required for submodules.
  • Resources: Local MiniCPM-o inference achieves ~2.2s average response latency on an i9-13900KF with an RTX 4090. Avatar animation can reach up to 30 FPS with CPU-only inference.
  • Links: Demo, LiteAvatarGallery, LAM
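The installation steps above can be sketched as a shell session. This is a hedged sketch, not verified against the repository: the clone URL is inferred from the project and organization names, and the config filename is a placeholder — substitute one of the repository's actual preset configs.

```shell
# Clone with submodules; Git LFS must be installed first for model assets
git clone --recursive https://github.com/HumanAIGC-Engineering/OpenAvatarChat.git
cd OpenAvatarChat

# Install all dependencies with uv (requires Python >=3.10, <3.12)
uv sync --all-packages

# Run the demo with a preset config (filename below is a placeholder)
uv run src/demo.py --config <config_file.yaml>

# Alternatively, build and run inside Docker
./build_and_run.sh --config <config_file.yaml>
```

Mode-specific installs (local-only vs. hybrid) can reduce the dependency footprint if you do not need every preset.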

Highlighted Details

  • Low-latency (avg. 2.2s) real-time digital human conversation.
  • Supports multimodal LLMs (text, audio, video).
  • Modular design for flexible component replacement.
  • Integrates LiteAvatar for 2D avatars and LAM for ultra-realistic 3D digital humans.
  • Offers multiple pre-set configurations for different model combinations.

Maintenance & Community

  • Active development with recent releases (v0.3.0 on 2025.04.18).
  • Community contributions acknowledged, with links to deployment tutorials.
  • Project is actively maintained by HumanAIGC-Engineering.

Licensing & Compatibility

  • The repository itself appears to be under a permissive license, but specific component licenses (e.g., for models like MiniCPM-o, CosyVoice) should be reviewed individually for commercial use restrictions.

Limitations & Caveats

  • CosyVoice local TTS on Windows requires a specific Conda installation workaround due to pynini compilation issues.
  • Using video input with MiniCPM-o can significantly increase VRAM consumption, potentially leading to OOM errors on lower-spec GPUs.
  • LAM avatar generation pipeline is noted as "not ready yet."
Health Check

  • Last commit: 2 days ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 7
  • Issues (30d): 13

Star History

  • 943 stars in the last 90 days

Explore Similar Projects

Starred by Jeff Hammerbacher (Cofounder of Cloudera), Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), and 2 more.

MiniCPM-o by OpenBMB

  • 20k stars, top 0.2% on sourcepulse
  • MLLM for vision, speech, and multimodal live streaming on your phone
  • Created 1 year ago, updated 1 month ago