This project provides an AI-powered interactive avatar engine for creating dynamic digital characters. It targets VTubers, streamers, and developers building virtual assistants, enabling real-time animation, speech, and personality through a unified pipeline.
How It Works
The engine orchestrates a multi-stage process: audio capture and speech recognition (Whisper ASR), optional screen content analysis, LLM-driven response generation based on personality definitions, text-to-speech synthesis (Kokoro models, with optional RVC voice conversion), and Live2D animation driven by phonemes and emotion tags. Visuals are output via Spout for seamless integration with streaming software such as OBS.
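For orientation, here is a minimal sketch of that loop in C#; every type and member name below (`ISpeechRecognizer`, `AvatarPipeline`, and so on) is an illustrative assumption mirroring the stages above, not the engine's actual API.

```csharp
using System.Threading.Tasks;

// Hypothetical interfaces mirroring the pipeline stages described above;
// these are illustrative assumptions, not the engine's real API.
public interface ISpeechRecognizer { Task<string> TranscribeAsync(byte[] pcmAudio); }      // Whisper ASR
public interface IResponder        { Task<string> RespondAsync(string transcript); }       // LLM + personality
public interface ISpeechSynth      { Task<byte[]> SynthesizeAsync(string taggedText); }    // Kokoro TTS (+ optional RVC)
public interface IAvatar           { Task AnimateAsync(string taggedText, byte[] audio); } // Live2D -> Spout

public sealed class AvatarPipeline
{
    private readonly ISpeechRecognizer _asr;
    private readonly IResponder _llm;
    private readonly ISpeechSynth _tts;
    private readonly IAvatar _avatar;

    public AvatarPipeline(ISpeechRecognizer asr, IResponder llm, ISpeechSynth tts, IAvatar avatar)
    {
        _asr = asr; _llm = llm; _tts = tts; _avatar = avatar;
    }

    // One conversational turn: microphone audio in, animated speech out.
    public async Task RunTurnAsync(byte[] micAudio)
    {
        string transcript = await _asr.TranscribeAsync(micAudio);  // speech -> text
        string reply      = await _llm.RespondAsync(transcript);   // text -> tagged reply, e.g. "[EMOTION:happy] Hi!"
        byte[] speech     = await _tts.SynthesizeAsync(reply);     // tagged reply -> audio
        await _avatar.AnimateAsync(reply, speech);                 // drive Live2D; frames go out via Spout
    }
}
```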
Quick Start & Requirements
- Install: Follow the detailed Installation and Setup Guide.
- Prerequisites: An NVIDIA GPU with CUDA support is mandatory for ASR, TTS, and RVC; the .NET Runtime and espeak-ng are also required.
- Models: Download Whisper ASR models and configure LLM access (local or cloud); set up a Live2D model (the included "Aria" or a custom one); optionally, add RVC ONNX models.
- Setup: Configuration involves `appsettings.json` and potentially prompt engineering for LLMs (a loading sketch follows this list).
- Links: Installation and Setup Guide, Live2D Integration & Rigging Guide, Discord.
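For the setup step above, a minimal sketch of reading `appsettings.json` with the standard .NET `ConfigurationBuilder`; the configuration keys shown (`Llm:Endpoint`, `Tts:Voice`) are invented for illustration, and the project's real schema is documented in the Installation and Setup Guide.

```csharp
using System;
using Microsoft.Extensions.Configuration; // Microsoft.Extensions.Configuration.Json package

// Minimal sketch: read appsettings.json the standard .NET way.
// The keys below are hypothetical; see the Installation and Setup Guide for the real schema.
var config = new ConfigurationBuilder()
    .SetBasePath(AppContext.BaseDirectory)
    .AddJsonFile("appsettings.json", optional: false, reloadOnChange: true)
    .Build();

string? llmEndpoint = config["Llm:Endpoint"]; // e.g. a local or cloud LLM endpoint
string? ttsVoice    = config["Tts:Voice"];    // e.g. a Kokoro voice name
```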
Highlighted Details
- Live2D integration supports emotion tags (`[EMOTION:name]`) and VBridger-standard lip-sync (a tag-parsing sketch follows this list).
- Advanced TTS pipeline with ONNX synthesis and espeak-ng fallback (also sketched after this list).
- Optional Real-time Voice Cloning (RVC) via ONNX models.
- Experimental features include screen awareness and an interactive roulette wheel.
- Spout output for direct streaming to OBS without window capture.
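As referenced in the first bullet, a minimal sketch of pulling `[EMOTION:name]` tags out of an LLM reply; the exact tag grammar beyond what the README shows is an assumption, as is the helper name.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: split an LLM reply into speakable text and emotion cues.
// Assumes tags look exactly like [EMOTION:name]; anything beyond that is guesswork.
static (string Speech, List<string> Emotions) ParseEmotionTags(string reply)
{
    var emotions = new List<string>();
    string speech = Regex.Replace(reply, @"\[EMOTION:(\w+)\]", m =>
    {
        emotions.Add(m.Groups[1].Value); // e.g. "happy" from "[EMOTION:happy]"
        return string.Empty;             // strip the tag from the spoken text
    }).Trim();
    return (speech, emotions);
}

// ParseEmotionTags("[EMOTION:happy] Hello, chat!") -> ("Hello, chat!", ["happy"])
```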
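The espeak-ng fallback in the TTS bullet suggests a simple try/fall-back shape, sketched below; `SynthesizeWithOnnx` is a hypothetical stand-in for the real Kokoro ONNX call, and only the `espeak-ng -w` CLI usage is a known quantity.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

// Illustrative fallback shape only, not the engine's actual code.
static async Task SpeakAsync(string text)
{
    try
    {
        await SynthesizeWithOnnx(text); // primary path: Kokoro ONNX synthesis
    }
    catch (Exception)
    {
        // Fallback path: have the espeak-ng CLI render a WAV instead.
        var psi = new ProcessStartInfo("espeak-ng", $"-w fallback.wav \"{text}\"")
        {
            UseShellExecute = false,
        };
        using var proc = Process.Start(psi)!;
        await proc.WaitForExitAsync();
    }
}

// Hypothetical stand-in so the sketch compiles; the real synthesizer lives elsewhere.
static Task SynthesizeWithOnnx(string text) =>
    throw new NotImplementedException();
```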
Maintenance & Community
- Active community via Discord for support, demos, and discussion.
- GitHub Issues for bug reports and feature requests.
- Contact available via Twitter/X.
Licensing & Compatibility
- The README does not explicitly state a license, so suitability for commercial use or closed-source linking is undetermined.
Limitations & Caveats
- An NVIDIA GPU with CUDA support is required for core AI functionality, limiting platform compatibility.
- Experimental features like screen awareness and the roulette wheel may have stability or performance issues.
- LLM interaction works best with a specially fine-tuned model; standard models are supported but call for careful prompt engineering.