ChatdollKit by uezo

3D virtual assistant SDK for voice-enabled chatbots using 3D models

Created 5 years ago

1,080 stars

Top 35.1% on SourcePulse

1 Expert Loves This Project

Chip Huyen

Author of "AI Engineering", "Designing Machine Learning Systems"

Project Summary

ChatdollKit is a Unity-based SDK for creating voice-enabled 3D chatbot avatars. It targets developers and creators looking to integrate generative AI, 3D model animation, and speech technologies into interactive virtual agents for platforms like PC, mobile, VR, AR, and WebGL. The SDK aims to simplify the complex process of building expressive and responsive AI characters.

How It Works

ChatdollKit orchestrates three services: Speech-to-Text (STT) for user input, a Large Language Model (LLM) for dialogue, and Text-to-Speech (TTS) for output. It synchronizes the spoken output with 3D model animations and facial expressions, so AI-driven characters respond dynamically to user input. Key features include LLM integration (ChatGPT, Gemini, Claude), multiple TTS/STT providers, and 3D model control covering lip-sync, facial expressions, and animations, all managed within the Unity engine.
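
The flow described above can be sketched in a few lines. This is a conceptual illustration only, written in Python for brevity; it is not ChatdollKit's API, which is a Unity SDK. Every name here (AvatarFrame, TurnPipeline, fake_stt, fake_llm, fake_tts) is hypothetical, and the providers are stubs standing in for real STT, LLM, and TTS services.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class AvatarFrame:
    """One unit of reply: what to say and how the model should look while saying it."""
    text: str
    face: str = "neutral"
    animation: str = "idle"


@dataclass
class TurnPipeline:
    """Hypothetical STT -> LLM -> TTS loop; each stage is a plain callable."""
    transcribe: Callable[[bytes], str]             # STT: audio -> text
    generate: Callable[[str], List[AvatarFrame]]   # LLM: text -> reply frames
    synthesize: Callable[[str], bytes]             # TTS: text -> audio
    log: List[str] = field(default_factory=list)   # stands in for the 3D model controller

    def run_turn(self, audio: bytes) -> List[bytes]:
        user_text = self.transcribe(audio)
        clips = []
        for frame in self.generate(user_text):
            # Record the face and animation cue alongside each clip so that
            # expression changes stay paired with the speech they belong to.
            self.log.append(f"face={frame.face} anim={frame.animation}")
            clips.append(self.synthesize(frame.text))
        return clips


# Stub providers (assumptions for illustration, not real services).
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(prompt: str) -> List[AvatarFrame]:
    return [AvatarFrame(text=f"You said: {prompt}", face="joy", animation="wave")]


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = TurnPipeline(fake_stt, fake_llm, fake_tts)
```

Keeping each stage behind a plain callable mirrors the provider-swapping idea in the summary: a different STT, LLM, or TTS backend can be substituted without touching the loop that pairs speech with expression cues.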

Quick Start & Requirements

Installation: Import ChatdollKit.unitypackage into a Unity project.
Prerequisites: Unity (non-SRP project template), Burst, UniTask (v2.5.4+), uLipSync (v3.1.0+), UniVRM (v0.127.2+), JSON.NET, and optionally Azure Speech SDK.
Setup: Import the dependencies, add the AIAvatarVRM prefab to the scene, configure the LLM and speech services with API keys, and set up animations via ModelController.
Demo: A WebGL demo is available. A YouTube video walks through setting up the demo scene with ChatGPT.
Docs: Comprehensive setup and feature documentation is provided within the README.

Highlighted Details

Supports multiple LLMs (ChatGPT, Gemini, Claude, Dify) with function calling and multimodal capabilities.
Enables dynamic 3D model expression, including lip-sync, facial expressions, and animations synchronized with speech.
Offers extensive platform compatibility (Windows, Mac, Linux, iOS, Android, WebGL, VR, AR).
Includes dynamic language switching, long-term memory integration, and wake word detection.
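
As a rough illustration of wake word detection, the sketch below gates a transcript on a leading wake word and passes only the remaining request onward. It is an assumption-laden example in Python, not the SDK's implementation: the function name strip_wake_word and the prefix-matching rule are hypothetical, and a real detector may match on audio or use fuzzier text matching.

```python
from typing import List, Optional


def strip_wake_word(transcript: str, wake_words: List[str]) -> Optional[str]:
    """Return the request that follows a leading wake word, or None if no wake word leads.

    Hypothetical helper: matches case-insensitively on the start of the transcript.
    """
    cleaned = transcript.strip()
    lowered = cleaned.lower()
    for word in wake_words:
        candidate = word.lower()
        if lowered.startswith(candidate):
            # Drop the wake word, then any punctuation or spaces that trail it.
            return cleaned[len(candidate):].lstrip(" ,.!?")
    return None
```

A caller would ignore any utterance for which this returns None and forward the returned text to the dialogue stage otherwise.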

Maintenance & Community

The project is actively maintained by uezo. The README does not list community links; the project appears oriented toward developer integration.

Licensing & Compatibility

The project's license is not explicitly stated in the provided README text. Compatibility for commercial use would depend on the specific license terms.

Limitations & Caveats

Unity SRP projects are not supported due to UniVRM limitations.
WebGL builds have specific requirements, including using UniTask for async/await and ChatdollMicrophone for microphone input, and do not support compressed audio formats.
Some features, such as microphone control in WebGL or specific STT/TTS integrations, may require platform-specific adjustments.

Health Check

Last Commit

1 month ago

Responsiveness

1 day

Star History

14 stars in the last 30 days