Talking avatar project using LLMs for realistic digital human interaction
This project provides a framework for creating an interactive digital human that can converse and respond with realistic facial animations and lip-syncing. It targets developers and researchers looking to build AI-powered virtual agents, offering a modular approach leveraging leading AI services for natural language understanding, speech synthesis, and animation.
How It Works
The system orchestrates a pipeline of AI services to deliver a dynamic conversational experience. User input, whether text or audio, is processed by OpenAI's GPT for response generation and Whisper for speech-to-text. The generated text is then synthesized into speech using Eleven Labs, and crucially, Rhubarb Lip Sync is employed to generate viseme data from the audio. This viseme data drives the digital human's facial animations, synchronizing lip movements with the spoken words for enhanced realism.
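As a concrete illustration, the text-input path of this pipeline can be sketched in a few Node/TypeScript functions. The sketch below is a minimal example, not the project's actual code: it assumes the official openai SDK, Eleven Labs' REST text-to-speech endpoint, the Rhubarb executable under ./apps/backend/bin (run from the backend directory), and ffmpeg on the PATH; function names and file paths are illustrative, and the Whisper speech-to-text step for audio input is omitted.

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";
import { readFile, writeFile } from "node:fs/promises";
import OpenAI from "openai";

const run = promisify(execFile);
const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// 1. Ask GPT for the avatar's reply to the user's message.
async function generateReply(userText: string): Promise<string> {
  const completion = await openai.chat.completions.create({
    model: process.env.OPENAI_MODEL ?? "gpt-3.5-turbo",
    messages: [{ role: "user", content: userText }],
  });
  return completion.choices[0].message.content ?? "";
}

// 2. Synthesize the reply with Eleven Labs' text-to-speech endpoint (returns MP3).
async function synthesizeSpeech(text: string, mp3Path: string): Promise<void> {
  const voiceId = process.env.ELVEN_LABS_VOICE_ID;
  const res = await fetch(`https://api.elevenlabs.io/v1/text-to-speech/${voiceId}`, {
    method: "POST",
    headers: {
      "xi-api-key": process.env.ELEVEN_LABS_API_KEY ?? "",
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ text, model_id: process.env.ELEVEN_LABS_MODEL_ID }),
  });
  await writeFile(mp3Path, Buffer.from(await res.arrayBuffer()));
}

// 3. Convert to WAV (Rhubarb reads WAV/OGG) and extract viseme timings as JSON.
async function extractVisemes(mp3Path: string, wavPath: string, jsonPath: string) {
  await run("ffmpeg", ["-y", "-i", mp3Path, wavPath]);
  await run("./bin/rhubarb", ["-f", "json", "-o", jsonPath, wavPath]);
  return JSON.parse(await readFile(jsonPath, "utf8")); // mouth cues for the frontend
}

// End-to-end: text in; reply text, audio file, and viseme cues out.
async function chat(userText: string) {
  const reply = await generateReply(userText);
  await synthesizeSpeech(reply, "message.mp3");
  const visemes = await extractVisemes("message.mp3", "message.wav", "message.json");
  return { reply, visemes, audio: "message.mp3" };
}
```

The viseme JSON returned at the end is the data the frontend consumes to drive the avatar's mouth shapes in sync with the generated audio.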
Quick Start & Requirements
Requires yarn, the Rhubarb Lip Sync binary in ./apps/backend/bin, and ffmpeg installed. Create a .env file in /apps/backend/ with OPENAI_API_KEY, OPENAI_MODEL, ELEVEN_LABS_API_KEY, ELVEN_LABS_VOICE_ID, and ELEVEN_LABS_MODEL_ID. Run yarn dev, then access the demo at http://localhost:5173/.
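Because all five keys are expected in the .env file, a fail-fast check at backend startup can save debugging time. The following is a minimal sketch, assuming the dotenv package is used to load /apps/backend/.env; the project's actual startup code may differ.

```typescript
import "dotenv/config"; // loads the .env file from the current working directory

const required = [
  "OPENAI_API_KEY",
  "OPENAI_MODEL",
  "ELEVEN_LABS_API_KEY",
  "ELVEN_LABS_VOICE_ID",
  "ELEVEN_LABS_MODEL_ID",
] as const;

// Abort startup with a clear message if any key is missing or empty.
const missing = required.filter((name) => !process.env[name]);
if (missing.length > 0) {
  throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
}
```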
Highlighted Details
Maintenance & Community
Last updated 11 months ago; the project is currently inactive.
Licensing & Compatibility
Limitations & Caveats