talking-avatar-with-ai by asanchezyali

Talking avatar project using LLMs for realistic digital human interaction

created 1 year ago
358 stars

Top 79.2% on sourcepulse

Project Summary

This project provides a framework for building an interactive digital human that converses with users and responds with realistic facial animation and lip-syncing. It targets developers and researchers building AI-powered virtual agents. The approach is modular: it chains third-party AI services for language understanding, speech recognition, and speech synthesis with a local lip-sync tool that produces the animation data.

How It Works

The system orchestrates a pipeline of AI services to produce each conversational turn. Audio input is first transcribed to text with OpenAI's Whisper. The resulting text, or the user's typed text, is then sent to OpenAI's GPT to generate a response. Eleven Labs synthesizes that response into speech, and Rhubarb Lip Sync analyzes the audio to produce viseme (mouth-shape) data. The viseme data drives the digital human's facial animation and keeps its lip movements synchronized with the spoken words.
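
A minimal TypeScript sketch of how playback code might consume the viseme data. It assumes Rhubarb's JSON export (rhubarb -f json), which lists mouthCues entries, each with start and end times in seconds and a mouth-shape letter (A through H, plus X for rest). The SHAPE_TO_MORPH table and the activeMorphTarget helper are hypothetical. Real morph-target names depend on the avatar model, and the Oculus-style names below are for illustration only.

```typescript
// Subset of Rhubarb Lip Sync's JSON export (rhubarb -f json) used here.
interface MouthCue {
  start: number; // seconds from the start of the audio
  end: number; // seconds; treated as exclusive in the lookup below
  value: string; // mouth shape "A" through "H", or "X" for rest
}

interface LipSyncData {
  mouthCues: MouthCue[];
}

// Hypothetical mapping from Rhubarb mouth shapes to avatar morph-target names.
// Real names depend on the 3D model; these Oculus-style names are illustrative.
const SHAPE_TO_MORPH: Record<string, string> = {
  A: "viseme_PP", // closed lips (P, B, M)
  B: "viseme_kk", // slightly open, teeth together
  C: "viseme_E", // open mouth
  D: "viseme_aa", // wide open
  E: "viseme_O", // slightly rounded
  F: "viseme_U", // puckered
  G: "viseme_FF", // upper teeth on lower lip (F, V)
  H: "viseme_nn", // tongue raised (L)
  X: "viseme_sil", // idle / rest
};

// Return the morph target to activate at playback time t (seconds),
// or null when t falls outside every cue.
function activeMorphTarget(data: LipSyncData, t: number): string | null {
  const cue = data.mouthCues.find((c) => t >= c.start && t < c.end);
  if (cue === undefined) return null;
  return SHAPE_TO_MORPH[cue.value] ?? null;
}
```

On each animation frame, the caller passes the audio element's current playback time and raises the returned morph target's weight. A linear scan is enough for clips with a few hundred cues. Because Rhubarb emits cues in time order, a binary search or a cursor that advances with playback would avoid rescanning on every frame.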

Quick Start & Requirements

  • Install: Clone the repository, change into its directory, and run yarn to install the dependencies.
  • Prerequisites: Active OpenAI and Eleven Labs subscriptions (a paid Eleven Labs plan is recommended), the Rhubarb Lip-Sync executable placed in /apps/backend/bin, and ffmpeg installed.
  • Configuration: Create a .env file in /apps/backend/ with OPENAI_API_KEY, OPENAI_MODEL, ELEVEN_LABS_API_KEY, ELVEN_LABS_VOICE_ID, and ELEVEN_LABS_MODEL_ID.
  • Run: Execute yarn dev. Access the demo at http://localhost:5173/.
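
As a sketch, the backend could validate these variables at startup so that a missing key fails fast with a single clear message. The loadBackendConfig helper is hypothetical and not part of the project. The variable names are copied exactly from the configuration list above, including the ELVEN_LABS_VOICE_ID spelling.

```typescript
// Variables listed for apps/backend/.env, spelled exactly as the README gives them.
const REQUIRED_ENV_VARS = [
  "OPENAI_API_KEY",
  "OPENAI_MODEL",
  "ELEVEN_LABS_API_KEY",
  "ELVEN_LABS_VOICE_ID",
  "ELEVEN_LABS_MODEL_ID",
] as const;

type RequiredEnvVar = (typeof REQUIRED_ENV_VARS)[number];
type BackendConfig = Record<RequiredEnvVar, string>;

// Hypothetical helper: read the required variables from an environment object
// and report every missing or blank one at once instead of stopping at the first.
function loadBackendConfig(env: Record<string, string | undefined>): BackendConfig {
  const missing = REQUIRED_ENV_VARS.filter((name) => !env[name]?.trim());
  if (missing.length > 0) {
    throw new Error(`Missing required variables in .env: ${missing.join(", ")}`);
  }
  const config = {} as BackendConfig;
  for (const name of REQUIRED_ENV_VARS) {
    // Safe: the filter above guarantees every name has a non-blank value.
    config[name] = env[name]!.trim();
  }
  return config;
}
```

In a Node backend, this would be called as loadBackendConfig(process.env) after the .env file has been loaded, for example with the dotenv package's config() function.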

Highlighted Details

  • Integrates OpenAI GPT for response generation, Whisper for transcription, and Eleven Labs for voice synthesis.
  • Utilizes Rhubarb Lip Sync to generate viseme data for precise lip-syncing.
  • Supports both text and audio input for user interaction.
  • Defines the AI persona and the response structure (facial expressions, animations) with LangChain.
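
Taken together, these features make up one conversational turn. The following TypeScript sketch models that turn with each service hidden behind a hypothetical stage function, so the control flow can be read and tested without API keys. The AvatarMessage fields mirror the facial-expression and animation structure described above, but the field names are assumptions. The lipSync stage also assumes that ffmpeg (a listed prerequisite) converts the synthesized audio into a format Rhubarb accepts before Rhubarb runs. The README does not confirm either detail.

```typescript
// Assumed shape of one structured reply from the LLM (field names are illustrative).
interface AvatarMessage {
  text: string;
  facialExpression: string;
  animation: string;
}

interface MouthCue {
  start: number;
  end: number;
  value: string;
}

// Hypothetical stage signatures; in a real backend each would wrap a service call.
interface PipelineStages {
  transcribe: (audio: Uint8Array) => Promise<string>; // Whisper
  generateReply: (userText: string) => Promise<AvatarMessage[]>; // GPT via LangChain
  synthesize: (text: string) => Promise<Uint8Array>; // Eleven Labs
  lipSync: (speech: Uint8Array) => Promise<MouthCue[]>; // ffmpeg + Rhubarb (assumed)
}

type UserInput =
  | { kind: "text"; text: string }
  | { kind: "audio"; audio: Uint8Array };

interface AvatarTurn extends AvatarMessage {
  audio: Uint8Array;
  mouthCues: MouthCue[];
}

async function runTurn(input: UserInput, stages: PipelineStages): Promise<AvatarTurn[]> {
  // Audio input is transcribed first; typed text goes straight to the LLM.
  const userText =
    input.kind === "audio" ? await stages.transcribe(input.audio) : input.text;
  const messages = await stages.generateReply(userText);
  // Each message is voiced and lip-synced independently, so the messages run concurrently.
  return Promise.all(
    messages.map(async (message) => {
      const audio = await stages.synthesize(message.text);
      const mouthCues = await stages.lipSync(audio);
      return { ...message, audio, mouthCues };
    }),
  );
}
```

Injecting the stages keeps the orchestration independent of any particular SDK. In the actual backend, each stage would wrap the corresponding OpenAI, Eleven Labs, or Rhubarb call.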

Maintenance & Community

  • A Discord channel "Math & Code" is available for configuration support.

Licensing & Compatibility

  • The README does not state a license. Verify the licensing terms before any commercial use.

Limitations & Caveats

  • The README notes that the free tier of Eleven Labs is problematic because of its request limits and recommends the paid version for stable operation.
  • Requires manual download and placement of the Rhubarb Lip-Sync executable.

Health Check

  • Last commit: 11 months ago
  • Responsiveness: Inactive
  • Pull requests (last 30 days): 1
  • Issues (last 30 days): 0
  • Star history: 57 stars in the last 90 days
