Talking avatar project using LLMs for realistic digital human interaction
This project provides a framework for creating an interactive digital human that can converse and respond with realistic facial animations and lip-syncing. It targets developers and researchers looking to build AI-powered virtual agents, offering a modular approach leveraging leading AI services for natural language understanding, speech synthesis, and animation.
How It Works
The system orchestrates a pipeline of AI services to deliver a dynamic conversational experience. User input, whether text or audio, is processed by OpenAI's GPT for response generation and Whisper for speech-to-text. The generated text is then synthesized into speech using Eleven Labs, and crucially, Rhubarb Lip Sync is employed to generate viseme data from the audio. This viseme data drives the digital human's facial animations, synchronizing lip movements with the spoken words for enhanced realism.
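As a concrete illustration, the text-input path of this pipeline can be sketched in a few Node/TypeScript functions. The sketch below is a minimal example, not the project's actual code: it assumes the official openai SDK, Eleven Labs' REST text-to-speech endpoint, the Rhubarb executable under ./apps/backend/bin (run from the backend directory), and ffmpeg on the PATH; function names and file paths are illustrative, and the Whisper speech-to-text step for audio input is omitted.

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";
import { readFile, writeFile } from "node:fs/promises";
import OpenAI from "openai";

const run = promisify(execFile);
const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// 1. Ask GPT for the avatar's reply to the user's message.
async function generateReply(userText: string): Promise<string> {
  const completion = await openai.chat.completions.create({
    model: process.env.OPENAI_MODEL ?? "gpt-3.5-turbo",
    messages: [{ role: "user", content: userText }],
  });
  return completion.choices[0].message.content ?? "";
}

// 2. Synthesize the reply with Eleven Labs' text-to-speech endpoint (returns MP3).
async function synthesizeSpeech(text: string, mp3Path: string): Promise<void> {
  const voiceId = process.env.ELVEN_LABS_VOICE_ID;
  const res = await fetch(`https://api.elevenlabs.io/v1/text-to-speech/${voiceId}`, {
    method: "POST",
    headers: {
      "xi-api-key": process.env.ELEVEN_LABS_API_KEY ?? "",
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ text, model_id: process.env.ELEVEN_LABS_MODEL_ID }),
  });
  await writeFile(mp3Path, Buffer.from(await res.arrayBuffer()));
}

// 3. Convert to WAV (Rhubarb reads WAV/OGG) and extract viseme timings as JSON.
async function extractVisemes(mp3Path: string, wavPath: string, jsonPath: string) {
  await run("ffmpeg", ["-y", "-i", mp3Path, wavPath]);
  await run("./bin/rhubarb", ["-f", "json", "-o", jsonPath, wavPath]);
  return JSON.parse(await readFile(jsonPath, "utf8")); // mouth cues for the frontend
}

// End-to-end: text in; reply text, audio file, and viseme cues out.
async function chat(userText: string) {
  const reply = await generateReply(userText);
  await synthesizeSpeech(reply, "message.mp3");
  const visemes = await extractVisemes("message.mp3", "message.wav", "message.json");
  return { reply, visemes, audio: "message.mp3" };
}
```

The viseme JSON returned at the end is the data the frontend consumes to drive the avatar's mouth shapes in sync with the generated audio.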
Quick Start & Requirements
Requires yarn, the Rhubarb Lip Sync binary in ./apps/backend/bin, and ffmpeg installed. Create a .env file in /apps/backend/ with OPENAI_API_KEY, OPENAI_MODEL, ELEVEN_LABS_API_KEY, ELVEN_LABS_VOICE_ID, and ELEVEN_LABS_MODEL_ID. Run yarn dev, then access the demo at http://localhost:5173/.
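Because all five keys are expected in the .env file, a fail-fast check at backend startup can save debugging time. The following is a minimal sketch, assuming the dotenv package is used to load /apps/backend/.env; the project's actual startup code may differ.

```typescript
import "dotenv/config"; // loads the .env file from the current working directory

const required = [
  "OPENAI_API_KEY",
  "OPENAI_MODEL",
  "ELEVEN_LABS_API_KEY",
  "ELVEN_LABS_VOICE_ID",
  "ELEVEN_LABS_MODEL_ID",
] as const;

// Abort startup with a clear message if any key is missing or empty.
const missing = required.filter((name) => !process.env[name]);
if (missing.length > 0) {
  throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
}
```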
Highlighted Details
Maintenance & Community
Last updated 11 months ago; the project is currently inactive.
Licensing & Compatibility
Limitations & Caveats