opentalking by datascale-ai

Real-time digital human framework for conversational AI

Created 2 months ago

2,220 stars

Top 19.7% on SourcePulse

Project Summary

OpenTalking is an open-source framework designed to orchestrate the components required for real-time digital human dialogue products. It addresses the complexity of integrating frontend interaction, conversational state, LLM responses, TTS, interruption control, and WebRTC playback, targeting developers building or experimenting with such systems. The framework offers flexibility by supporting external API calls and local model deployments, enabling rapid setup from quick demos to high-quality, private enterprise solutions.

How It Works

OpenTalking focuses on the "production orchestration layer," connecting various AI models and services. It provides multiple integration paths: a quick-start demo-avatar/wav2lip for immediate validation without separate model services, lightweight options like wav2lip/musetalk for avatar asset adaptation, the quicktalk local adapter for streaming LLM to real-time lip-sync rendering with worker caching, and a high-quality deployment path via OmniRT and FlashTalk-compatible WebSocket services for private inference. This modular approach allows users to progressively upgrade model capabilities.

Quick Start & Requirements

Installation involves cloning the repository, setting up a Python virtual environment, and installing dependencies with pip install -e ".[dev]". Key prerequisites include Python ≥ 3.9, Node.js ≥ 18, and FFmpeg; distributed modes additionally require Redis. Configuration is managed via a .env file for API keys (e.g., Aliyun Baichuan for LLM/STT). The unified demo can be started with bash scripts/start_unified.sh in one terminal and the React frontend with npm run dev -- --host 0.0.0.0 in apps/web. Official documentation is available at docs/quickstart.md.

Highlighted Details

Real-time Dialogue: Integrates LLM responses, streaming TTS, subtitle events, state events, and WebRTC playback within a single pipeline.
QuickTalk Adapter: Enables local adaptation, avatar validation, real-time rendering queues, audio-visual synchronization, and benchmarking.
FlashTalk Compatibility: Supports local or remote FlashTalk-style inference services for high-fidelity digital human rendering.
OpenAI-Compatible LLM: Connects to various endpoints including DashScope, Ollama, vLLM, and DeepSeek.
Flexible Deployment: Offers single-process demo, distributed API/Worker modes, and Docker Compose options.

Maintenance & Community

The project maintains a QQ group (1103327938) for community discussion on real-time digital humans, model deployment, and product scenarios. The roadmap indicates ongoing development for more natural dialogue, OmniRT integration, consumer GPU optimization, and enterprise-grade features.

Licensing & Compatibility

OpenTalking is released under the Apache License 2.0, which is permissive for commercial use and integration into closed-source projects.

Limitations & Caveats

The framework currently offers basic interruption capabilities with plans for full-link cancellation upgrades. Optimization efforts are ongoing for consumer-grade GPUs (RTX 3090/4090) and enterprise NPUs (Ascend 910B). Advanced features like Agent integration, memory capabilities, and production-grade platform features are listed as "in progress" or future roadmap items.

Health Check

Last Commit

1 day ago

Responsiveness

Inactive

Pull Requests (30d)

Issues (30d)

Star History

1,066 stars in the last 30 days