CyberVerse  by dsd2077

Digital human agents with real-time video interaction

Created 1 month ago
785 stars

Top 44.1% on SourcePulse

Project Summary

CyberVerse is an open-source platform for creating interactive digital human agents that hold real-time, face-to-face video conversations. It brings an AI character to life from a single photograph, so the character appears as live, animated video rather than a static avatar. The platform is aimed at users who want an advanced AI companion and at developers who want to integrate lifelike AI agents into their applications.

How It Works

The system leverages WebRTC for real-time, peer-to-peer video streaming with low latency, incorporating embedded TURN/NAT traversal for connectivity. Digital humans are animated using state-of-the-art models like FlashHead and LiveAct, which generate facial expressions, lip-sync, and subtle movements from a single input photo without requiring 3D modeling or motion capture. CyberVerse features a modular, plugin-based architecture, allowing users to swap components such as Large Language Models (LLMs), Text-to-Speech (TTS) engines, Automatic Speech Recognition (ASR) models, and avatar backends via YAML configuration.
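The plugin-based design described above can be illustrated with a minimal sketch. Everything in it is hypothetical: the registry, the slot names ("llm", "tts"), the plugin classes, and the config keys are assumptions made for illustration and are not taken from the CyberVerse codebase. A plain dict stands in for the parsed YAML file so the example needs no third-party packages.

```python
from typing import Callable, Dict, Tuple

# Hypothetical registry mapping (slot, plugin name) to a factory.
# None of these names come from CyberVerse itself.
_REGISTRY: Dict[Tuple[str, str], Callable[..., object]] = {}


def register(slot: str, name: str):
    """Decorator that records a plugin class under a slot such as 'llm' or 'tts'."""
    def wrap(cls):
        _REGISTRY[(slot, name)] = cls
        return cls
    return wrap


@register("llm", "canned")
class CannedLLM:
    """Stand-in LLM plugin that always returns a fixed reply."""
    def __init__(self, reply: str = "Hello!"):
        self.reply = reply

    def complete(self, prompt: str) -> str:
        return self.reply


@register("tts", "echo")
class EchoTTS:
    """Stand-in TTS plugin that returns tagged bytes instead of audio."""
    def __init__(self, voice: str = "default"):
        self.voice = voice

    def synthesize(self, text: str) -> bytes:
        return f"[{self.voice}] {text}".encode("utf-8")


def build_pipeline(config: dict) -> dict:
    """Instantiate one plugin per slot from a config mapping."""
    pipeline = {}
    for slot, spec in config.items():
        options = dict(spec)           # copy so the caller's config is untouched
        name = options.pop("plugin")   # remaining keys become constructor kwargs
        try:
            factory = _REGISTRY[(slot, name)]
        except KeyError:
            raise ValueError(f"no plugin {name!r} registered for slot {slot!r}") from None
        pipeline[slot] = factory(**options)
    return pipeline


# Stands in for a parsed YAML file; the keys are assumed, not the real schema.
CONFIG = {
    "llm": {"plugin": "canned", "reply": "Nice to meet you."},
    "tts": {"plugin": "echo", "voice": "alice"},
}

pipeline = build_pipeline(CONFIG)
audio = pipeline["tts"].synthesize(pipeline["llm"].complete("Hi"))
print(audio)  # b'[alice] Nice to meet you.'
```

In this sketch, swapping a component means changing the "plugin" value in the config and leaving the calling code alone, which is the property the YAML-driven architecture is described as providing.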

Quick Start & Requirements

  • Primary Install/Run: Clone the repository, create a Conda environment, and install PyTorch with CUDA support. Then configure environment variables in .env, download the model weights with huggingface-cli, update cyberverse_config.yaml, and run make setup. Finally, start the services by running make inference, make server, and make frontend in separate terminals.
  • Prerequisites: Node.js 18+, Go 1.25+, Python 3.10+, Conda, CUDA 12.8+, FFmpeg (with libvpx), PyTorch 2.8.
  • Hardware: Significant GPU acceleration is required for real-time performance. Benchmarks indicate that high-end GPUs like RTX 5090 or RTX PRO 6000 are needed for "Pro" quality models, while an RTX 4090 can handle "Lite" models.
  • Links: Demo videos are available on YouTube: Alice, Lina, Xiaolongnü. Model weights are hosted on Hugging Face and ModelScope.
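Because the setup depends on several toolchains, a small preflight check can catch a missing or outdated tool before the install begins. The sketch below is an assumption-laden illustration, not part of CyberVerse: the helper names are hypothetical, the executable names (node, go, conda, ffmpeg) are the usual ones but may differ on a given system, and only the minimum versions are taken from the prerequisites listed above. It does not check CUDA, PyTorch, or whether FFmpeg was built with libvpx.

```python
import re
import shutil
import sys
from typing import List, Optional, Tuple

# Minimum versions quoted in the prerequisites list above.
MINIMUMS = {"node": (18, 0), "go": (1, 25), "python": (3, 10)}


def parse_version(banner: str) -> Optional[Tuple[int, int]]:
    """Extract the first 'major.minor' pair from a version banner string."""
    match = re.search(r"(\d+)\.(\d+)", banner)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def meets_minimum(tool: str, banner: str) -> bool:
    """Return True if the banner's version is at least the listed minimum."""
    found = parse_version(banner)
    return found is not None and found >= MINIMUMS[tool]


def missing_tools(names=("node", "go", "conda", "ffmpeg")) -> List[str]:
    """Return the executables from `names` that are not found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    # The running interpreter can be checked directly.
    python_banner = f"Python {sys.version_info.major}.{sys.version_info.minor}"
    print("python ok:", meets_minimum("python", python_banner))
    print("missing from PATH:", missing_tools())
```

The version banners for the other tools would come from running them with their version flags (for example the output of `node --version` or `go version`) and passing the resulting text to `meets_minimum`.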

Highlighted Details

  • Real-Time Video Calls: Offers unlimited-duration, live, low-latency video calls with the first frame appearing in approximately 1.5 seconds.
  • Single Photo Avatar Creation: State-of-the-art avatar models animate a digital human from just one photo, including facial animation and lip-sync.
  • Pluggable Architecture: Components for brain (LLM), face (avatar), voice (TTS), and ears (ASR) are swappable plugins, configurable via YAML.
  • Avatar Models: Supports FlashHead (1.3B parameters, Lite/Pro versions) and LiveAct (18B parameters).

Maintenance & Community

The provided README does not detail specific community channels (like Discord/Slack), notable contributors, sponsorships, or a public roadmap beyond planned features.

Licensing & Compatibility

  • License: GNU General Public License v3.0 (GPL v3.0).
  • Compatibility: Because the project is licensed under GPL v3.0, derivative works that are distributed must also be licensed under GPL v3.0. This copyleft requirement can restrict integration with closed-source commercial applications.

Limitations & Caveats

The platform requires substantial GPU resources for real-time operation, with specific hardware benchmarks provided. Several advanced features, such as user-side camera input for gesture recognition, knowledge import for RAG, embeddability, and agent memory/tool use, are listed as planned but not yet implemented. The setup process involves multiple dependencies and configuration steps, and optional components like SageAttention and FlashAttention may require source compilation.

Health Check

  • Last Commit: 1 day ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 4
  • Issues (30d): 11
  • Star History: 624 stars in the last 30 days
