ZerolanLiveRobot by AkagawaTsurunaki

AI VTuber robot for live streaming and interactive experiences

Created 2 years ago

564 stars

Top 57.0% on SourcePulse

Project Summary

Zerolan Live Robot is an open-source AI VTuber framework for live streaming, chat interaction, and gaming, notably Minecraft. It targets developers and hobbyists who want a customizable virtual companion, and it combines a large language model (LLM), automatic speech recognition (ASR), text-to-speech (TTS), optical character recognition (OCR), and computer vision (CV), all runnable on consumer-grade hardware. The project aims to make interactive AI personas for content creation and entertainment accessible to a wider audience.

How It Works

The project has a modular architecture: ZerolanLiveRobot is the control framework, ZerolanCore provides the AI model services (LLM, ASR, TTS), ZerolanData defines the data formats, ZerolanPlayground handles AR/Unity visualization, and KonekoMinecraftBot handles game integration. Components communicate through an event-driven design built on TypedEventEmitter, so each one can react to inputs such as voice, danmaku (live-chat comments), screen content, and game state. AI inference runs through HTTP-based pipelines that connect to ZerolanCore services or third-party APIs. This makes it possible to swap in different AI models while keeping responses real-time.
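The event-driven design described above can be illustrated with a minimal sketch. This is not the project's TypedEventEmitter: the class SimpleTypedEmitter and the event types DanmakuEvent and SpeechEvent are hypothetical names invented for this example. It only shows the general idea of dispatching events to listeners by event type.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, DefaultDict, List


@dataclass
class DanmakuEvent:
    """Hypothetical event: a viewer comment arrived from the live chat."""
    user: str
    text: str


@dataclass
class SpeechEvent:
    """Hypothetical event: speech recognition produced a transcript."""
    transcript: str


class SimpleTypedEmitter:
    """Illustrative emitter (an assumption, not the project's class).

    Each event is delivered to the listeners registered for its exact type.
    """

    def __init__(self) -> None:
        self._listeners: DefaultDict[type, List[Callable]] = defaultdict(list)

    def on(self, event_type: type) -> Callable[[Callable], Callable]:
        """Decorator that registers a listener for one event type."""
        def register(listener: Callable) -> Callable:
            self._listeners[event_type].append(listener)
            return listener
        return register

    def emit(self, event: object) -> int:
        """Call every listener for type(event); return how many were called."""
        listeners = self._listeners.get(type(event), [])
        for listener in listeners:
            listener(event)
        return len(listeners)


emitter = SimpleTypedEmitter()
replies: List[str] = []


@emitter.on(DanmakuEvent)
def reply_to_danmaku(event: DanmakuEvent) -> None:
    # A real listener might forward the text to an LLM pipeline instead.
    replies.append(f"@{event.user}: you said {event.text!r}")


handled = emitter.emit(DanmakuEvent(user="viewer1", text="hello"))
unhandled = emitter.emit(SpeechEvent(transcript="nobody listens to this"))
```

Keying listeners by event class keeps each component decoupled: a new input source only needs to define an event type and emit it, and existing listeners are unaffected.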

Quick Start & Requirements

Installation:
1. Create and activate a Conda environment: conda create --name ZerolanLiveRobot python=3.11 followed by conda activate ZerolanLiveRobot.
2. Install dependencies: pip install -r requirements.txt.
3. Run the main program: python main.py.
Prerequisites: Python 3.11, Conda, a consumer-grade GPU, and a configured LLM service (either self-hosted via ZerolanCore or a third-party API). An OBS WebSocket server is required for streaming integration.
Configuration: Use the WebUI launched via python webui.py (accessible at http://127.0.0.1:7860) or manually edit the ./resources/config.yaml file.
Links: Developer updates on Bilibili: 赤川鹤鸣_Channel. Specific deployment guides for ZerolanCore and Zerolan Protocol are referenced but lack direct URLs in the README.

Highlighted Details

Multi-modal Interaction: Integrates LLM for dialogue, ASR for voice commands, TTS for emotional speech, OCR for screen text analysis, and CV for image recognition.
Game Integration: Features a voice-controllable AI agent for Minecraft via KonekoMinecraftBot.
Avatar & Streaming: Supports Live2D and 3D avatars with lip-sync, blinking, and breathing animation. Includes OBS integration for typewriter-style subtitle display.
Memory System: Implements both short-term memory (bounded by record count) and long-term memory (backed by a vector database) for context retention.
Extensibility: Event-driven design and HTTP pipelines allow for custom AI model integration and service development.
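The two memory tiers listed above can be sketched roughly as follows. This is an illustration under assumptions, not the project's implementation: ShortTermMemory and LongTermMemory are hypothetical classes, the embeddings are hand-written toy vectors, and a plain Python list with cosine similarity stands in for a real vector database.

```python
import math
from collections import deque
from typing import Deque, List, Sequence, Tuple


class ShortTermMemory:
    """Hypothetical short-term memory: keeps only the last N records."""

    def __init__(self, max_records: int) -> None:
        # deque(maxlen=...) silently drops the oldest record when full.
        self._records: Deque[str] = deque(maxlen=max_records)

    def add(self, message: str) -> None:
        self._records.append(message)

    def recent(self) -> List[str]:
        return list(self._records)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class LongTermMemory:
    """Hypothetical in-memory stand-in for a vector database."""

    def __init__(self) -> None:
        self._entries: List[Tuple[List[float], str]] = []

    def add(self, embedding: Sequence[float], text: str) -> None:
        self._entries.append((list(embedding), text))

    def search(self, query: Sequence[float], top_k: int = 1) -> List[str]:
        """Return the top_k stored texts most similar to the query vector."""
        ranked = sorted(
            self._entries,
            key=lambda entry: cosine_similarity(query, entry[0]),
            reverse=True,
        )
        return [text for _, text in ranked[:top_k]]


short_term = ShortTermMemory(max_records=2)
for line in ["first", "second", "third"]:
    short_term.add(line)

long_term = LongTermMemory()
long_term.add([1.0, 0.0], "likes Minecraft")      # toy embedding
long_term.add([0.0, 1.0], "streams on weekends")  # toy embedding
best_match = long_term.search([0.9, 0.1], top_k=1)
```

In practice the embeddings would come from an embedding model and the search from a dedicated vector store; the sketch only shows why the two tiers behave differently (recency window versus similarity lookup).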

Maintenance & Community

The project is under continuous development (version 2.1). Updates and progress are shared on the developer's Bilibili channel. Users can seek help by creating issues on the repository. No dedicated community channels (like Discord or Slack) are listed.

Licensing & Compatibility

The project is released under the MIT License, accompanied by a clause prohibiting illegal use of the software. The MIT License generally permits commercial use and integration into closed-source projects, subject to the license terms.

Limitations & Caveats

Integrations for YouTube and Twitch are experimental. The browser control service is basic, limited to Firefox, and requires manual extension for advanced functionality. The qqbot service is still under development. Version 2.x is incompatible with 1.x, necessitating environment and configuration resets. Developers must avoid blocking the main thread in event listeners to maintain responsiveness.
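The advice about not blocking the main thread in event listeners can be illustrated with a small sketch. It assumes a generic callback-style listener and uses the standard library's ThreadPoolExecutor; the function names slow_inference and on_message are hypothetical, and the project may use a different concurrency mechanism.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor

# Worker pool for slow tasks (the pool size here is an arbitrary choice).
_executor = ThreadPoolExecutor(max_workers=2)


def slow_inference(text: str) -> str:
    """Stand-in for a slow call such as an HTTP request to a model service."""
    time.sleep(0.5)
    return text.upper()


def on_message(text: str) -> Future:
    """Hypothetical listener: hands the slow work to a worker thread.

    It returns immediately, so the thread that dispatches events stays free.
    """
    return _executor.submit(slow_inference, text)


start = time.monotonic()
future = on_message("hello")
elapsed = time.monotonic() - start  # time spent inside the listener itself

result = future.result(timeout=5)  # wait here only to show the outcome
_executor.shutdown()
```

Calling slow_inference directly inside the listener would hold up every other event for the duration of the call; submitting it to a pool keeps the listener's own running time negligible.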

Health Check

Last Commit: 1 month ago

Responsiveness: Inactive

Star History: 82 stars in the last 30 days