Local AI VTuber recreation of Neuro-Sama
Top 42.7% on sourcepulse
This project recreates the AI VTuber Neuro-Sama, enabling users to run a similar system on consumer hardware using local LLMs. It targets VTubers, streamers, and AI enthusiasts looking for an interactive, voice-driven AI companion with real-time speech processing and VTuber integration. The primary benefit is a highly customizable and locally-hosted AI personality.
How It Works
The system integrates real-time Speech-to-Text (STT) and Text-to-Speech (TTS) using the KoljaB/RealtimeSTT and KoljaB/RealtimeTTS libraries, respectively. It leverages an OpenAI-compatible API endpoint for LLM interaction, allowing flexibility in model choice (e.g., Llama 3 8B Instruct via text-generation-webui). Multimodality is supported via custom servers like Neuro-LLM-Server, enabling visual input processing. State and data are managed through a shared signals object, with modular components running in separate threads for extensibility.
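To make the architecture concrete, below is a minimal sketch (not the project's actual code) of the pattern described above: a shared signals object handed to modular components that each run in their own thread, with the LLM reached through an OpenAI-compatible chat completions endpoint. The class name, field names, endpoint URL, and model name are illustrative assumptions.

```python
import threading
import queue

import requests


class Signals:
    """Shared state passed to every module (illustrative)."""
    def __init__(self):
        self.history = []                 # rolling conversation history
        self.new_message = queue.Queue()  # transcribed user speech waiting for the LLM
        self.tts_queue = queue.Queue()    # LLM replies waiting to be spoken by the TTS module
        self.terminate = False            # set to True to shut all modules down


def llm_worker(signals: Signals, api_base: str = "http://127.0.0.1:5000/v1"):
    """Consume transcribed speech and request a reply from an OpenAI-compatible endpoint."""
    while not signals.terminate:
        try:
            user_text = signals.new_message.get(timeout=0.5)
        except queue.Empty:
            continue
        signals.history.append({"role": "user", "content": user_text})
        resp = requests.post(
            f"{api_base}/chat/completions",
            json={"model": "llama-3-8b-instruct", "messages": signals.history},
            timeout=60,
        )
        reply = resp.json()["choices"][0]["message"]["content"]
        signals.history.append({"role": "assistant", "content": reply})
        signals.tts_queue.put(reply)  # a TTS thread would pick this up and speak it


signals = Signals()
threading.Thread(target=llm_worker, args=(signals,), daemon=True).start()
```

The same signals object would also be shared with the STT, TTS, Twitch chat, and VTube Studio modules, which is what makes the system extensible: a new module only needs a reference to the shared state and its own thread.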
Quick Start & Requirements
Requires text-generation-webui, an LLM, and VTube Studio.
Install PyTorch with CUDA 11.8 support:
pip install torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 --index-url https://download.pytorch.org/whl/cu118
Also needed: DeepSpeed (via the AllTalkTTS wheels), Twitch developer account credentials, and a voice reference WAV file.
Configuration is done through the .env and constants.py files, including audio device selection and API keys.
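For orientation, here is a hypothetical sketch of the kind of settings those two files carry; the actual variable names in the project's .env and constants.py may differ.

```python
# .env (secrets, kept out of version control) -- illustrative keys:
#   TWITCH_APP_ID=your_app_id
#   TWITCH_SECRET=your_app_secret

# constants.py (non-secret settings) -- illustrative values:
INPUT_DEVICE_INDEX = 1                        # microphone, selected by audio device index
OUTPUT_DEVICE_INDEX = 3                       # output/virtual cable feeding VTube Studio lip sync
LLM_ENDPOINT = "http://127.0.0.1:5000/v1"     # any OpenAI-compatible API base URL
VOICE_REFERENCE_WAV = "voices/reference.wav"  # sample WAV used as the TTS voice reference
```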
Highlighted Details
Works with text-generation-webui or any OpenAI-compatible endpoint.
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The project is experimental and created for educational/recreational purposes; there is no guarantee of non-vile responses from the LLM. Content filtering is minimal (the blocklist currently contains only the word "turkey"), so Twitch bans are possible if unsafe content is streamed. Discord integration was attempted but deemed unusable due to platform limitations.
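For context on how thin that safety layer is, the sketch below shows keyword-blocklist filtering of the style described above; the project's real filter and word list are its own.

```python
# Minimal keyword-blocklist filter (illustrative, not the project's implementation).
BLOCKLIST = {"turkey"}  # per the caveats above, the list currently holds a single word


def passes_filter(text: str) -> bool:
    """Return True if the LLM output contains no blocklisted words."""
    lowered = text.lower()
    return not any(word in lowered for word in BLOCKLIST)
```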