Neuro  by kimjammer

Local AI Vtuber recreation of Neuro-Sama

created 1 year ago
857 stars

Top 42.7% on sourcepulse

GitHubView on GitHub
Project Summary

This project recreates the AI VTuber Neuro-Sama, enabling users to run a similar system on consumer hardware using local LLMs. It targets VTubers, streamers, and AI enthusiasts looking for an interactive, voice-driven AI companion with real-time speech processing and VTuber integration. The primary benefit is a highly customizable and locally-hosted AI personality.

How It Works

The system integrates real-time Speech-to-Text (STT) and Text-to-Speech (TTS) using the KoljaB/RealtimeSTT and KoljaB/RealtimeTTS libraries, respectively. It leverages an OpenAI-compatible API endpoint for LLM interaction, allowing flexibility in model choice (e.g., Llama 3 8B Instruct via text-generation-webui). Multimodality is supported via custom servers like Neuro-LLM-Server, enabling visual input processing. State and data are managed through a shared signals object, with modular components running in separate threads for extensibility.

Quick Start & Requirements

  • Install: Follow detailed instructions in the README for text-generation-webui, an LLM, and Vtube Studio.
  • Prerequisites: Nvidia GPU (12GB VRAM recommended), Python 3.11.9, PyTorch 2.2.2 with CUDA 11.8, pip install torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 --index-url https://download.pytorch.org/whl/cu118, DeepSpeed (via AllTalkTTS wheels), Twitch developer account credentials, a voice reference WAV file.
  • Setup: Requires configuring .env and constants.py files, including audio device selection and API keys.
  • Links: DEMO VIDEO, neurofrontend

Highlighted Details

  • Real-time STT and TTS for natural voice interaction.
  • Flexible LLM integration with text-generation-webui or any OpenAI-compatible endpoint.
  • Long-term memory and RAG capabilities, with automatic memory generation.
  • Multimodal support via custom servers like Neuro-LLM-Server (e.g., MiniCPM-Llama3-V-2_5-int4).
  • VTuber integration with Vtube Studio via virtual audio cables for lip-sync.

Maintenance & Community

  • Project is experimental and educational.
  • Ko-fi tips are appreciated for support.
  • Links to community channels are not explicitly provided in the README.

Licensing & Compatibility

  • The README does not explicitly state a license. It mentions "see LICENSE for the repository license," but no LICENSE file is present in the provided context.
  • Attribution is appreciated in derivative works.

Limitations & Caveats

The project is experimental and created for educational/recreational purposes, with no guarantee against "non-vile responses." Content filtering is minimal (currently only "turkey"). Twitch bans are possible for unsafe content. Discord integration attempts were deemed unusable due to platform limitations.

Health Check
Last commit

6 months ago

Responsiveness

1 day

Pull Requests (30d)
0
Issues (30d)
1
Star History
164 stars in the last 90 days

Explore Similar Projects

Starred by Thomas Wolf Thomas Wolf(Cofounder of Hugging Face), Chip Huyen Chip Huyen(Author of AI Engineering, Designing Machine Learning Systems), and
2 more.

ultravox by fixie-ai

0.4%
4k
Multimodal LLM for real-time voice interactions
created 1 year ago
updated 4 days ago
Feedback? Help us improve.