Neuro by kimjammer

Local AI Vtuber recreation of Neuro-Sama

Created 1 year ago
1,035 stars

Top 36.3% on SourcePulse

View on GitHub
Project Summary

This project recreates the AI VTuber Neuro-Sama, enabling users to run a similar system on consumer hardware using local LLMs. It targets VTubers, streamers, and AI enthusiasts looking for an interactive, voice-driven AI companion with real-time speech processing and VTuber integration. The primary benefit is a highly customizable and locally-hosted AI personality.

How It Works

The system integrates real-time Speech-to-Text (STT) and Text-to-Speech (TTS) using the KoljaB/RealtimeSTT and KoljaB/RealtimeTTS libraries, respectively. It leverages an OpenAI-compatible API endpoint for LLM interaction, allowing flexibility in model choice (e.g., Llama 3 8B Instruct via text-generation-webui). Multimodality is supported via custom servers like Neuro-LLM-Server, enabling visual input processing. State and data are managed through a shared signals object, with modular components running in separate threads for extensibility.
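
As a rough sketch of that design (not the project's actual code; the class, queue, and endpoint names below are assumptions), each module shares a single signals object and runs in its own thread, while the LLM is reached over an OpenAI-compatible chat-completions endpoint:

    # Minimal sketch, not the project's actual code: each module shares one
    # "signals" object and runs in its own thread; the LLM is reached through an
    # OpenAI-compatible /v1/chat/completions endpoint (URL and port assumed).
    import queue
    import threading

    import requests

    class Signals:
        """Shared state handed to every module (STT, LLM, TTS, ...)."""
        def __init__(self):
            self.history = []                  # rolling chat history
            self.stt_output = queue.Queue()    # transcribed user/chat messages
            self.tts_input = queue.Queue()     # text waiting to be spoken
            self.terminate = threading.Event()

    API_URL = "http://127.0.0.1:5000/v1/chat/completions"  # assumed local endpoint

    def llm_thread(signals: Signals) -> None:
        while not signals.terminate.is_set():
            prompt = signals.stt_output.get()  # block until STT produces text
            signals.history.append({"role": "user", "content": prompt})
            resp = requests.post(API_URL, json={
                "model": "local-model",        # placeholder; backend may ignore it
                "messages": signals.history,
                "max_tokens": 200,
            }, timeout=120)
            reply = resp.json()["choices"][0]["message"]["content"]
            signals.history.append({"role": "assistant", "content": reply})
            signals.tts_input.put(reply)       # hand off to the TTS module

    signals = Signals()
    threading.Thread(target=llm_thread, args=(signals,), daemon=True).start()
    # STT and TTS modules would be started the same way, reading/writing the queues.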

Quick Start & Requirements

  • Install: Follow the detailed instructions in the README for text-generation-webui, an LLM, and VTube Studio.
  • Prerequisites: Nvidia GPU (12GB VRAM recommended), Python 3.11.9, PyTorch 2.2.2 with CUDA 11.8 (pip install torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 --index-url https://download.pytorch.org/whl/cu118), DeepSpeed (via AllTalkTTS wheels), Twitch developer account credentials, and a voice reference WAV file.
  • Setup: Requires configuring the .env and constants.py files, including audio device selection and API keys (see the configuration sketch after this list).
  • Links: DEMO VIDEO, neurofrontend
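
As a hedged illustration of the Setup step above, configuration loading might look roughly like this; the real variable names come from the project's .env and constants.py, so everything below is a placeholder:

    # Hypothetical illustration of the Setup step. The real variable names live in
    # the project's .env and constants.py; the ones below are placeholders only.
    import os

    from dotenv import load_dotenv  # pip install python-dotenv

    load_dotenv()  # read key=value pairs from a local .env file

    TWITCH_CLIENT_ID = os.getenv("TWITCH_CLIENT_ID")        # Twitch developer credentials
    TWITCH_CLIENT_SECRET = os.getenv("TWITCH_CLIENT_SECRET")

    # constants.py-style settings: audio devices and the voice reference clip
    INPUT_DEVICE_INDEX = 1                   # microphone input
    OUTPUT_DEVICE_INDEX = 3                  # virtual audio cable feeding VTube Studio
    VOICE_REFERENCE_WAV = "voices/reference.wav"

    if not (TWITCH_CLIENT_ID and TWITCH_CLIENT_SECRET):
        raise SystemExit("Fill in .env with your Twitch credentials before starting")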

Highlighted Details

  • Real-time STT and TTS for natural voice interaction.
  • Flexible LLM integration with text-generation-webui or any OpenAI-compatible endpoint.
  • Long-term memory and RAG capabilities, with automatic memory generation (a minimal retrieval sketch follows this list).
  • Multimodal support via custom servers like Neuro-LLM-Server (e.g., MiniCPM-Llama3-V-2_5-int4).
  • VTuber integration with VTube Studio via virtual audio cables for lip-sync.
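
As a hedged illustration of the long-term memory bullet above (the project's actual memory store and embedding model may differ entirely), a minimal RAG-style lookup could look like this, with embed() standing in for whatever embedding function is actually used:

    # Illustrative only: a tiny RAG-style memory lookup with cosine similarity.
    # The project's own memory store and embedding model may differ entirely;
    # embed() is a stand-in for whatever embedding function is actually used.
    import numpy as np

    memories: list[tuple[str, np.ndarray]] = []    # (text, embedding) pairs

    def remember(text: str, embed) -> None:
        """Store a generated memory alongside its embedding."""
        memories.append((text, embed(text)))

    def recall(query: str, embed, k: int = 3) -> list[str]:
        """Return the k stored memories most similar to the query."""
        q = embed(query)
        scored = [(float(np.dot(q, e) / (np.linalg.norm(q) * np.linalg.norm(e))), t)
                  for t, e in memories]
        return [t for _, t in sorted(scored, reverse=True)[:k]]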

Maintenance & Community

  • Project is experimental and educational.
  • Ko-fi tips are appreciated for support.
  • Links to community channels are not explicitly provided in the README.

Licensing & Compatibility

  • The README does not state the license directly; it says to "see LICENSE for the repository license," but no LICENSE file is present in the provided context.
  • Attribution is appreciated in derivative works.

Limitations & Caveats

The project is experimental and created for educational/recreational purposes, with no guarantee against vile responses. Content filtering is minimal (the blacklist currently contains only "turkey"). Twitch bans are possible if unsafe content slips through. Discord integration attempts were deemed unusable due to platform limitations.
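
For concreteness, the filtering described above amounts to little more than a blacklist check; the sketch below uses illustrative names and the single word the README mentions:

    # Sketch of a minimal blacklist filter; names are illustrative, and the list
    # reflects the single word the README currently blocks.
    BLACKLIST = {"turkey"}

    def passes_filter(text: str) -> bool:
        """Reject a response that contains any blacklisted word."""
        lowered = text.lower()
        return not any(word in lowered for word in BLACKLIST)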

Health Check

  • Last Commit: 8 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 145 stars in the last 30 days

Explore Similar Projects

X-LLM by phellonchen
  • Multimodal LLM research paper
  • 314 stars; created 2 years ago, updated 2 years ago
  • Starred by Chip Huyen (author of "AI Engineering", "Designing Machine Learning Systems")

ultravox by fixie-ai
  • Multimodal LLM for real-time voice interactions
  • 4k stars; created 1 year ago, updated 2 weeks ago
  • Starred by Thomas Wolf (cofounder of Hugging Face), Chip Huyen (author of "AI Engineering", "Designing Machine Learning Systems"), and 5 more