natural_voice_assistant by LAION-AI

Open-source AI voice assistant for natural, empathic conversations

Created 1 year ago
492 stars

Top 62.8% on SourcePulse

Project Summary

BUD-E is an open-source, fully local AI voice assistant designed for natural, real-time conversations with emotional intelligence and long-term memory. It targets users seeking an advanced, private voice assistant that can handle multi-speaker interactions and interruptions, running on consumer hardware.

How It Works

BUD-E chains three models: NVIDIA's FastConformer for streaming speech-to-text (STT), Microsoft's Phi-2 as the LLM, and StyleTTS2 for text-to-speech (TTS). To keep latency low, the project aims to fine-tune the STT and TTS models with LLM context, and it plans to add speculative decoding and end-of-speech detection for further speed improvements. The system is designed to manage conversational context and may later incorporate multi-modal inputs and tool use.
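The three-stage flow described above can be sketched as a small Python class. This is an illustrative sketch only, not BUD-E's actual code: `VoicePipeline`, `fake_stt`, `fake_llm`, and `fake_tts` are hypothetical names, and the stubs stand in for FastConformer, Phi-2, and StyleTTS2 so the example runs without any models installed.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class VoicePipeline:
    """Hypothetical STT -> LLM -> TTS chain; not the project's real API."""
    transcribe: Callable[[bytes], str]    # stand-in for the streaming STT model
    generate: Callable[[List[str]], str]  # stand-in for the LLM; receives the history
    synthesize: Callable[[str], bytes]    # stand-in for the TTS model
    history: List[str] = field(default_factory=list)

    def respond(self, audio_chunk: bytes) -> bytes:
        # 1. Speech to text.
        text = self.transcribe(audio_chunk)
        self.history.append(f"User: {text}")
        # 2. The LLM sees the whole conversation so far (conversational context).
        reply = self.generate(self.history)
        self.history.append(f"Assistant: {reply}")
        # 3. Text to speech.
        return self.synthesize(reply)


# Stub models (assumptions for illustration): "audio" is just UTF-8 text here.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(history: List[str]) -> str:
    last_user_turn = history[-1].removeprefix("User: ")
    return f"You said: {last_user_turn}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = VoicePipeline(fake_stt, fake_llm, fake_tts)
audio_out = pipeline.respond(b"hello")
```

Keeping the stages as plain callables makes it easy to swap one model without touching the others. A real low-latency system would stream partial results between the stages rather than wait for each one to finish, which this sketch does not attempt.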

Quick Start & Requirements

  • Install: Clone the repo with git clone --recurse-submodules, create a conda environment with Python 3.10.12, install espeak-ng and PyTorch, then run pip install -r requirements.txt.
  • Run: Execute python main.py.
  • Prerequisites: NVIDIA GPU (RTX 4090 demonstrated for 300-500ms latency), Python 3.10.12, espeak-ng, PyTorch. Ubuntu users may need portaudio19-dev.
  • Resources: Requires downloading pretrained models on first run.
  • Docs: Installation guide within the README.
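
Before running main.py, the listed prerequisites can be checked with a short script. This is a hedged sketch and not part of the repository: `missing_prerequisites` is a hypothetical helper, and comparing only the major.minor Python version (3.10) is an assumption, since the page names the exact patch release 3.10.12.

```python
import importlib.util
import shutil
import sys


def missing_prerequisites(required_python=(3, 10)):
    """Return human-readable problems; an empty list means all checks passed."""
    problems = []

    # Python version: only major.minor is compared (assumption, see lead-in).
    if tuple(sys.version_info[:2]) != tuple(required_python):
        found = ".".join(str(part) for part in sys.version_info[:3])
        wanted = ".".join(str(part) for part in required_python)
        problems.append(f"Python {wanted} expected, found {found}")

    # espeak-ng must be discoverable on PATH.
    if shutil.which("espeak-ng") is None:
        problems.append("espeak-ng not found on PATH")

    # PyTorch must be importable, and it should see a CUDA-capable GPU.
    if importlib.util.find_spec("torch") is None:
        problems.append("PyTorch is not installed")
    else:
        import torch
        if not torch.cuda.is_available():
            problems.append("no CUDA-capable GPU visible to PyTorch")

    return problems


issues = missing_prerequisites()
for issue in issues:
    print("missing:", issue)
```

The function reports problems instead of raising, so every missing piece shows up in a single run.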

Highlighted Details

  • Real-time conversational AI with empathy and emotional intelligence.
  • Handles multi-speaker conversations with interruptions and thinking pauses.
  • Operates fully locally on consumer hardware.
  • Demonstrated latency of 300-500ms on an NVIDIA RTX 4090.

Maintenance & Community

  • Collaboration between LAION, ELLIS Institute Tübingen, Collabora, and Tübingen AI Center.
  • Community contributions are invited via Discord or email (bud-e@laion.ai).
  • Roadmap includes significant planned improvements in latency, naturalness, memory, and functionality.

Licensing & Compatibility

  • License details are not explicitly stated in the README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The current version is a demo under ongoing development, and many roadmap features are not yet implemented. Multi-speaker support is basic; reliable speaker diarization is a planned improvement. WhisperSpeech TTS is noted as very slow on Windows due to torch.compile incompatibility.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 2 stars in the last 30 days
