natural_voice_assistant by LAION-AI

Open-source AI voice assistant for natural, empathic conversations

created 1 year ago
490 stars

Top 63.8% on sourcepulse

Project Summary

BUD-E is an open-source, fully local AI voice assistant designed for natural, real-time conversations with emotional intelligence and long-term memory. It targets users seeking an advanced, private voice assistant that can handle multi-speaker interactions and interruptions, running on consumer hardware.

How It Works

BUD-E integrates NVIDIA's FastConformer streaming STT, Microsoft's Phi-2 LLM, and StyleTTS2 for TTS. It aims for low-latency responses by fine-tuning STT and TTS models with LLM context, and plans to implement speculative decoding and end-of-speech detection for further speed improvements. The system is designed to manage conversational context and potentially incorporate multi-modal inputs and tool use.
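The STT → LLM → TTS flow described above can be sketched in Python. This is a minimal illustration only: the class names and behavior are stand-ins, not BUD-E's actual API, and the real system streams audio and model outputs rather than passing complete strings.

```python
# Illustrative sketch of a BUD-E-style voice pipeline: streaming STT feeds
# a chat LLM, whose reply is rendered by TTS. All names are hypothetical.

class StreamingSTT:
    """Stand-in for a FastConformer-style streaming transcriber."""
    def transcribe(self, audio_chunks):
        # A real model would emit partial transcripts chunk by chunk.
        return " ".join(audio_chunks)

class ChatLLM:
    """Stand-in for a Phi-2-style chat model that keeps conversational context."""
    def __init__(self):
        self.history = []  # (role, text) pairs, the conversation memory
    def reply(self, user_text):
        self.history.append(("user", user_text))
        answer = f"You said: {user_text}"  # placeholder for real generation
        self.history.append(("assistant", answer))
        return answer

class TTS:
    """Stand-in for StyleTTS2; returns a tagged string instead of audio."""
    def speak(self, text):
        return f"<audio:{text}>"

def converse(stt, llm, tts, audio_chunks):
    """One conversational turn: transcribe, generate a reply, speak it."""
    text = stt.transcribe(audio_chunks)
    return tts.speak(llm.reply(text))
```

In the real system, latency comes from overlapping these stages (streaming STT while the user speaks, starting TTS before the LLM finishes), which is what the planned speculative decoding and end-of-speech detection target.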

Quick Start & Requirements

  • Install: Clone the repo with git clone --recurse-submodules, create a conda environment with Python 3.10.12, install espeak-ng and PyTorch, then run pip install -r requirements.txt.
  • Run: Execute python main.py.
  • Prerequisites: NVIDIA GPU (RTX 4090 demonstrated for 300-500ms latency), Python 3.10.12, espeak-ng, PyTorch. Ubuntu users may need portaudio19-dev.
  • Resources: Requires downloading pretrained models on first run.
  • Docs: Installation guide within the README.
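The steps above can be collected into a shell sketch. The repository URL and environment name are assumptions; consult the README's installation guide for the authoritative commands.

```shell
# Hypothetical install sketch; repo URL and env name are assumptions.
git clone --recurse-submodules https://github.com/LAION-AI/natural_voice_assistant.git
cd natural_voice_assistant
conda create -n bud-e python=3.10.12 -y
conda activate bud-e
# Ubuntu: espeak-ng plus PortAudio headers for the audio stack
sudo apt-get install -y espeak-ng portaudio19-dev
pip install torch  # pick the build matching your CUDA version
pip install -r requirements.txt
python main.py  # downloads pretrained models on first run
```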

Highlighted Details

  • Real-time conversational AI with empathy and emotional intelligence.
  • Handles multi-speaker conversations with interruptions and thinking pauses.
  • Operates fully locally on consumer hardware.
  • Demonstrated latency of 300-500ms on an NVIDIA RTX 4090.

Maintenance & Community

  • Collaboration between LAION, ELLIS Institute Tübingen, Collabora, and Tübingen AI Center.
  • Community contributions are invited via Discord or email (bud-e@laion.ai).
  • Roadmap includes significant planned improvements in latency, naturalness, memory, and functionality.

Licensing & Compatibility

  • License details are not explicitly stated in the README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The current version is a demo with ongoing development; many roadmap features are not yet implemented. Multi-speaker support is basic, and reliable speaker diarization is a planned improvement. WhisperSpeech TTS is noted as very slow on Windows due to torch.compile incompatibility.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 8 stars in the last 90 days

Explore Similar Projects

Starred by Thomas Wolf (Cofounder of Hugging Face), Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), and 2 more.

ultravox by fixie-ai

  • Top 0.4% · 4k stars
  • Multimodal LLM for real-time voice interactions
  • created 1 year ago · updated 4 days ago

Starred by Jeff Hammerbacher (Cofounder of Cloudera), Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), and 2 more.

MiniCPM-o by OpenBMB

  • Top 0.2% · 20k stars
  • MLLM for vision, speech, and multimodal live streaming on your phone
  • created 1 year ago · updated 1 month ago