Open-source AI voice assistant for natural, empathic conversations
BUD-E is an open-source, fully local AI voice assistant designed for natural, real-time conversations with emotional intelligence and long-term memory. It targets users who want an advanced, private voice assistant that handles multi-speaker interactions and interruptions while running on consumer hardware.
How It Works
BUD-E integrates NVIDIA's FastConformer streaming STT, Microsoft's Phi-2 LLM, and StyleTTS2 for TTS. It aims for low-latency responses by fine-tuning STT and TTS models with LLM context, and plans to implement speculative decoding and end-of-speech detection for further speed improvements. The system is designed to manage conversational context and potentially incorporate multi-modal inputs and tool use.
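The STT → LLM → TTS turn-taking flow described above can be sketched as a simple pipeline. The class and function names below are illustrative stand-ins, not BUD-E's actual API; the real models (FastConformer, Phi-2, StyleTTS2) are replaced by trivial stubs so the control flow is visible:

```python
# Illustrative voice-assistant turn pipeline: STT -> LLM (with memory) -> TTS.
# All stage implementations are placeholder stubs, not BUD-E's real components.

from dataclasses import dataclass, field


@dataclass
class ConversationMemory:
    """Long-term context carried across turns (user text, assistant reply)."""
    turns: list = field(default_factory=list)

    def add(self, user_text: str, reply: str) -> None:
        self.turns.append((user_text, reply))


def transcribe(audio_chunk: dict) -> str:
    # Placeholder for streaming STT (e.g. FastConformer in BUD-E).
    return audio_chunk["text"]


def generate_reply(text: str, memory: ConversationMemory) -> str:
    # Placeholder for the LLM stage (e.g. Phi-2), conditioned on memory size.
    return f"echo({len(memory.turns)}): {text}"


def synthesize(reply_text: str) -> dict:
    # Placeholder for TTS (e.g. StyleTTS2); returns a stand-in "audio" object.
    return {"audio_for": reply_text}


def handle_turn(audio_chunk: dict, memory: ConversationMemory) -> dict:
    """One conversational turn: transcribe, respond, remember, speak."""
    text = transcribe(audio_chunk)
    reply = generate_reply(text, memory)
    memory.add(text, reply)
    return synthesize(reply)


memory = ConversationMemory()
out = handle_turn({"text": "hello"}, memory)
```

In a real streaming system each stage would run concurrently and incrementally (which is where speculative decoding and end-of-speech detection cut latency); this sketch only shows the per-turn data flow.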
Quick Start & Requirements
Clone the repository with git clone --recurse-submodules, create a conda environment with Python 3.10.12, install espeak-ng and PyTorch, then run pip install -r requirements.txt. Start the assistant with python main.py.
Requirements: espeak-ng, PyTorch. Ubuntu users may need portaudio19-dev.
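The setup steps above, sketched for a Unix-like shell. The repository URL is not given here, so `<repo-url>` and `<repo-dir>` are placeholders; package names follow the requirements listed above:

```shell
# Clone with submodules (replace <repo-url>/<repo-dir> with the actual repository)
git clone --recurse-submodules <repo-url>
cd <repo-dir>

# Create and activate a conda environment with the pinned Python version
conda create -n bud-e python=3.10.12
conda activate bud-e

# System dependencies: espeak-ng (plus portaudio19-dev on Ubuntu)
sudo apt install espeak-ng portaudio19-dev

# Python dependencies: PyTorch, then the project requirements
pip install torch
pip install -r requirements.txt

# Run the assistant
python main.py
```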
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The current version is a demo with ongoing development; many roadmap features are not yet implemented. Multi-speaker support is basic, and reliable speaker diarization is a planned improvement. WhisperSpeech TTS is noted as very slow on Windows due to torch.compile incompatibility.