Voice dialogue robot similar to GPT-4o, achieved via ASR+LLM+TTS
Top 30.2% on sourcepulse
百聆 (Bailing) is an open-source voice dialogue assistant designed for natural, low-latency conversations, mimicking GPT-4o's capabilities. It targets users seeking a high-quality, accessible AI assistant that can run on low-resource environments, including Macs without GPUs, offering features like interruption handling and tool integration.
How It Works
Bailing integrates Automatic Speech Recognition (ASR) via FunASR, Voice Activity Detection (VAD) using silero-vad, Large Language Models (LLM) powered by DeepSeek, and Text-to-Speech (TTS) with edge-tts Kokoro-82M. This modular architecture allows for independent upgrades of components. The "Robot" framework manages tasks, memory, and user interruptions, ensuring seamless coordination between modules for a fluid interaction.
Quick Start & Requirements
pip install -r requirements.txt
and pip install -r third_party/OpenManus/requirements.txt
.models/SenseVoiceSmall
.config/config.yaml
for ASR, LLM, and other settings.python main.py
after setting up the backend service if needed.Highlighted Details
Maintenance & Community
The project acknowledges contributions from DeepSeek, FunASR, Silero-VAD, ChatTTS, and OpenManus. It encourages community contributions via GitHub Issues and Pull Requests.
Licensing & Compatibility
The project is licensed under the MIT License, allowing for free use, modification, and distribution, provided the original license notice is retained. However, a disclaimer states the project is for personal learning and research, not commercial use or production environments, and users assume all risks.
Limitations & Caveats
The project's disclaimer explicitly states it is for personal learning and research only and not for commercial use or production environments. The developers disclaim responsibility for any data loss, system failures, or other issues arising from its use, and no technical support or guarantees are provided.
2 days ago
1 week