bailing by wwbin2017

Voice dialogue robot similar to GPT-4o, achieved via ASR+LLM+TTS

Created 1 year ago

1,609 stars

Top 25.7% on SourcePulse

Project Summary

百聆 (Bailing) is an open-source voice dialogue assistant designed for natural, low-latency conversations, mimicking GPT-4o's capabilities. It targets users seeking a high-quality, accessible AI assistant that can run on low-resource environments, including Macs without GPUs, offering features like interruption handling and tool integration.

How It Works

Bailing integrates Automatic Speech Recognition (ASR) via FunASR, Voice Activity Detection (VAD) using silero-vad, Large Language Models (LLM) powered by DeepSeek, and Text-to-Speech (TTS) with edge-tts Kokoro-82M. This modular architecture allows for independent upgrades of components. The "Robot" framework manages tasks, memory, and user interruptions, ensuring seamless coordination between modules for a fluid interaction.

Quick Start & Requirements

Install: Clone the repository, install dependencies with pip install -r requirements.txt and pip install -r third_party/OpenManus/requirements.txt.
Prerequisites: Python 3.11+, pip, DeepSeek API key (or other LLM provider keys), download SenseVoiceSmall to models/SenseVoiceSmall.
Configuration: Edit config/config.yaml for ASR, LLM, and other settings.
Run: Execute python main.py after setting up the backend service if needed.
Docs: https://github.com/wwbin2017/bailing

Highlighted Details

Achieves end-to-end latency as low as 800ms.
Operates without a GPU, making it suitable for low-configuration devices.
Supports interruption handling and intelligent task management.
Integrates tool-calling capabilities for practical applications.
Features memory and personalization for a tailored user experience.

Maintenance & Community

The project acknowledges contributions from DeepSeek, FunASR, Silero-VAD, ChatTTS, and OpenManus. It encourages community contributions via GitHub Issues and Pull Requests.

Licensing & Compatibility

The project is licensed under the MIT License, allowing for free use, modification, and distribution, provided the original license notice is retained. However, a disclaimer states the project is for personal learning and research, not commercial use or production environments, and users assume all risks.

Limitations & Caveats

The project's disclaimer explicitly states it is for personal learning and research only and not for commercial use or production environments. The developers disclaim responsibility for any data loss, system failures, or other issues arising from its use, and no technical support or guarantees are provided.

Health Check

Last Commit

6 months ago

Responsiveness

1 week

Pull Requests (30d)

Issues (30d)

Star History

16 stars in the last 30 days