bailing  by wwbin2017

Voice dialogue robot similar to GPT-4o, achieved via ASR+LLM+TTS

created 11 months ago
1,357 stars

Top 30.2% on sourcepulse

GitHubView on GitHub
Project Summary

百聆 (Bailing) is an open-source voice dialogue assistant designed for natural, low-latency conversations, mimicking GPT-4o's capabilities. It targets users seeking a high-quality, accessible AI assistant that can run on low-resource environments, including Macs without GPUs, offering features like interruption handling and tool integration.

How It Works

Bailing integrates Automatic Speech Recognition (ASR) via FunASR, Voice Activity Detection (VAD) using silero-vad, Large Language Models (LLM) powered by DeepSeek, and Text-to-Speech (TTS) with edge-tts Kokoro-82M. This modular architecture allows for independent upgrades of components. The "Robot" framework manages tasks, memory, and user interruptions, ensuring seamless coordination between modules for a fluid interaction.

Quick Start & Requirements

  • Install: Clone the repository, install dependencies with pip install -r requirements.txt and pip install -r third_party/OpenManus/requirements.txt.
  • Prerequisites: Python 3.11+, pip, DeepSeek API key (or other LLM provider keys), download SenseVoiceSmall to models/SenseVoiceSmall.
  • Configuration: Edit config/config.yaml for ASR, LLM, and other settings.
  • Run: Execute python main.py after setting up the backend service if needed.
  • Docs: https://github.com/wwbin2017/bailing

Highlighted Details

  • Achieves end-to-end latency as low as 800ms.
  • Operates without a GPU, making it suitable for low-configuration devices.
  • Supports interruption handling and intelligent task management.
  • Integrates tool-calling capabilities for practical applications.
  • Features memory and personalization for a tailored user experience.

Maintenance & Community

The project acknowledges contributions from DeepSeek, FunASR, Silero-VAD, ChatTTS, and OpenManus. It encourages community contributions via GitHub Issues and Pull Requests.

Licensing & Compatibility

The project is licensed under the MIT License, allowing for free use, modification, and distribution, provided the original license notice is retained. However, a disclaimer states the project is for personal learning and research, not commercial use or production environments, and users assume all risks.

Limitations & Caveats

The project's disclaimer explicitly states it is for personal learning and research only and not for commercial use or production environments. The developers disclaim responsibility for any data loss, system failures, or other issues arising from its use, and no technical support or guarantees are provided.

Health Check
Last commit

2 days ago

Responsiveness

1 week

Pull Requests (30d)
0
Issues (30d)
2
Star History
187 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen Chip Huyen(Author of AI Engineering, Designing Machine Learning Systems), Toran Bruce Richards Toran Bruce Richards(Founder of AutoGPT), and
2 more.

OS-Copilot by OS-Copilot

0.1%
2k
OS agent for automating daily tasks
created 1 year ago
updated 10 months ago
Starred by Chip Huyen Chip Huyen(Author of AI Engineering, Designing Machine Learning Systems), Andre Zayarni Andre Zayarni(Cofounder of Qdrant), and
2 more.

RealChar by Shaunwei

0.1%
6k
Real-time AI character/companion creation and interaction codebase
created 2 years ago
updated 1 year ago
Starred by Chip Huyen Chip Huyen(Author of AI Engineering, Designing Machine Learning Systems), Didier Lopes Didier Lopes(Founder of OpenBB), and
1 more.

leon by leon-ai

0.2%
17k
Open-source personal assistant to self-host
created 6 years ago
updated 3 days ago
Feedback? Help us improve.