open-xiaoai-bridge  by coderzc

Connect smart speakers to advanced AI services

Created 3 months ago
270 stars

Top 95.0% on SourcePulse

GitHubView on GitHub
Project Summary

Summary

Open-XiaoAI Bridge provides a server application to integrate external AI services (OpenAI-compatible, OpenClaw, XiaoZhi AI) with Xiaomi's XiaoAI smart speakers, breaking their closed ecosystem. It offers enhanced functionality via a remote HTTP API, targeting users and developers seeking to customize speaker capabilities.

How It Works

A Python server communicates with a Rust client on the speaker via WebSocket. It processes audio streams using Voice Activity Detection (VAD) and Keyword Spotting (KWS) for efficiency. Audio is then routed to configured AI backends, supporting local ASR (SherpaASR) or XiaoAI's native ASR. The system features a modular design for enabling specific integrations.

Quick Start & Requirements

  • Prerequisites: Requires flashing XiaoAI speaker firmware (SSH enabled) and installing the Rust client program. Local ASR/TTS models may need to be downloaded.
  • Installation: Docker Compose is recommended: download config.py and docker-compose.yml, configure them, and run docker compose up -d. Local compilation involves cloning the repository, installing dependencies (uv, Rust), and running ./scripts/start.sh.
  • Configuration: Managed via config.py and environment variables to enable/disable services, set API endpoints, authentication tokens, and AI backend parameters.
  • Links: Demo ①, Demo ②, Quick Start, API Docs, FAQ.

Highlighted Details

  • OpenAI Compatibility: Seamless integration with services like OpenAI, Ollama, and LM Studio via the /v1/chat/completions endpoint.
  • OpenClaw Integration: Advanced features include custom wake words, multi-agent routing, continuous conversation, voice cloning (via Doubao TTS), and streaming playback.
  • Multi-Agent Routing: Enables distinct AI personalities activated by unique wake words on a single speaker.
  • Continuous Conversation: Supports multi-turn dialogues without repeated wake-ups, with interruption capability.
  • HTTP API: Provides remote control for text/audio playback and device management.
  • Modular Design: Features like XiaoZhi AI, OpenClaw, OpenAI compatibility, and HTTP API can be independently enabled.

Maintenance & Community

The project is maintained by coderzc. No specific community channels (Discord, Slack) or sponsorship details are provided in the README.

Licensing & Compatibility

  • License: MIT License.
  • Compatibility: The MIT license permits commercial use and integration into closed-source projects. However, adoption requires modifying the XiaoAI speaker's firmware.

Limitations & Caveats

  • Firmware Modification: Essential for setup, posing a significant barrier and potential warranty issue.
  • External Dependencies: Relies on external AI services or locally downloaded large model files for advanced features.
  • Setup Complexity: Initial setup involves firmware flashing, client installation, and configuration, demanding technical expertise.
Health Check
Last Commit

1 week ago

Responsiveness

Inactive

Pull Requests (30d)
2
Issues (30d)
3
Star History
41 stars in the last 30 days

Explore Similar Projects

Feedback? Help us improve.