onju-voice by justLV

Hackable AI home assistant platform in the Google Nest Mini form factor

Created 2 years ago

1,601 stars

Top 26.0% on SourcePulse

Project Summary

This project provides a hackable AI home assistant platform built around a custom ESP32-S3-based PCB that matches the form factor of a Google Nest Mini. It targets makers and developers who want to build custom voice assistants with local LLM capabilities, and it offers a flexible alternative to commercial smart speakers.

How It Works

The platform consists of a custom PCB with an ESP32-S3 microcontroller and a companion server. The ESP32-S3 handles audio capture and basic processing, while the server manages transcription (via local Whisper), LLM-based response generation (e.g., OpenAI), and Text-to-Speech (e.g., ElevenLabs). Audio data is streamed between the device and server using UDP and TCP.
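The server-side flow described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names, the silence timeout, and the stubbed stages are all hypothetical, and it assumes microphone audio arrives over UDP while reply audio is returned over TCP (the summary names both transports without saying which direction each carries).

```python
import socket


def transcribe(pcm: bytes) -> str:
    """Hypothetical stand-in for a local Whisper transcription call."""
    return f"<{len(pcm)} bytes of speech>"


def generate_reply(text: str) -> str:
    """Hypothetical stand-in for an LLM request (e.g., OpenAI)."""
    return f"You said: {text}"


def synthesize(text: str) -> bytes:
    """Hypothetical stand-in for a TTS request (e.g., ElevenLabs)."""
    return text.encode("utf-8")


def handle_utterance(pcm: bytes) -> bytes:
    """Run the three stages strictly in sequence: STT, then LLM, then TTS."""
    return synthesize(generate_reply(transcribe(pcm)))


def collect_udp_audio(sock: socket.socket, timeout: float = 0.5) -> bytes:
    """Gather UDP datagrams until no packet arrives for `timeout` seconds."""
    sock.settimeout(timeout)
    chunks = []
    while True:
        try:
            data, _addr = sock.recvfrom(2048)
        except socket.timeout:
            break  # Assumed convention: a gap in packets ends the utterance.
        chunks.append(data)
    return b"".join(chunks)


def send_tcp_audio(host: str, port: int, audio: bytes) -> None:
    """Open a TCP connection to the device and send the whole reply."""
    with socket.create_connection((host, port)) as conn:
        conn.sendall(audio)
```

In this sketch each stage waits for the previous one to finish, so the reply audio is sent only after the whole response has been synthesized.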

Quick Start & Requirements

Server: Run pip install -r requirements.txt in the server directory. Requires Python, Whisper, an OpenAI API key, and an ElevenLabs API key. Configuration is done in config.yaml.
Firmware: Arduino IDE with ESP32 board support. Requires the Adafruit NeoPixel library. Wi-Fi credentials go in credentials.h.
Hardware: Custom PCB (design files provided) or a breadboard setup with ESP32-S3 devboard, microphone, amplifier, speaker, and LED strip.
Home Assistant: Docker Compose instructions provided.
Maubot: Requires separate Maubot setup.
Demo: https://github.com/justLV/onju-voice/blob/main/docs/demo.md

Highlighted Details

Drop-in replacement PCB for Google Nest Mini (2nd gen).
Local Whisper for transcription and OpenAI/local LLMs for response generation.
Integrations with Home Assistant and Maubot (for messaging).
ESP32-S3 firmware programmable via Arduino IDE.
Server code runs on macOS, Linux, or Windows.

Maintenance & Community

The README states that the project is "not being actively maintained," but all source code and design files have been released so that others can continue development.

Licensing & Compatibility

The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project is experimental and not a full replacement for commercial assistants. It lacks advanced features like Voice Activity Detection (VAD), Acoustic Echo Cancellation (AEC), and Blind Source Separation (BSS) on the device, as these are not fully supported by the Arduino IDE for ESP32. Conversation flow is serialized, and streaming responses are not implemented.
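To illustrate what Voice Activity Detection involves, here is a minimal energy-threshold sketch. It is not part of the project and is far simpler than a production VAD. The function names, the threshold of 500, and the 16-bit little-endian mono PCM frame format are all assumptions chosen for the example.

```python
import math
import struct


def frame_rms(frame: bytes) -> float:
    """RMS amplitude of a frame of 16-bit little-endian mono PCM (assumed format)."""
    count = len(frame) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", frame[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


def is_speech(frame: bytes, threshold: float = 500.0) -> bool:
    """Treat a frame as speech when its RMS reaches the (hypothetical) threshold."""
    return frame_rms(frame) >= threshold


def utterance_end(frames, threshold=500.0, silent_frames_needed=3):
    """Return the index of the frame that completes the trailing silence.

    Silence only counts after some speech has been heard. Returns None if
    no utterance end is found.
    """
    heard_speech = False
    silent_run = 0
    for i, frame in enumerate(frames):
        if is_speech(frame, threshold):
            heard_speech = True
            silent_run = 0
        elif heard_speech:
            silent_run += 1
            if silent_run >= silent_frames_needed:
                return i
    return None
```

A fixed energy threshold like this misfires in noisy rooms, and it does nothing to cancel the device's own speaker output, which is the job of AEC.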

Health Check

Last Commit: 1 year ago
Responsiveness: Inactive

Star History: 22 stars in the last 30 days