voice-assistant by linyiLYi

Voice assistant demo using local LLM

created 1 year ago
1,319 stars

Top 31.0% on sourcepulse

Project Summary

This project provides a local, voice-activated assistant for dialogue with large language models. It's a toy demo targeting users interested in offline AI interaction, offering a simple Python script for voice recognition and text generation.

How It Works

The assistant leverages Apple's MLX implementation of OpenAI's Whisper for voice-to-text conversion. For text generation, it utilizes the Yi large language model (e.g., Yi-34B-Chat or Yi-6B-Chat) running locally via LangChain and llama.cpp. This approach enables offline operation and direct interaction without relying on cloud APIs.
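The pipeline described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual main.py: the mlx_whisper and llama_cpp calls, the model paths, and the helper names are assumptions based on the libraries the summary mentions, and the ChatML template is the format Yi-Chat models generally expect.

```python
def build_prompt(user_text: str) -> str:
    """Wrap user input in the ChatML template used by Yi-Chat models."""
    return (
        "<|im_start|>user\n"
        f"{user_text}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


def transcribe(audio_path: str,
               model_dir: str = "models/whisper-large-v3") -> str:
    """Speech-to-text via the MLX Whisper port (hypothetical wiring)."""
    import mlx_whisper  # pip install mlx-whisper
    result = mlx_whisper.transcribe(audio_path, path_or_hf_repo=model_dir)
    return result["text"]


def generate(user_text: str,
             model_path: str = "models/yi-6b-chat.Q8_0.gguf") -> str:
    """Local text generation via llama-cpp-python with Metal offload."""
    from llama_cpp import Llama
    llm = Llama(model_path=model_path, n_gpu_layers=-1)
    out = llm(build_prompt(user_text), max_tokens=256, stop=["<|im_end|>"])
    return out["choices"][0]["text"]
```

A voice turn would then chain the two stages: `generate(transcribe("input.wav"))`. Everything stays on-device, which is what allows the assistant to run without cloud APIs.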

Quick Start & Requirements

  • Install: conda create -n VoiceAI python=3.11, conda activate VoiceAI, pip install -r requirements.txt, CMAKE_ARGS="-DLLAMA_METAL=on" pip install llama-cpp-python.
  • Prerequisites: Python 3.11.5, Anaconda recommended, portaudio (via brew install portaudio), pyaudio. Tested on macOS; Windows/Linux require alternative audio libraries.
  • Models: Download GGUF format Yi models (e.g., TheBloke/Yi-34B-Chat-GGUF, XeIaso/Yi-6B-Chat-GGUF) and place in models/. Whisper model (e.g., whisper-large-v3) also required in models/.
  • Resources: Yi-34B-Chat (8-bit quantized) requires roughly 39 GB of memory (unified memory on Apple Silicon).
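The setup commands from the bullets above, collected into one script (macOS; the model filename in the last step is an example placeholder, substitute the GGUF file you actually download):

```shell
# Create and activate the environment
conda create -n VoiceAI python=3.11
conda activate VoiceAI
pip install -r requirements.txt

# PortAudio is required by pyaudio for microphone capture
brew install portaudio
pip install pyaudio

# Rebuild llama-cpp-python with Metal acceleration enabled
CMAKE_ARGS="-DLLAMA_METAL=on" pip install llama-cpp-python

# Downloaded GGUF and Whisper models go under models/
mkdir -p models
```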

Highlighted Details

  • Local voice recognition via Whisper (Apple MLX implementation).
  • Local LLM inference using Yi models via llama.cpp.
  • Single-script architecture (main.py) for simplicity.
  • macOS Metal support via llama-cpp-python build flags.

Maintenance & Community

The project is a personal toy demo. No specific community channels or roadmap are indicated.

Licensing & Compatibility

The README does not explicitly state a license. The project uses components from Apple MLX and OpenAI Whisper, which have their own licenses. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project is described as a "toy demo" and has only been tested on macOS, with Windows/Linux requiring manual replacement of audio components. Model file paths are hardcoded variables, and the 34B model has significant hardware requirements.

Health Check

Last commit: 1 year ago
Responsiveness: 1 day
Pull Requests (30d): 0
Issues (30d): 0
Star History: 8 stars in the last 90 days

Explore Similar Projects

Starred by Thomas Wolf (Cofounder of Hugging Face), Chip Huyen (Author of AI Engineering and Designing Machine Learning Systems), and 2 more.

ultravox by fixie-ai (top 0.4%, 4k stars)

Multimodal LLM for real-time voice interactions. Created 1 year ago, updated 4 days ago.