voice-assistant by linyiLYi

Voice assistant demo using local LLM

created 1 year ago
1,319 stars

Top 31.0% on sourcepulse

Project Summary

This project provides a local, voice-activated assistant for dialogue with large language models. It's a toy demo targeting users interested in offline AI interaction, offering a simple Python script for voice recognition and text generation.

How It Works

The assistant leverages Apple's MLX implementation of OpenAI's Whisper for voice-to-text conversion. For text generation, it utilizes the Yi large language model (e.g., Yi-34B-Chat or Yi-6B-Chat) running locally via LangChain and llama.cpp. This approach enables offline operation and direct interaction without relying on cloud APIs.
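The pipeline described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual main.py: the mlx_whisper and llama_cpp calls, the model paths, and the helper names are assumptions based on the libraries the summary mentions, and the ChatML template is the format Yi-Chat models generally expect.

```python
def build_prompt(user_text: str) -> str:
    """Wrap user input in the ChatML template used by Yi-Chat models."""
    return (
        "<|im_start|>user\n"
        f"{user_text}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


def transcribe(audio_path: str,
               model_dir: str = "models/whisper-large-v3") -> str:
    """Speech-to-text via the MLX Whisper port (hypothetical wiring)."""
    import mlx_whisper  # pip install mlx-whisper
    result = mlx_whisper.transcribe(audio_path, path_or_hf_repo=model_dir)
    return result["text"]


def generate(user_text: str,
             model_path: str = "models/yi-6b-chat.Q8_0.gguf") -> str:
    """Local text generation via llama-cpp-python with Metal offload."""
    from llama_cpp import Llama
    llm = Llama(model_path=model_path, n_gpu_layers=-1)
    out = llm(build_prompt(user_text), max_tokens=256, stop=["<|im_end|>"])
    return out["choices"][0]["text"]
```

A voice turn would then chain the two stages: `generate(transcribe("input.wav"))`. Everything stays on-device, which is what allows the assistant to run without cloud APIs.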

Quick Start & Requirements

  • Install: conda create -n VoiceAI python=3.11, conda activate VoiceAI, pip install -r requirements.txt, CMAKE_ARGS="-DLLAMA_METAL=on" pip install llama-cpp-python.
  • Prerequisites: Python 3.11.5, Anaconda recommended, portaudio (via brew install portaudio), pyaudio. Tested on macOS; Windows/Linux require alternative audio libraries.
  • Models: Download GGUF format Yi models (e.g., TheBloke/Yi-34B-Chat-GGUF, XeIaso/Yi-6B-Chat-GGUF) and place in models/. Whisper model (e.g., whisper-large-v3) also required in models/.
  • Resources: Yi-34B-Chat (8-bit quantized) requires roughly 39 GB of memory (unified memory on Apple Silicon).
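The setup commands from the bullets above, collected into one script (macOS; the model filename in the last step is an example placeholder, substitute the GGUF file you actually download):

```shell
# Create and activate the environment
conda create -n VoiceAI python=3.11
conda activate VoiceAI
pip install -r requirements.txt

# PortAudio is required by pyaudio for microphone capture
brew install portaudio
pip install pyaudio

# Rebuild llama-cpp-python with Metal acceleration enabled
CMAKE_ARGS="-DLLAMA_METAL=on" pip install llama-cpp-python

# Downloaded GGUF and Whisper models go under models/
mkdir -p models
```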

Highlighted Details

  • Local voice recognition via Whisper (Apple MLX implementation).
  • Local LLM inference using Yi models via llama.cpp.
  • Single-script architecture (main.py) for simplicity.
  • macOS Metal support via llama-cpp-python build flags.

Maintenance & Community

The project is a personal toy demo. No specific community channels or roadmap are indicated.

Licensing & Compatibility

The README does not explicitly state a license. The project uses components from Apple MLX and OpenAI Whisper, which have their own licenses. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project is described as a "toy demo" and has only been tested on macOS, with Windows/Linux requiring manual replacement of audio components. Model file paths are hardcoded variables, and the 34B model has significant hardware requirements.

Health Check

Last commit: 1 year ago
Responsiveness: 1 day
Pull Requests (30d): 0
Issues (30d): 0
Star History: 8 stars in the last 90 days

Explore Similar Projects

Starred by Thomas Wolf (Cofounder of Hugging Face), Chip Huyen (Author of AI Engineering and Designing Machine Learning Systems), and 2 more.

ultravox by fixie-ai (top 0.4%, 4k stars)

Multimodal LLM for real-time voice interactions. Created 1 year ago, updated 4 days ago.