ollama-voice by maudoin

Tool for a local voice assistant using speech-to-text and an LLM

created 1 year ago
339 stars

Top 82.4% on sourcepulse

Project Summary

This project provides an offline voice interface for local LLM interaction, combining Whisper for speech-to-text, Ollama for LLM processing, and pyttsx3 for text-to-speech. It's designed for users who want to interact with local AI models via voice without relying on cloud services.

How It Works

The system orchestrates three core components: Whisper for local, offline speech recognition, Ollama for running local LLM models, and pyttsx3 for offline text-to-speech synthesis. The workflow involves capturing audio via a key press, transcribing it with Whisper, sending the text to a locally running Ollama instance, and then converting the LLM's response back into speech using pyttsx3.
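The sketch below illustrates that pipeline in Python. It assumes an already-recorded WAV file rather than the project's key-press capture; the Whisper, pyttsx3, and Ollama /api/generate calls are standard usage of those tools, but the file name, model choice, and prompt handling are illustrative and not assistant.py's actual code.

```python
# Minimal sketch of the transcribe -> LLM -> speak pipeline described above.
# File names and prompt handling are illustrative; the project's assistant.py differs.
import json
import urllib.request

import whisper   # openai-whisper, offline speech-to-text
import pyttsx3   # offline text-to-speech

# 1. Transcribe a recorded utterance with a local Whisper model.
model = whisper.load_model("large-v3")
text = model.transcribe("recording.wav", language="fr")["text"]

# 2. Send the transcript to a locally running Ollama server.
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps({"model": "mistral", "prompt": text, "stream": False}).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    answer = json.loads(resp.read())["response"]

# 3. Speak the LLM's reply with pyttsx3.
engine = pyttsx3.init()
engine.say(answer)
engine.runAndWait()
```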

Quick Start & Requirements

  • Install CUDA before running pip install .
  • Install Ollama and ensure the server is running locally.
  • Download a Whisper model (e.g., large-v3.pt) and place it in the whisper subfolder.
  • Configure assistant.yaml (see the config-loading sketch after this list).
  • Run assistant.py.
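As a rough illustration of the configuration step, the snippet below shows one way assistant.yaml values might be loaded and passed to the pipeline. The key names (language, model, whisper_model) are hypothetical, not the project's actual schema; only the French and Mistral defaults come from the README.

```python
# Hypothetical sketch: load assistant.yaml and pick out pipeline settings.
# Key names are illustrative placeholders, not the project's real schema.
import yaml  # pip install pyyaml

with open("assistant.yaml", encoding="utf-8") as f:
    config = yaml.safe_load(f)

language = config.get("language", "fr")                 # README: defaults to French
ollama_model = config.get("model", "mistral")           # README: defaults to Mistral
whisper_model_path = config.get("whisper_model", "whisper/large-v3.pt")

print(f"Whisper model: {whisper_model_path}, Ollama model: {ollama_model}, language: {language}")
```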

Highlighted Details

  • Offline operation for all components.
  • Supports GPU acceleration for Whisper (see the device-selection sketch after this list).
  • Configurable via assistant.yaml (defaults to French and Mistral model).
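On the GPU-acceleration point, Whisper's Python API accepts a device argument; the snippet below is a generic sketch of CUDA detection with a CPU fallback, not code taken from this project.

```python
# Generic sketch: load Whisper on the GPU when CUDA is available, otherwise on the CPU.
import torch
import whisper

device = "cuda" if torch.cuda.is_available() else "cpu"
model = whisper.load_model("large-v3", device=device)
```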

Maintenance & Community

No specific community links or notable contributors are mentioned in the README.

Licensing & Compatibility

The README does not specify a license.

Limitations & Caveats

The project is described as a "simple combination" and lists "Rearrange code base" and "Multi threading" as to-do items, suggesting it is in an early stage of development. GPU setup for Whisper is a prerequisite.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 23 stars in the last 90 days
