ollama-voice by maudoin

Tool for a local voice assistant using speech-to-text and an LLM

created 1 year ago
339 stars

Top 82.4% on sourcepulse

Project Summary

This project provides an offline voice interface for local LLM interaction, combining Whisper for speech-to-text, Ollama for LLM processing, and pyttsx3 for text-to-speech. It's designed for users who want to interact with local AI models via voice without relying on cloud services.

How It Works

The system orchestrates three core components: Whisper for local, offline speech recognition, Ollama for running local LLM models, and pyttsx3 for offline text-to-speech synthesis. The workflow involves capturing audio via a key press, transcribing it with Whisper, sending the text to a locally running Ollama instance, and then converting the LLM's response back into speech using pyttsx3.
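The sketch below illustrates that pipeline in Python. It assumes an already-recorded WAV file rather than the project's key-press capture; the Whisper, pyttsx3, and Ollama /api/generate calls are standard usage of those tools, but the file name, model choice, and prompt handling are illustrative and not assistant.py's actual code.

```python
# Minimal sketch of the transcribe -> LLM -> speak pipeline described above.
# File names and prompt handling are illustrative; the project's assistant.py differs.
import json
import urllib.request

import whisper   # openai-whisper, offline speech-to-text
import pyttsx3   # offline text-to-speech

# 1. Transcribe a recorded utterance with a local Whisper model.
model = whisper.load_model("large-v3")
text = model.transcribe("recording.wav", language="fr")["text"]

# 2. Send the transcript to a locally running Ollama server.
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps({"model": "mistral", "prompt": text, "stream": False}).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    answer = json.loads(resp.read())["response"]

# 3. Speak the LLM's reply with pyttsx3.
engine = pyttsx3.init()
engine.say(answer)
engine.runAndWait()
```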

Quick Start & Requirements

  • Install CUDA before running pip install .
  • Install Ollama and ensure the server is running locally.
  • Download a Whisper model (e.g., large-v3.pt) and place it in the whisper subfolder.
  • Configure assistant.yaml (see the config-loading sketch after this list).
  • Run assistant.py.
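As a rough illustration of the configuration step, the snippet below shows one way assistant.yaml values might be loaded and passed to the pipeline. The key names (language, model, whisper_model) are hypothetical, not the project's actual schema; only the French and Mistral defaults come from the README.

```python
# Hypothetical sketch: load assistant.yaml and pick out pipeline settings.
# Key names are illustrative placeholders, not the project's real schema.
import yaml  # pip install pyyaml

with open("assistant.yaml", encoding="utf-8") as f:
    config = yaml.safe_load(f)

language = config.get("language", "fr")                 # README: defaults to French
ollama_model = config.get("model", "mistral")           # README: defaults to Mistral
whisper_model_path = config.get("whisper_model", "whisper/large-v3.pt")

print(f"Whisper model: {whisper_model_path}, Ollama model: {ollama_model}, language: {language}")
```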

Highlighted Details

  • Offline operation for all components.
  • Supports GPU acceleration for Whisper (see the device-selection sketch after this list).
  • Configurable via assistant.yaml (defaults to French and Mistral model).
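On the GPU-acceleration point, Whisper's Python API accepts a device argument; the snippet below is a generic sketch of CUDA detection with a CPU fallback, not code taken from this project.

```python
# Generic sketch: load Whisper on the GPU when CUDA is available, otherwise on the CPU.
import torch
import whisper

device = "cuda" if torch.cuda.is_available() else "cpu"
model = whisper.load_model("large-v3", device=device)
```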

Maintenance & Community

No specific community links or notable contributors are mentioned in the README.

Licensing & Compatibility

The README does not specify a license.

Limitations & Caveats

The project is described as a "simple combination" and lists "Rearrange code base" and "Multi threading" as to-do items, suggesting it is in an early stage of development. GPU setup for Whisper is a prerequisite.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 23 stars in the last 90 days
