local-talking-llm by vndee

Talking LLM for local voice assistant creation

created 1 year ago
530 stars

Top 60.5% on sourcepulse

View on GitHub
Project Summary

This project provides a Python-based framework for building an offline, voice-activated AI assistant. It targets users interested in creating personal AI agents similar to Jarvis or Friday, enabling local, internet-free conversational capabilities.

How It Works

The assistant integrates three core open-source components: OpenAI's Whisper for speech-to-text, Ollama serving a Llama-2 model for natural language understanding and response generation, and Suno AI's Bark for text-to-speech synthesis. The workflow involves recording user speech, transcribing it to text, processing the text through the LLM for a response, and finally vocalizing the response using Bark. This modular approach allows for customization and leverages powerful, locally runnable models.
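
A minimal sketch of that loop, assuming the base Whisper checkpoint, a llama2 model already pulled into Ollama, and langchain's Ollama wrapper (import paths vary across langchain releases); the actual project adds error handling and a rich console UI:

    import sounddevice as sd
    import whisper
    from bark import SAMPLE_RATE, generate_audio, preload_models
    from langchain.llms import Ollama  # newer releases: langchain_community.llms

    stt = whisper.load_model("base")   # speech-to-text
    llm = Ollama(model="llama2")       # assumes `ollama serve` is running locally
    preload_models()                   # downloads/caches Bark weights on first use

    def record(seconds: int = 5, rate: int = 16000):
        """Capture microphone audio as float32 at 16 kHz, which Whisper expects."""
        audio = sd.rec(int(seconds * rate), samplerate=rate, channels=1, dtype="float32")
        sd.wait()
        return audio.flatten()

    while True:
        text = stt.transcribe(record())["text"].strip()
        print("You:", text)
        reply = llm.invoke(text)       # older langchain: reply = llm(text)
        print("Assistant:", reply)
        sd.play(generate_audio(reply), SAMPLE_RATE)  # Bark emits a 24 kHz waveform
        sd.wait()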

Quick Start & Requirements

  • Install: Requires a Python environment set up with a tool such as Poetry. Key libraries include openai-whisper, suno-bark, langchain, sounddevice, pyaudio, speechrecognition, and rich.
  • LLM Backend: Ollama must be installed and running, with a model like llama2 pulled (ollama pull llama2); see the liveness check after this list.
  • Hardware: A CUDA-enabled GPU is recommended for faster processing, as the Bark model can be resource-intensive.
  • Docs: The original article and a demo video are available.
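
Before launching the assistant, it is worth confirming that the Ollama server is actually reachable. A quick liveness check, assuming Ollama's default port of 11434:

    import urllib.request

    # Ollama listens on localhost:11434 by default; a GET on the root
    # path returns a short "Ollama is running" message when it's up.
    try:
        with urllib.request.urlopen("http://localhost:11434", timeout=2) as resp:
            print(resp.read().decode())
    except OSError:
        print("Ollama is unreachable; start it with `ollama serve`.")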

Highlighted Details

  • Voice-based interaction with conversational context maintenance.
  • Utilizes suno/bark-small for text-to-speech, with the option to swap in larger Bark models for higher-quality audio.
  • langchain manages the conversational chain with Ollama (see the sketch after this list).
  • The README offers performance-optimization suggestions, such as switching to .cpp implementations.
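
A sketch of what that chain might look like, assuming langchain's stock ConversationChain and buffer memory; the project's actual prompt template and memory settings may differ:

    from langchain.chains import ConversationChain
    from langchain.llms import Ollama
    from langchain.memory import ConversationBufferMemory

    # Buffer memory replays prior turns into each new prompt, which is
    # how the assistant keeps context across voice exchanges.
    chain = ConversationChain(
        llm=Ollama(model="llama2"),
        memory=ConversationBufferMemory(),
    )

    print(chain.predict(input="My name is Dee."))
    print(chain.predict(input="What is my name?"))  # answered from memory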

Maintenance & Community

The project is based on a blog post and tutorial; the primary contributor is duy-huynh. Further community engagement or maintenance status is not detailed in the README.

Licensing & Compatibility

The README does not explicitly state a license for the project code itself. However, it relies on libraries with their own licenses (Whisper, Bark, Langchain, Ollama), which may have implications for commercial use.

Limitations & Caveats

The application can run slowly, particularly on systems without a GPU, due to the resource demands of the Bark model. Performance optimization suggestions are provided but not implemented in the base code.
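
One mitigation on modest hardware is Bark's own small-model switch, controlled by environment variables that must be set before the library is first imported (a sketch; expect lower audio quality in exchange for speed):

    import os

    # Bark reads these variables at import time, so set them before
    # the first `import bark`.
    os.environ["SUNO_USE_SMALL_MODELS"] = "True"  # load the bark-small checkpoints
    os.environ["SUNO_OFFLOAD_CPU"] = "True"       # park idle sub-models on the CPU to save VRAM

    from bark import SAMPLE_RATE, generate_audio, preload_models

    preload_models()
    audio = generate_audio("Hello, I am your local assistant.")  # numpy waveform at SAMPLE_RATE

With the small checkpoints, Bark should become more usable on CPU-only machines, though voice quality is noticeably rougher.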

Health Check

  • Last commit: 2 months ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 3
  • Star history: 80 stars in the last 90 days

Explore Similar Projects

Starred by Boris Cherny (creator of Claude Code; MTS at Anthropic), Andrej Karpathy (founder of Eureka Labs; formerly at Tesla and OpenAI; author of CS 231n), and 19 more.

whisper by openai

Top 0.4% · 86k stars
Speech recognition model for multilingual transcription/translation
created 2 years ago · updated 1 month ago