local-talking-llm by vndee

Talking LLM for local voice assistant creation

created 1 year ago
530 stars

Top 60.5% on sourcepulse

View on GitHub
Project Summary

This project provides a Python-based framework for building an offline, voice-activated AI assistant. It targets users interested in creating personal AI agents similar to Jarvis or Friday, enabling local, internet-free conversational capabilities.

How It Works

The assistant integrates three core open-source components: OpenAI's Whisper for speech-to-text, Ollama serving a Llama-2 model for natural language understanding and response generation, and Suno AI's Bark for text-to-speech synthesis. The workflow involves recording user speech, transcribing it to text, processing the text through the LLM for a response, and finally vocalizing the response using Bark. This modular approach allows for customization and leverages powerful, locally runnable models.
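
A minimal sketch of that loop, assuming the base Whisper checkpoint, a llama2 model already pulled into Ollama, and langchain's Ollama wrapper (import paths vary across langchain releases); the actual project adds error handling and a rich console UI:

    import sounddevice as sd
    import whisper
    from bark import SAMPLE_RATE, generate_audio, preload_models
    from langchain.llms import Ollama  # newer releases: langchain_community.llms

    stt = whisper.load_model("base")   # speech-to-text
    llm = Ollama(model="llama2")       # assumes `ollama serve` is running locally
    preload_models()                   # downloads/caches Bark weights on first use

    def record(seconds: int = 5, rate: int = 16000):
        """Capture microphone audio as float32 at 16 kHz, which Whisper expects."""
        audio = sd.rec(int(seconds * rate), samplerate=rate, channels=1, dtype="float32")
        sd.wait()
        return audio.flatten()

    while True:
        text = stt.transcribe(record())["text"].strip()
        print("You:", text)
        reply = llm.invoke(text)       # older langchain: reply = llm(text)
        print("Assistant:", reply)
        sd.play(generate_audio(reply), SAMPLE_RATE)  # Bark emits a 24 kHz waveform
        sd.wait()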

Quick Start & Requirements

  • Install: Requires a Python environment set up with a tool such as Poetry. Key libraries include openai-whisper, suno-bark, langchain, sounddevice, pyaudio, speechrecognition, and rich.
  • LLM Backend: Ollama must be installed and running, with a model like llama2 pulled (ollama pull llama2); see the liveness check after this list.
  • Hardware: A CUDA-enabled GPU is recommended for faster processing, as the Bark model can be resource-intensive.
  • Docs: The original article and a demo video are available.
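
Before launching the assistant, it is worth confirming that the Ollama server is actually reachable. A quick liveness check, assuming Ollama's default port of 11434:

    import urllib.request

    # Ollama listens on localhost:11434 by default; a GET on the root
    # path returns a short "Ollama is running" message when it's up.
    try:
        with urllib.request.urlopen("http://localhost:11434", timeout=2) as resp:
            print(resp.read().decode())
    except OSError:
        print("Ollama is unreachable; start it with `ollama serve`.")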

Highlighted Details

  • Voice-based interaction with conversational context maintenance.
  • Utilizes suno/bark-small for text-to-speech, with the option to swap in larger Bark models for higher-quality audio.
  • langchain manages the conversational chain with Ollama (see the sketch after this list).
  • The README offers performance-optimization suggestions, such as switching to .cpp implementations.
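
A sketch of what that chain might look like, assuming langchain's stock ConversationChain and buffer memory; the project's actual prompt template and memory settings may differ:

    from langchain.chains import ConversationChain
    from langchain.llms import Ollama
    from langchain.memory import ConversationBufferMemory

    # Buffer memory replays prior turns into each new prompt, which is
    # how the assistant keeps context across voice exchanges.
    chain = ConversationChain(
        llm=Ollama(model="llama2"),
        memory=ConversationBufferMemory(),
    )

    print(chain.predict(input="My name is Dee."))
    print(chain.predict(input="What is my name?"))  # answered from memory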

Maintenance & Community

The project is based on a blog post and tutorial; the primary contributor is duy-huynh. Further community engagement or maintenance status is not detailed in the README.

Licensing & Compatibility

The README does not explicitly state a license for the project code itself. However, it relies on libraries with their own licenses (Whisper, Bark, Langchain, Ollama), which may have implications for commercial use.

Limitations & Caveats

The application can run slowly, particularly on systems without a GPU, due to the resource demands of the Bark model. Performance optimization suggestions are provided but not implemented in the base code.
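
One mitigation on modest hardware is Bark's own small-model switch, controlled by environment variables that must be set before the library is first imported (a sketch; expect lower audio quality in exchange for speed):

    import os

    # Bark reads these variables at import time, so set them before
    # the first `import bark`.
    os.environ["SUNO_USE_SMALL_MODELS"] = "True"  # load the bark-small checkpoints
    os.environ["SUNO_OFFLOAD_CPU"] = "True"       # park idle sub-models on the CPU to save VRAM

    from bark import SAMPLE_RATE, generate_audio, preload_models

    preload_models()
    audio = generate_audio("Hello, I am your local assistant.")  # numpy waveform at SAMPLE_RATE

With the small checkpoints, Bark should become more usable on CPU-only machines, though voice quality is noticeably rougher.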

Health Check

  • Last commit: 2 months ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 3
  • Star history: 80 stars in the last 90 days

Explore Similar Projects

Starred by Boris Cherny (creator of Claude Code; MTS at Anthropic), Andrej Karpathy (founder of Eureka Labs; formerly at Tesla and OpenAI; author of CS 231n), and 19 more.

whisper by openai

Top 0.4% · 86k stars
Speech recognition model for multilingual transcription/translation
created 2 years ago · updated 1 month ago