local-talking-llm by vndee

Talking LLM for local voice assistant creation

Created 1 year ago
763 stars

Top 45.8% on SourcePulse

Project Summary

This project provides a Python-based framework for building an offline, voice-activated AI assistant. It targets users interested in creating personal AI agents similar to Jarvis or Friday, enabling local, internet-free conversational capabilities.

How It Works

The assistant integrates three core open-source components: OpenAI's Whisper for speech-to-text, Ollama serving a Llama-2 model for natural language understanding and response generation, and Suno AI's Bark for text-to-speech synthesis. The workflow involves recording user speech, transcribing it to text, processing the text through the LLM for a response, and finally vocalizing the response using Bark. This modular approach allows for customization and leverages powerful, locally runnable models.
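The record-transcribe-respond-speak loop described above can be sketched as follows. This is a minimal illustration, not the project's actual code: it assumes openai-whisper and suno-bark are installed, an Ollama server is listening on its default port (11434) with llama2 pulled, and the audio file name and helper names are invented for the example.

```python
# Sketch of the pipeline: record -> Whisper (STT) -> Ollama (LLM) -> Bark (TTS).
# Helper names, the prompt format, and "question.wav" are illustrative.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint


def build_prompt(history: list[tuple[str, str]], user_text: str) -> str:
    """Fold prior (user, assistant) turns into a single prompt string,
    giving the LLM conversational context."""
    lines = [f"User: {u}\nAssistant: {a}" for u, a in history]
    lines.append(f"User: {user_text}\nAssistant:")
    return "\n".join(lines)


def ask_llm(prompt: str, model: str = "llama2") -> str:
    """POST the prompt to Ollama's /api/generate endpoint (non-streaming)."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]


def run_turn(wav_path: str = "question.wav") -> None:
    """One full turn; heavy imports are deferred so the sketch is readable
    without the models installed. Requires whisper, bark, sounddevice."""
    import sounddevice as sd
    import whisper                   # speech-to-text
    from bark import generate_audio  # text-to-speech

    stt = whisper.load_model("base")
    history: list[tuple[str, str]] = []

    text = stt.transcribe(wav_path)["text"]       # 1. transcribe speech
    reply = ask_llm(build_prompt(history, text))  # 2. generate a response
    history.append((text, reply))

    audio = generate_audio(reply)                 # 3. synthesize speech
    sd.play(audio, samplerate=24_000)             # Bark outputs 24 kHz audio
    sd.wait()

# run_turn() would execute one turn (needs the models and a running Ollama).
```

Because each stage is a separate call, any component can be swapped out, e.g. a larger Whisper model or a different Ollama-hosted LLM.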

Quick Start & Requirements

  • Install: Requires Python environment setup (e.g., Poetry). Key libraries include openai-whisper, suno-bark, langchain, sounddevice, pyaudio, speechrecognition, and rich.
  • LLM Backend: Ollama must be installed and running, with a model like llama2 pulled (ollama pull llama2).
  • Hardware: A CUDA-enabled GPU is recommended for faster processing, as the Bark model can be resource-intensive.
  • Docs: Original article and demo video available.
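The setup steps above might look like the following in practice; the repository URL follows from the project name, but the entry-point script name is an assumption.

```shell
# Setup sketch -- package and model names per the README; app.py is assumed.
git clone https://github.com/vndee/local-talking-llm.git
cd local-talking-llm
poetry install              # installs whisper, bark, langchain, audio deps

# LLM backend: install Ollama separately, then pull the model
ollama pull llama2

poetry run python app.py    # entry-point name is an assumption
```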

Highlighted Details

  • Voice-based interaction with conversational context maintenance.
  • Utilizes suno/bark-small for text-to-speech, with potential to use larger models.
  • langchain is used for managing the conversational chain with Ollama.
  • Offers suggestions for performance optimization using .cpp implementations.
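The langchain-managed chain mentioned above can be sketched as below, assuming the legacy langchain API of the project's era (ConversationChain with ConversationBufferMemory over an Ollama LLM); this is an illustration, not the project's exact wiring.

```python
def build_chain(model: str = "llama2"):
    """Assemble an Ollama-backed conversational chain. Imports are deferred
    so the sketch reads without langchain installed; running it requires
    langchain and a live Ollama server with the model pulled."""
    from langchain.chains import ConversationChain
    from langchain.llms import Ollama
    from langchain.memory import ConversationBufferMemory

    return ConversationChain(
        llm=Ollama(model=model),
        memory=ConversationBufferMemory(),  # keeps prior turns as context
    )

# chain = build_chain()
# reply = chain.predict(input="What did I just ask you?")  # memory fills in history
```

ConversationBufferMemory is what provides the "conversational context maintenance" noted above: each predict() call prepends the stored transcript to the new prompt.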

Maintenance & Community

The project is based on a blog post and tutorial, with the primary contributor being duy-huynh. Further community engagement or maintenance status is not detailed in the README.

Licensing & Compatibility

The README does not explicitly state a license for the project code itself. However, it relies on libraries with their own licenses (Whisper, Bark, Langchain, Ollama), which may have implications for commercial use.

Limitations & Caveats

The application can run slowly, particularly on systems without a GPU, due to the resource demands of the Bark model. Performance optimization suggestions are provided but not implemented in the base code.

Health Check

  • Last Commit: 2 months ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 31 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), Michael Han (cofounder of Unsloth), and 1 more.

Orpheus-TTS by canopyai — open-source TTS for human-sounding speech, built on Llama-3b. 6k stars (top 0.2%); created 10 months ago, updated 1 month ago.