Linguflex by KoljaB

Voice-based AI companion for home automation and information retrieval

Created 2 years ago

803 stars

Top 43.8% on SourcePulse


1 Expert Loves This Project

Chip Huyen

Author of "AI Engineering", "Designing Machine Learning Systems"

Project Summary

Linguflex aims to provide a Jarvis-like AI companion experience, enabling voice-based interaction for controlling smart home devices, managing schedules, searching the web, and more. It targets users seeking an advanced, customizable AI assistant and developers interested in contributing to AI evolution. The project emphasizes local operation, low latency, and high-quality audio output through voice cloning.

How It Works

Linguflex runs locally as a three-stage pipeline: a "Listen" module processes audio input, a "Brain" module performs the cognitive tasks (backed by local LLMs or the OpenAI API), and a "Speech" module generates audio output. The "Speech" module uses RVC (Retrieval-based Voice Conversion) and fine-tuned XTTS for high-quality, low-latency voice synthesis. Additional modules extend functionality to music, email, weather, smart home control, and more, and keyword pre-parsing is used to optimize LLM interaction.
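The flow described above can be sketched in a few lines of Python. This is an illustrative sketch only, not Linguflex's actual code: the `Module` and `Assistant` classes, the keyword lists, and the handlers are all hypothetical, speech input and output are reduced to plain strings, and "keyword pre-parsing" is interpreted here as narrowing down which modules are considered before the LLM step.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Module:
    """Hypothetical extension module: a name, trigger keywords, and a handler."""
    name: str
    keywords: List[str]
    handler: Callable[[str], str]


@dataclass
class Assistant:
    """Hypothetical stand-in for the Listen -> Brain -> Speech pipeline."""
    modules: List[Module] = field(default_factory=list)

    def preparse(self, text: str) -> List[Module]:
        """Return only the modules whose keywords appear in the request."""
        words = set(re.findall(r"[a-z]+", text.lower()))
        return [m for m in self.modules if words & set(m.keywords)]

    def brain(self, text: str) -> str:
        """Stand-in for the LLM step: dispatch to the first matching module."""
        candidates = self.preparse(text)
        if not candidates:
            return "No module matched."
        return candidates[0].handler(text)

    def respond(self, heard_text: str) -> str:
        """Take transcribed text in and return the text that would be spoken."""
        return self.brain(heard_text)


assistant = Assistant(modules=[
    Module("weather", ["weather", "forecast"], lambda t: "Fetching the forecast."),
    Module("lights", ["light", "lights", "lamp"], lambda t: "Toggling the lights."),
])

print(assistant.respond("Turn on the lights, please."))
```

In a real system the matched modules would more likely be offered to the LLM as candidate tools rather than called directly; the direct dispatch here just keeps the sketch self-contained.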

Quick Start & Requirements

Installation is complex due to intricate integrations and dependency management. A detailed "Modules Guide" is provided for setup.
Requires Python; using the specific Python version the project expects, together with compatible CUDA and cuDNN versions, is critical.
Local LLM inference and advanced TTS features may necessitate significant computational resources, including GPUs.
See in action (short clip): https://github.com/KoljaB/Linguflex#see-in-action-short-clip
Installation video guide: https://github.com/KoljaB/Linguflex#installation-video-guide
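Because version mismatches are the main installation risk noted above, a quick environment check before starting can save time. The sketch below uses only the Python standard library. The `environment_report` helper and the `(3, 10)` minimum version are placeholders chosen for illustration, not requirements taken from the project, and finding `nvidia-smi` on the PATH only suggests that an NVIDIA driver is installed, not that CUDA and cuDNN versions are compatible.

```python
import shutil
import sys


def environment_report(min_python=(3, 10)):
    """Collect basic facts about the interpreter and GPU tooling.

    min_python is a placeholder; check the project's own guide for the
    version it actually expects.
    """
    return {
        "python_version": "%d.%d.%d" % sys.version_info[:3],
        "python_ok": sys.version_info[:2] >= min_python,
        # True if the NVIDIA driver's command-line tool is on the PATH.
        "nvidia_smi_found": shutil.which("nvidia-smi") is not None,
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```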

Highlighted Details

Ollama support is now integrated.
Achieves ultra-low latency for language model communication and TTS generation.
Offers near-ElevenLabs quality local TTS synthesis via RVC and XTTS.
Features a modular design for extensibility, with planned modules for vision, memory, news, finance, and image generation.
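For readers unfamiliar with Ollama, the sketch below shows what a minimal request to a locally running Ollama server looks like, using only the Python standard library. It illustrates Ollama's own HTTP API (`/api/generate` on the default port 11434), not how Linguflex integrates it. The `build_request` and `ask` helpers are hypothetical, and the model name `llama3` is an assumption; substitute any model already pulled locally.

```python
import json
import urllib.request

# Default address of a local Ollama server.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt, model="llama3"):
    """Build a non-streaming generate request (model name is an assumption)."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, model="llama3", timeout=60):
    """Send the prompt and return the generated text.

    Requires an Ollama server running locally with the model available.
    """
    with urllib.request.urlopen(build_request(prompt, model), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]
```

Calling `ask("Say hello in one sentence.")` returns the model's reply as a string when the server is running; with `"stream": False` the server sends one JSON object instead of a stream of partial responses.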

Maintenance & Community

Linguflex is a personal passion project that actively seeks community contributions and insights. Philip Ehrbright is credited with developing the Ollama support feature.

Licensing & Compatibility

The codebase is MIT licensed. However, the models and services behind the TTS engines (CoquiEngine, ElevenlabsEngine, AzureEngine) carry their own restrictions: they are open source only for non-commercial projects, and commercial use requires a paid plan or a specific tier. OpenAI engine usage is subject to OpenAI's terms.

Limitations & Caveats

The installation process is noted as challenging and potentially unstable due to complex integrations and Python's dependency management. Commercial use of the core TTS voice generation capabilities is restricted by the underlying model licenses, requiring separate paid plans for many components.

Health Check

Last Commit

8 months ago

Responsiveness

1 day

Star History

4 stars in the last 30 days