talk by yacineMTB

Local conversational engine demo with audio

Created 2 years ago

591 stars

Top 55.0% on SourcePulse

4 Experts Love This Project

Benjamin Bolte

Cofounder of K-Scale Labs

Matt Schrage

Cofounder of Fig

Georgi Gerganov

Author of llama.cpp, whisper.cpp

Teknium

Cofounder of Nous Research

Project Summary

This project aims to create a local, conversational AI engine for interacting with computers, targeting users comfortable with technical tinkering. It enables voice-based interaction with large language models and text-to-speech systems, offering a foundation for custom HCI applications.

How It Works

The engine uses an event-based architecture that connects Whisper.cpp for speech-to-text, Llama.cpp for language-model inference, and Piper for text-to-speech. It captures audio input, transcribes it to text, passes the text to the LLM to generate a response, and synthesizes that response as speech. Because every stage runs locally, the engine works offline and favors privacy.
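The flow described above can be sketched as a small publish/subscribe pipeline. This is a minimal illustration, not the project's actual code: the `EventBus` class, the `build_pipeline` helper, and the event names `audio`, `transcript`, and `reply` are all hypothetical, and the three callables are stand-ins for Whisper.cpp, Llama.cpp, and Piper.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List

Handler = Callable[[Any], None]


class EventBus:
    """Tiny synchronous publish/subscribe bus (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = defaultdict(list)

    def on(self, event: str, handler: Handler) -> None:
        """Register a handler to be called whenever `event` is emitted."""
        self._handlers[event].append(handler)

    def emit(self, event: str, payload: Any) -> None:
        """Call every handler registered for `event`, in registration order."""
        for handler in list(self._handlers[event]):
            handler(payload)


def build_pipeline(
    bus: EventBus,
    transcribe: Callable[[bytes], str],  # stand-in for speech-to-text
    generate: Callable[[str], str],      # stand-in for the LLM
    speak: Callable[[str], None],        # stand-in for text-to-speech
) -> None:
    """Wire the stages together: audio -> transcript -> reply -> speech."""
    bus.on("audio", lambda audio: bus.emit("transcript", transcribe(audio)))
    bus.on("transcript", lambda text: bus.emit("reply", generate(text)))
    bus.on("reply", speak)


if __name__ == "__main__":
    bus = EventBus()
    # Stub stages so the sketch runs without any models installed.
    build_pipeline(
        bus,
        transcribe=lambda audio: audio.decode("utf-8"),
        generate=lambda text: f"You said: {text}",
        speak=print,
    )
    bus.emit("audio", b"hello")
```

Because each stage only listens for one event and emits another, a stage can be swapped or an extra listener added (for example, a logger on `transcript`) without touching the others, which is the modularity an event-based design is meant to provide.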

Quick Start & Requirements

Install: run chmod 775 build.sh && ./build.sh (experimental), or install manually by running npm install and compiling the submodules.
Prerequisites: Node.js v14.15+, Piper TTS engine (added to PATH), Graphviz (optional), CUDA (for GPU acceleration), SoX (for mic input).
Setup: Requires manually compiling the C++ submodules (Whisper.cpp, Llama.cpp) and downloading LLM weights. Expect setup to take considerable time because of the manual steps and likely debugging.
Links: Whisper.cpp, Llama.cpp, Piper TTS
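Before attempting the build, it may help to confirm that the prerequisites listed above are reachable from the shell. The following is a rough sketch, not part of the project: the executable names (`node`, `piper`, `sox`, `dot`, `nvcc`) are assumptions about how each tool is typically installed, and the helper names are hypothetical.

```python
import shutil
from typing import Callable, Dict, List, Optional, Tuple

# Executable names below are assumptions; adjust them to your installation.
REQUIRED = {"Node.js": "node", "Piper TTS": "piper", "SoX": "sox"}
OPTIONAL = {"Graphviz": "dot", "CUDA toolkit": "nvcc"}

Which = Callable[[str], Optional[str]]


def missing_tools(tools: Dict[str, str], which: Which = shutil.which) -> List[str]:
    """Return the labels of tools whose executable is not found on PATH."""
    return sorted(label for label, exe in tools.items() if which(exe) is None)


def node_version_ok(version_text: str, minimum: Tuple[int, int] = (14, 15)) -> bool:
    """Compare the output of `node --version` (e.g. "v18.17.1") with a minimum."""
    major, minor = version_text.strip().lstrip("v").split(".")[:2]
    return (int(major), int(minor)) >= minimum


if __name__ == "__main__":
    print("Missing required tools:", missing_tools(REQUIRED) or "none")
    print("Missing optional tools:", missing_tools(OPTIONAL) or "none")
```

The `which` parameter exists only so the lookup can be replaced in tests; in normal use the default `shutil.which` searches the current PATH.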

Highlighted Details

Runs completely locally for enhanced privacy.
Supports GPU acceleration via cuBLAS for Whisper.cpp.
Event-based architecture for modularity.
Optionally uses Graphviz to visualize the event graph.
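To give a sense of what visualizing an event graph with Graphviz involves, here is a minimal sketch that renders a list of event-to-event edges as DOT text. It is an assumption-laden illustration rather than the project's own visualizer: the `to_dot` helper and the event names are hypothetical.

```python
from typing import Iterable, Tuple


def to_dot(edges: Iterable[Tuple[str, str]], name: str = "events") -> str:
    """Render (source, target) pairs as a Graphviz DOT digraph."""
    lines = [f"digraph {name} {{"]
    for source, target in edges:
        lines.append(f'  "{source}" -> "{target}";')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical event names for a speech pipeline.
    print(to_dot([("audio", "transcript"), ("transcript", "reply"), ("reply", "speech")]))
```

Saving the output to a file and running it through Graphviz's `dot` command (for example `dot -Tpng graph.dot -o graph.png`) produces an image of the graph.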

Maintenance & Community

The project was last updated in June 2023. The README states an intention to improve setup and usability, but the repository shows no recent activity. It provides no links to community channels or a roadmap.

Licensing & Compatibility

The README does not state a license. The project integrates Whisper.cpp and Llama.cpp, which carry their own licenses (both are MIT-licensed). Anyone planning commercial use should verify the licenses of all integrated components.

Limitations & Caveats

The README describes the project as being at an early stage, with a complex setup process that requires significant "elbow grease." It is aimed at people comfortable with "hacking things together."

Health Check

Last Commit

2 years ago

Responsiveness

Inactive

Star History

0 stars in the last 30 days