GLaDOS by dnhkng

AI-powered personality core for an interactive, embodied assistant

created 2 years ago
4,953 stars

Top 10.2% on sourcepulse

Project Summary

This project aims to create a physical, interactive AI embodying GLaDOS from the Portal series. It targets hobbyists and developers interested in embodied AI and robotics who want to build a conversational agent with a physical presence. The agent supports low-latency voice interaction, and vision capabilities are planned for the future.

How It Works

The system uses a low-latency pipeline. Audio is recorded continuously into a buffer while voice activity detection watches for speech. When the speaker stops, the buffered speech is transcribed and the text is streamed to a local LLM. The LLM's output is passed to a text-to-speech engine one sentence at a time, so playback of the first sentence can begin while later sentences are still being generated. The architecture keeps dependencies minimal to suit constrained hardware and avoids large frameworks such as PyTorch.
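The sentence-by-sentence handoff between the LLM and the text-to-speech engine can be illustrated with a minimal Python sketch. This is not the project's code: the function names `sentences_from_tokens` and `run_pipeline`, the simple punctuation-based sentence boundary, and the `synthesize` callback standing in for a TTS engine are all assumptions made for illustration.

```python
import queue
import re
import threading
from typing import Callable, Iterable, Iterator, List

# Assumed sentence boundary: '.', '!' or '?' followed by whitespace.
# Real text (abbreviations, decimals) needs a more careful rule.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def sentences_from_tokens(tokens: Iterable[str]) -> Iterator[str]:
    """Yield each complete sentence as soon as it appears in a token stream."""
    buffer = ""
    for token in tokens:
        buffer += token
        parts = _SENTENCE_END.split(buffer)
        # Every part except the last is a finished sentence.
        for sentence in parts[:-1]:
            if sentence:
                yield sentence
        buffer = parts[-1]
    # Flush whatever remains once the stream ends.
    if buffer.strip():
        yield buffer.strip()


def run_pipeline(tokens: Iterable[str], synthesize: Callable[[str], str]) -> List[str]:
    """Split the stream in the caller's thread while a worker 'speaks' sentences."""
    pending: "queue.Queue[str | None]" = queue.Queue()
    spoken: List[str] = []

    def speaker() -> None:
        while True:
            sentence = pending.get()
            if sentence is None:  # sentinel: no more sentences
                break
            spoken.append(synthesize(sentence))

    worker = threading.Thread(target=speaker)
    worker.start()
    for sentence in sentences_from_tokens(tokens):
        pending.put(sentence)  # the worker can start before generation ends
    pending.put(None)
    worker.join()
    return spoken


if __name__ == "__main__":
    stream = ["Hello", " there", ".", " How", " are", " you", "?", " Fine"]
    print(run_pipeline(stream, str.upper))
```

The queue decouples the two stages: the worker starts on the first sentence as soon as it is complete, which is the property the pipeline relies on to reduce perceived latency.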

Quick Start & Requirements

  • Installation: Clone the repository, then run python scripts/install.py (or scripts\install.py on Windows).
  • Prerequisites: Ollama for LLM hosting and Python 3.12. NVIDIA GPUs may need the CUDA drivers and toolkit; other accelerators need an appropriate ONNX Runtime version. The PortAudio library is required on Linux.
  • Running: Execute uv run glados or uv run glados tui for the Text UI.
  • Resources: Requires an LLM (e.g., llama3.2 via Ollama) and an OpenAI-compatible TTS server. Performance is highly dependent on hardware acceleration.
  • Docs: https://github.com/dnhkng/GLaDOS

Highlighted Details

  • Aims for sub-600ms response latency.
  • Supports various local LLMs (via Ollama) and TTS voices (Kokoro).
  • Experimental support for running on an 8GB SBC (RK3588 NPU).
  • Future plans include VLM integration for vision and custom vector DB for memory.

Maintenance & Community

  • Active development with community support via Discord.
  • Project sponsorship is available.

Licensing & Compatibility

  • The README does not explicitly state a license, so the repository's license appears to be unspecified.

Limitations & Caveats

The project is in active, experimental development, particularly the SBC implementation, and support for complex setup issues is not guaranteed. Users may encounter segfaults and need significant troubleshooting, especially on non-standard hardware. Voice interruption loops can occur without proper audio hardware or configuration.

Health Check

  • Last commit: 1 week ago
  • Responsiveness: Inactive
  • Pull requests (30d): 4
  • Issues (30d): 0
  • Star history: 261 stars in the last 90 days
