S.A.T.U.R.D.A.Y by GRVYDEV

Vocal computing toolbox for building voice interfaces to LLMs

created 2 years ago
701 stars

Top 49.6% on sourcepulse

Project Summary

Project S.A.T.U.R.D.A.Y is a modular toolbox for vocal computing, enabling users to build self-hosted, AI-powered voice assistants akin to J.A.R.V.I.S. It targets developers and enthusiasts interested in creating sophisticated voice interfaces for LLMs, offering flexibility to integrate various AI models.

How It Works

The project employs a tool-based architecture, abstracting specific functionalities within "tools." Each tool comprises an "Engine" for domain-specific logic (e.g., voice activity detection) and a "Backend" for AI inference, allowing easy swapping of underlying models. The core tools are Speech-to-Text (STT), Text-to-Text (TTT), and Text-to-Speech (TTS), forming a pipeline for vocal interaction.
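The Engine/Backend split described above can be illustrated with a minimal Python sketch. This is not the project's code (the project itself is built with Go); every class, method, and parameter name below is hypothetical, and the byte-length check is only a stand-in for real voice activity detection. The point is that each engine holds the domain logic while the backend behind it can be swapped freely.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


# Hypothetical backend interfaces: each one wraps some AI inference model.
class STTBackend(Protocol):
    def transcribe(self, audio: bytes) -> str: ...


class TTTBackend(Protocol):
    def generate(self, prompt: str) -> str: ...


class TTSBackend(Protocol):
    def synthesize(self, text: str) -> bytes: ...


@dataclass
class STTEngine:
    """Domain logic for speech-to-text; inference is delegated to the backend."""
    backend: STTBackend
    min_bytes: int = 1  # assumed threshold, a crude stand-in for voice activity detection

    def process(self, audio: bytes) -> Optional[str]:
        if len(audio) < self.min_bytes:
            return None  # treat very short chunks as silence
        return self.backend.transcribe(audio)


@dataclass
class TTTEngine:
    backend: TTTBackend

    def process(self, text: str) -> str:
        return self.backend.generate(text.strip())


@dataclass
class TTSEngine:
    backend: TTSBackend

    def process(self, text: str) -> bytes:
        return self.backend.synthesize(text)


def run_pipeline(audio: bytes, stt: STTEngine, ttt: TTTEngine, tts: TTSEngine) -> Optional[bytes]:
    """Chain STT -> TTT -> TTS; returns None when no speech was detected."""
    text = stt.process(audio)
    if text is None:
        return None
    return tts.process(ttt.process(text))


# Toy backends so the sketch runs without any real model.
class EchoSTT:
    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")


class UpperTTT:
    def generate(self, prompt: str) -> str:
        return prompt.upper()


class BytesTTS:
    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")
```

Replacing `UpperTTT` with a class that calls a different model changes nothing in `TTTEngine` or `run_pipeline`, which is the kind of decoupling the tool design aims for.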

Quick Start & Requirements

  • Install/Run: Run make rtc, make tts, and make client from the project root.
  • Prerequisites: Go, Python, Make, a C compiler, pkg-config, opus, mecab, and espeak. Tested on M1 Pro/Max.
  • Setup: The demo runs as three processes (RTC server, TTS server, client) that must be started in a specific order. First-time TTS server setup requires pip install -r requirements.txt.
  • Docs: Getting Started
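
Before running the make targets, it can help to confirm that the prerequisites are installed. The following is a rough sketch, not part of the project: the executable names are assumptions (they differ across platforms and package managers), and opus is skipped because it is a library rather than a command on PATH.

```python
import shutil

# Assumed executable name for each prerequisite; adjust for your platform.
# opus is a library, so it is not checked here.
PREREQUISITES = {
    "Go": "go",
    "Python": "python3",
    "Make": "make",
    "C compiler": "cc",
    "pkg-config": "pkg-config",
    "mecab": "mecab",
    "espeak": "espeak",
}


def missing_prerequisites(which=shutil.which):
    """Return the prerequisites whose assumed executable is not found on PATH."""
    return [name for name, exe in PREREQUISITES.items() if which(exe) is None]
```

Calling `missing_prerequisites()` returns a list of names that could not be located; an empty list means every assumed executable was found. The `which` parameter exists only so the lookup can be replaced in tests.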

Highlighted Details

  • Integrates Pion for WebRTC, whisper.cpp for STT, and Coqui TTS for TTS.
  • Modular design allows for decoupled AI model upgrades.
  • Roadmap includes local TTT inference (e.g., llama.cpp) and improved ease of use.
  • Demo provides a functional J.A.R.V.I.S.-like assistant.

Maintenance & Community

  • Active community engagement via Discord.
  • Open to contributions and feature requests.

Licensing & Compatibility

  • License: MIT
  • Compatibility: Permissive MIT license allows for commercial use and integration into closed-source projects.

Limitations & Caveats

The project is primarily tested on M1 Pro/Max hardware and may require significant processing power. The README notes potential bugs and installation issues, encouraging users to report them. The order of starting server processes is critical for the demo.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 9 stars in the last 90 days
