sag by steipete

Modern TTS CLI inspired by macOS say

Created 5 months ago
300 stars

Top 88.5% on SourcePulse

Project Summary

Like the macOS say command, sag is a command-line text-to-speech (TTS) tool, but it synthesizes speech through ElevenLabs rather than the system voices. It targets developers and power users who want programmatic control over high-quality speech generation, and it offers streaming playback, file output, voice discovery, and fine-grained parameter tuning for expressive, natural-sounding speech.

How It Works

The tool integrates directly with the ElevenLabs API, enabling users to generate speech from text. It defaults to streaming audio output to speakers but can also save audio to various file formats. sag supports ElevenLabs' diverse voice models, including v3 for expressive "acting" styles using audio tags, and v2/v2.5 for more stable, SSML-compatible speech with lower latency options. Users can control synthesis parameters such as stability, similarity, speed, and normalization, allowing for tailored voice output.
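
To make the parameter flow concrete, here is a minimal sketch of how a request body for the ElevenLabs text-to-speech endpoint might be assembled from these options. It is not sag's actual implementation (sag is written in Go). The function name build_tts_request and the short model keys are hypothetical, and the model identifiers and voice_settings field names reflect the public ElevenLabs API as commonly documented, so verify them against the current API reference before relying on them.

```python
import json

# Assumed ElevenLabs model identifiers, keyed by hypothetical short names.
MODEL_IDS = {
    "v3": "eleven_v3",
    "v2": "eleven_multilingual_v2",
    "v2.5-flash": "eleven_flash_v2_5",
    "v2.5-turbo": "eleven_turbo_v2_5",
}


def build_tts_request(text, model="v3", stability=None, similarity=None, speed=None):
    """Return a JSON-serializable body for a text-to-speech request.

    Only the settings the caller provides are included, so the service's
    defaults apply to everything else.
    """
    if model not in MODEL_IDS:
        raise ValueError(f"unknown model: {model}")
    body = {"text": text, "model_id": MODEL_IDS[model]}
    settings = {}
    if stability is not None:
        settings["stability"] = stability
    if similarity is not None:
        # Assumed field name: the API calls this setting "similarity_boost".
        settings["similarity_boost"] = similarity
    if speed is not None:
        settings["speed"] = speed
    if settings:
        body["voice_settings"] = settings
    return body


if __name__ == "__main__":
    print(json.dumps(build_tts_request("Hello there", model="v2", stability=0.5), indent=2))
```

The sketch stops at building the body; sending it, streaming the returned audio to the speakers, or writing it to a file is left out.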

Quick Start & Requirements

Install via Homebrew on macOS (brew install steipete/tap/sag) or build with the Go toolchain from a checkout of the repository (go install ./cmd/sag), which requires Go 1.24+. An ElevenLabs API key is required and must be supplied through the ELEVENLABS_API_KEY environment variable or a file path.
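
The two key sources described above could be resolved along the following lines. This is a sketch only: the function name resolve_api_key, the key_file parameter, and the choice to check the environment variable before the file are all assumptions for illustration, since the summary does not say how sag orders the two sources or how the file path is passed.

```python
import os
from pathlib import Path


def resolve_api_key(env=None, key_file=None):
    """Return an ElevenLabs API key from the environment or a key file.

    Assumed precedence: ELEVENLABS_API_KEY first, then the optional file.
    """
    env = os.environ if env is None else env
    key = env.get("ELEVENLABS_API_KEY", "").strip()
    if key:
        return key
    if key_file is not None:
        # Trailing newlines are common in key files, so strip whitespace.
        key = Path(key_file).read_text(encoding="utf-8").strip()
        if key:
            return key
    raise RuntimeError(
        "no ElevenLabs API key found: set ELEVENLABS_API_KEY or provide a key file"
    )
```

Failing early with a clear message avoids sending an unauthenticated request and decoding an HTTP error afterwards.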

Highlighted Details

  • Mimics the behavior of the macOS say command, so it slots into workflows built around say.
  • Features voice discovery (sag voices) and supports fine-grained control over synthesis parameters like stability, similarity, and speed.
  • Offers flexibility across ElevenLabs models (v3, v2.5 Flash/Turbo, v2), each with distinct prompting styles and input limits.
  • Supports streaming playback, file output (MP3, WAV), and various latency tiers for optimized performance.

Maintenance & Community

No specific details regarding maintainers, community channels (e.g., Discord, Slack), or project roadmap were found in the provided README.

Licensing & Compatibility

The license type is not explicitly stated in the README. The tool is primarily designed for macOS but offers basic playback functionality on other platforms, though device selection flags are non-operational outside macOS.

Limitations & Caveats

An ElevenLabs account and API key are strictly required. The v3 model does not support SSML <break> tags. Input text length is limited by the chosen ElevenLabs engine (ranging from 5,000 to 40,000 characters). The --normalize flag may not be available for v2.5 Turbo/Flash models.
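
A pre-flight check for two of these caveats might look like the sketch below. The summary gives only the overall range of 5,000 to 40,000 characters, so the per-model values in CHAR_LIMITS, the short model keys, and the helper name check_input are assumptions; confirm the actual limits in the ElevenLabs documentation.

```python
import re

# Assumed per-model character limits; only the 5,000 to 40,000 range
# comes from the summary, the mapping to models is a guess.
CHAR_LIMITS = {"v3": 5_000, "v2": 10_000, "v2.5": 40_000}

# Matches SSML break tags such as <break time="1s"/>.
BREAK_TAG = re.compile(r"<break\b[^>]*>", re.IGNORECASE)


def check_input(text, model):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    limit = CHAR_LIMITS.get(model)
    if limit is None:
        problems.append(f"unknown model {model!r}")
    elif len(text) > limit:
        problems.append(
            f"text is {len(text)} characters; {model} accepts at most {limit}"
        )
    if model == "v3" and BREAK_TAG.search(text):
        problems.append("v3 does not support SSML <break> tags")
    return problems
```

Returning a list instead of raising on the first problem lets a caller report every issue at once before any API call is made.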

Health Check

  • Last commit: 3 days ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 1
  • Star history: 59 stars in the last 30 days
