MOSS-TTS by OpenMOSS

Open-source speech and sound generation model family

Created 2 months ago
1,164 stars

Top 32.9% on SourcePulse

View on GitHub
Project Summary

MOSS-TTS Family provides an open-source suite for high-fidelity, high-expressiveness audio generation across complex scenarios, addressing single-model limitations. It targets engineers and researchers needing production-ready components for diverse needs like long-form speech, dialogue, voice design, and real-time streaming, enhancing audio content creation.

How It Works

The MOSS-TTS Family comprises five specialized models (MOSS-TTS, MOSS-TTSD, MOSS-VoiceGenerator, MOSS-TTS-Realtime, MOSS-SoundEffect) for modularity or pipeline composition. A core MOSS-Audio-Tokenizer, built on a "CNN-free" Causal Transformer, unifies audio representation, compressing 24kHz audio to 12.5Hz with high fidelity and native streaming support. This enables novel capabilities like reference-free voice design and specialized solutions for long-speech, expressive dialogue, and low-latency agents.
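As a back-of-the-envelope check of the tokenizer's compression (the rates come from the description above; the helper below is illustrative and not part of the MOSS codebase):

```python
# Illustrative arithmetic for the MOSS-Audio-Tokenizer's quoted rates:
# 24 kHz input audio is compressed to 12.5 token frames per second.

SAMPLE_RATE_HZ = 24_000   # input audio sample rate
FRAME_RATE_HZ = 12.5      # token frames per second after encoding

def temporal_downsampling() -> float:
    """Audio samples consumed per emitted token frame."""
    return SAMPLE_RATE_HZ / FRAME_RATE_HZ

def frames_for(seconds: float) -> int:
    """Token frames produced for a clip of the given duration."""
    return int(seconds * FRAME_RATE_HZ)

print(temporal_downsampling())  # 1920.0 samples per frame
print(frames_for(60))           # 750 frames for one minute of audio
```

The roughly 1920x temporal downsampling is what makes long-form generation and streaming tractable: a minute of audio becomes only 750 token frames for the language model to produce.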

Quick Start & Requirements

  • Installation: Clone the repo, cd MOSS-TTS, then run pip install --extra-index-url https://download.pytorch.org/whl/cu128 -e .
  • Prerequisites: Python 3.12, CUDA >= 12.8, PyTorch 2.9.1+cu128, Torchaudio 2.9.1+cu128, Transformers 5.0.0. FlashAttention 2 optional.
  • Links: Hugging Face Spaces for MOSS-TTS, MOSS-TTSD-v1.0, and MOSS-VoiceGenerator.
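A minimal sketch of verifying the listed prerequisites before installing (package names and minimum versions are taken from the bullet above; this helper is an assumption, not a script shipped with the repo):

```python
from importlib import metadata

# Minimum versions from the prerequisites list above.
REQUIREMENTS = {
    "torch": "2.9.1",
    "torchaudio": "2.9.1",
    "transformers": "5.0.0",
}

def report(pkg: str) -> str:
    """Return the installed version of pkg, or a 'not installed' note."""
    try:
        return f"{pkg}: {metadata.version(pkg)} installed"
    except metadata.PackageNotFoundError:
        return f"{pkg}: not installed"

for pkg in REQUIREMENTS:
    print(report(pkg))
```

Running this before the editable install surfaces missing or mismatched packages early, which matters here because the pinned CUDA 12.8 wheels come from a non-default index.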

Highlighted Details

  • MOSS-TTS: State-of-the-art on Seed-TTS-eval benchmark, rivaling closed-source systems, offering long-speech and fine-grained control.
  • MOSS-TTSD-v1.0: Industry-leading objective/subjective performance for expressive, multi-speaker dialogues, outperforming Doubao and Gemini 2.5-pro.
  • MOSS-VoiceGenerator: Excels in voice design, generating diverse voices/styles from text prompts without reference speech.
  • MOSS-Audio-Tokenizer: Compresses 24kHz audio to 12.5Hz with high fidelity (0.125-4kbps) and native streaming design.
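The quoted 0.125-4 kbps range works out to a small per-frame bit budget at 12.5 Hz. A quick check (the 1024-entry codebook size used at the end is a hypothetical assumption for illustration, not a figure from the project):

```python
FRAME_RATE_HZ = 12.5  # token frames per second (from the tokenizer spec above)

def bits_per_frame(kbps: float) -> float:
    """Bits available per token frame at a given bitrate."""
    return kbps * 1000 / FRAME_RATE_HZ

# At the low end each frame carries only 10 bits; at the high end, 320.
low = bits_per_frame(0.125)   # 10.0
high = bits_per_frame(4.0)    # 320.0

# Assuming hypothetical 1024-entry codebooks (10 bits each), the range
# spans roughly 1 to 32 codebooks per frame.
print(low, high, low / 10, high / 10)
```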

Maintenance & Community

Released recently (Feb 2026), the project's README does not yet list contributor guidelines or community channels; the linked Hugging Face Spaces and the GitHub repository are the main points of contact.

Licensing & Compatibility

Licensed under Apache License 2.0, permitting commercial use and integration into closed-source projects.

Limitations & Caveats

Optional FlashAttention 2 installation may fail on some hardware. As a new project, long-term maintenance and community adoption remain unproven. The five models differ in architecture and capability trade-offs, so selecting the right one for a given workload requires care.

Health Check

  • Last Commit: 1 week ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 18
  • Issues (30d): 16
  • Star History: 307 stars in the last 30 days
