MOSS-TTS-Nano by OpenMOSS

Tiny, multilingual TTS for real-time, CPU-friendly deployment

Created 3 months ago

3,891 stars

Top 12.2% on SourcePulse

1 Expert Loves This Project

Li Jiang

Coauthor of AutoGen; Engineer at Microsoft

Project Summary

MOSS-TTS-Nano is an open-source, multilingual, tiny speech generation model (0.1B parameters) engineered for real-time applications. It prioritizes low latency, CPU-only inference, and a simplified deployment stack, targeting local demos, web serving, and lightweight product integration.

How It Works

The core architecture is a pure autoregressive pipeline that pairs MOSS-Audio-Tokenizer-Nano with a lightweight LLM. The design keeps the footprint small and the latency low, so streaming inference runs directly on a CPU with no GPU dependency. The tokenizer, built on a CNN-free causal Transformer, compresses audio into a compact token stream while preserving high-fidelity reconstruction.
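The streaming, autoregressive flow described above can be illustrated with a toy loop. This is a minimal sketch under stated assumptions, not the project's implementation: `toy_next_token` and `toy_decode_frame` are hypothetical stand-ins for the lightweight LLM and the tokenizer's decoder, and `CHUNK_SAMPLES` is an arbitrary toy value. The point is only the control flow: each token is generated from the history so far, decoded immediately, and handed to the caller before the next token exists.

```python
import random
from typing import Iterator, List

CHUNK_SAMPLES = 4  # arbitrary toy chunk size, not the real model's frame size


def toy_next_token(history: List[int], rng: random.Random) -> int:
    """Hypothetical stand-in for the LLM: choose the next audio token.

    A real model would condition on the text and on `history`;
    this stub just draws a random token id.
    """
    return rng.randrange(1024)


def toy_decode_frame(token: int) -> List[float]:
    """Hypothetical stand-in for the tokenizer decoder.

    A causal decoder needs no future tokens, so one token can be turned
    into audio right away. This stub returns silence.
    """
    return [0.0] * CHUNK_SAMPLES


def stream_speech(num_frames: int, seed: int = 0) -> Iterator[List[float]]:
    """Yield audio chunks one at a time as tokens are generated."""
    rng = random.Random(seed)
    history: List[int] = []
    for _ in range(num_frames):
        token = toy_next_token(history, rng)  # autoregressive step
        history.append(token)                 # the next step sees this token
        yield toy_decode_frame(token)         # emit audio before continuing


if __name__ == "__main__":
    for i, chunk in enumerate(stream_speech(3)):
        print(f"chunk {i}: {len(chunk)} samples")
```

Because `stream_speech` is a generator, a caller can start playback after the first chunk instead of waiting for the whole utterance, which is what makes this kind of pipeline suitable for low-latency use.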

Quick Start & Requirements

A Python 3.12 Conda environment is recommended. After cloning the repository, install the dependencies with pip install -r requirements.txt, then install the project in editable mode with pip install -e . to enable the moss-tts-nano CLI. WeTextProcessing may require installing pynini=2.1.6.post1 manually.

Online Demo: https://openmoss.github.io/MOSS-TTS-Nano-Demo/
Hugging Face Space: OpenMOSS-Team/MOSS-TTS-Nano

Highlighted Details

Model Size: Compact 0.1 billion parameters, suitable for CPU inference.
Multilingual: Supports 20 languages, including Chinese, English, Japanese, Korean, Spanish, French, German, and more.
Audio Output: Native 48 kHz, 2-channel, high-fidelity audio, represented as a 12.5 Hz token stream.
Real-time Capability: Streaming inference runs with low latency on a 4-core CPU.
Voice Cloning: Integrated voice cloning workflow via infer.py and CLI, requiring only a short reference clip.
Deployment Flexibility: Supports direct Python scripts, a local FastAPI web demo, and a packaged CLI.
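
The audio figures above imply some simple arithmetic, sketched here. The sample rate, channel count, and token rate come from the list; everything else is an assumption made for illustration. In particular, the "12.5 Hz token stream" is read as 12.5 token frames per second (how many tokens make up one frame is not stated), and the timing passed to `real_time_factor` is a made-up number, not a benchmark.

```python
import math

SAMPLE_RATE_HZ = 48_000   # from the highlighted details
CHANNELS = 2              # from the highlighted details
TOKEN_RATE_HZ = 12.5      # from the highlighted details (assumed: frames per second)


def samples_per_frame(sample_rate_hz: float = SAMPLE_RATE_HZ,
                      token_rate_hz: float = TOKEN_RATE_HZ) -> int:
    """Audio samples per channel covered by one token frame."""
    return round(sample_rate_hz / token_rate_hz)


def frame_duration_ms(token_rate_hz: float = TOKEN_RATE_HZ) -> float:
    """Duration of audio covered by one token frame, in milliseconds."""
    return 1000.0 / token_rate_hz


def frames_for_duration(seconds: float,
                        token_rate_hz: float = TOKEN_RATE_HZ) -> int:
    """Token frames needed to cover `seconds` of audio."""
    return math.ceil(seconds * token_rate_hz)


def real_time_factor(compute_seconds: float, audio_seconds: float) -> float:
    """Compute time divided by audio duration; below 1.0 is faster than real time."""
    return compute_seconds / audio_seconds


if __name__ == "__main__":
    print(samples_per_frame())             # 3840 samples per channel per frame
    print(samples_per_frame() * CHANNELS)  # 7680 values per stereo frame
    print(frame_duration_ms())             # 80.0 ms of audio per frame
    print(frames_for_duration(10))         # 125 frames for 10 s of audio
    # Hypothetical timing: 2 s of compute for 10 s of audio.
    print(real_time_factor(2.0, 10.0))     # 0.2
```

Under this reading, each autoregressive step accounts for 80 ms of audio, so a model stays real-time as long as one step (generation plus decoding) takes less than 80 ms on the target CPU.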

Maintenance & Community

The README does not specify community channels (e.g., Discord, Slack) or list notable contributors or sponsorships.

Licensing & Compatibility

The project intends to be governed by a LICENSE file at the repository root. Until that file is published, the repository should be treated as "not yet licensed for redistribution," which may affect commercial use or integration into closed-source projects.

Limitations & Caveats

The main limitation is the lack of a published license: the project remains "not yet licensed for redistribution," a significant barrier to adoption until the licensing terms are clarified.
