OpenMOSS
Tiny, multilingual TTS for real-time, CPU-friendly deployment
Top 14.7% on SourcePulse
Summary
MOSS-TTS-Nano is an open-source, multilingual, tiny speech generation model (0.1B parameters) engineered for real-time applications. It prioritizes low latency, CPU-only inference, and a simplified deployment stack, targeting local demos, web serving, and lightweight product integration.
How It Works
The model is a purely autoregressive pipeline that pairs MOSS-Audio-Tokenizer-Nano with a lightweight LLM. The design keeps the footprint small and the latency low, so streaming inference runs directly on CPU with no GPU required. The tokenizer, built on a CNN-free causal Transformer, compresses audio into a compact token stream while retaining high-fidelity reconstruction.
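The streaming, autoregressive idea described above can be illustrated with a toy loop. This is only a sketch of the general technique, not the project's code: `stream_tokens`, `chunked`, and `toy_next_token` are hypothetical names, and a simple counting function stands in for the LLM. Each generated token is fed back as context, and tokens are grouped into small chunks that a tokenizer's decoder could, in principle, turn into audio before generation finishes.

```python
from typing import Callable, Iterator, List


def stream_tokens(
    next_token: Callable[[List[int]], int],
    prompt: List[int],
    eos_id: int,
    max_new_tokens: int = 64,
) -> Iterator[int]:
    """Yield token ids one at a time, feeding each back as context."""
    context = list(prompt)
    for _ in range(max_new_tokens):
        token = next_token(context)
        if token == eos_id:
            return  # the model signalled end of speech
        context.append(token)
        yield token


def chunked(tokens: Iterator[int], chunk_size: int) -> Iterator[List[int]]:
    """Group a token stream into chunks that could be decoded incrementally."""
    chunk: List[int] = []
    for token in tokens:
        chunk.append(token)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk  # flush the final partial chunk


def toy_next_token(context: List[int]) -> int:
    # Hypothetical stand-in for the LLM: count upward, emit EOS (0) after 7.
    candidate = context[-1] + 1
    return 0 if candidate > 7 else candidate


chunks = list(
    chunked(stream_tokens(toy_next_token, prompt=[1], eos_id=0), chunk_size=3)
)
print(chunks)
```

Because both functions are generators, a consumer can start handling the first chunk while later tokens are still being produced, which is the property that makes streaming output possible.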
Quick Start & Requirements
Installation is recommended inside a Python 3.12 Conda environment. After cloning the repository, install the dependencies with pip install -r requirements.txt, then install the project in editable mode (pip install -e .) to enable the moss-tts-nano CLI. WeTextProcessing may additionally require a manual install of pynini=2.1.6.post1.
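Before installing, it can help to confirm that the environment matches the requirements listed above. The following is a minimal sketch using only the standard library; `installed_version` and `find_issues` are hypothetical helper names, not part of the project, and the pinned values are taken from the text above.

```python
import sys
from importlib import metadata
from typing import List, Optional, Sequence

RECOMMENDED_PYTHON = (3, 12)     # recommended interpreter version
PINNED_PYNINI = "2.1.6.post1"    # pynini version named for WeTextProcessing


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_issues(
    python_version: Sequence[int], pynini_version: Optional[str]
) -> List[str]:
    """Compare the given versions against the recommendations."""
    issues: List[str] = []
    if tuple(python_version[:2]) != RECOMMENDED_PYTHON:
        issues.append("Python 3.12 is recommended for this project.")
    if pynini_version is None:
        issues.append("pynini is not installed; it may need a manual install.")
    elif pynini_version != PINNED_PYNINI:
        issues.append(f"pynini {pynini_version} found; {PINNED_PYNINI} expected.")
    return issues


if __name__ == "__main__":
    for issue in find_issues(sys.version_info, installed_version("pynini")):
        print(issue)
```

An empty result means both recommendations are met; any printed message points at the step in the instructions above that still needs attention.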
Highlighted Details
infer.py and CLI, requiring only a short reference clip.
Maintenance & Community
The README does not specify community channels (e.g., Discord, Slack) or list notable contributors or sponsorships.
Licensing & Compatibility
The project intends to follow a root LICENSE file. Until that file is published, the repository is to be treated as "not yet licensed for redistribution," which may rule out commercial use or integration into closed-source projects.
Limitations & Caveats
The primary limitation is the lack of a published license: the project remains "not yet licensed for redistribution," which is a significant adoption barrier until the licensing terms are clarified.