OpenBMB: Tokenizer-free TTS for context-aware speech and voice cloning
Top 20.4% on SourcePulse
VoxCPM is a tokenizer-free Text-to-Speech (TTS) system focused on context-aware speech generation and zero-shot voice cloning. It models speech directly in a continuous space rather than as discrete tokens, bypassing the limitations of discrete tokenization. It is aimed at researchers and developers who need expressive, natural-sounding synthetic speech with voice cloning.
How It Works
VoxCPM uses an end-to-end diffusion autoregressive architecture that generates continuous speech representations directly from text. Built on the MiniCPM-4 backbone, it achieves implicit semantic-acoustic decoupling through hierarchical language modeling and Finite Scalar Quantization (FSQ) constraints. Modeling speech in a continuous space improves expressiveness and generation stability, which yields more natural and contextually appropriate synthesis.
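FSQ is generally understood as finite scalar quantization: each dimension of a latent vector is squashed into a bounded range and rounded to one of a few fixed levels. The sketch below illustrates that generic idea only. It is not VoxCPM's code, and the function name fsq_quantize, the restriction to odd level counts, and the example values are all assumptions made for illustration.

```python
import math


def fsq_quantize(z, levels):
    """Generic finite scalar quantization sketch (hypothetical helper).

    Each component of z is bounded with tanh and rounded to one of
    levels[i] evenly spaced values in [-1, 1]. Only odd level counts
    are handled here to keep the grid symmetric around zero.
    """
    if len(z) != len(levels):
        raise ValueError("z and levels must have the same length")
    quantized = []
    for value, n_levels in zip(z, levels):
        if n_levels < 3 or n_levels % 2 == 0:
            raise ValueError("this sketch supports odd level counts >= 3")
        half = (n_levels - 1) // 2          # grid is -half..half
        bounded = math.tanh(value) * half   # strictly inside (-half, half)
        quantized.append(round(bounded) / half)
    return quantized


if __name__ == "__main__":
    # Three latent dimensions, five levels each (illustrative values).
    print(fsq_quantize([0.0, 0.3, -10.0], [5, 5, 5]))
```

Because every dimension can take only a handful of values, the quantized vector acts as a coarse bottleneck, while a model that keeps working with continuous features downstream can still carry fine acoustic detail.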
Quick Start & Requirements
Install: pip install voxcpm
Also listed: soundfile
Run: python app.py
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The model may produce unexpected, biased, or artifact-laden outputs. Voice cloning carries a risk of misuse for deepfakes and impersonation. Direct control over specific speech attributes such as emotion or style is limited. The model primarily supports Chinese and English; performance on other languages is not guaranteed. Very long or highly expressive inputs may cause instability.