Conversational speech models for on-device use
Vui provides small conversational speech models that run on-device, aimed at researchers and developers. Running speech interaction locally reduces reliance on cloud services and makes real-time applications possible.
How It Works
Vui is a Llama-based transformer that predicts audio tokens. It uses Fluac, an audio tokenizer derived from Descript Audio Codec, which quantizes audio at 21.53 Hz (a 4x reduction from roughly 86 Hz). The lower frame rate means fewer tokens per second of audio, which is what makes efficient, on-device models capable of contextual speech generation feasible.
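The frame-rate figures above can be checked with a little arithmetic. The sketch below is illustrative only: the 44.1 kHz sample rate and the 512-sample hop are assumptions (neither is stated here) chosen because they reproduce the quoted rates, and the helper names are hypothetical, not part of Vui or Fluac.

```python
import math

SAMPLE_RATE_HZ = 44_100  # assumption: sample rate is not stated in the text
BASE_HOP = 512           # assumption: samples per frame at the ~86 Hz rate
REDUCTION = 4            # from the text: a 4x reduction in frame rate


def frame_rate(sample_rate_hz: int, hop: int) -> float:
    """Frames per second when one frame covers `hop` samples."""
    return sample_rate_hz / hop


def frames_for(seconds: float, rate_hz: float) -> int:
    """Number of frames needed to cover `seconds` of audio."""
    return math.ceil(seconds * rate_hz)


base_rate = frame_rate(SAMPLE_RATE_HZ, BASE_HOP)
reduced_rate = frame_rate(SAMPLE_RATE_HZ, BASE_HOP * REDUCTION)

if __name__ == "__main__":
    print(f"base: {base_rate:.2f} Hz, reduced: {reduced_rate:.2f} Hz")
    print(f"10 s of audio -> {frames_for(10, reduced_rate)} frames")
```

Under these assumptions, ten seconds of audio needs about a quarter as many frames as at the base rate. The number of tokens per frame depends on the tokenizer's codebook layout, which is not described here.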
Quick Start & Requirements
Install: pip install -e . (Linux/Windows with uv)
Run: python demo.py for the Gradio interface.
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The model is known to hallucinate, and its performance is constrained by limited resources. Voice Activity Detection (VAD) is used to remove silence, but it can slow down processing.
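To make the silence-removal step concrete, here is a minimal energy-threshold sketch. It is not the VAD that Vui uses, which is not specified here. It is a simplified stand-in, and `drop_silence`, `frame_len`, and `threshold` are hypothetical names and values chosen for illustration.

```python
import math


def rms(frame):
    """Root-mean-square level of a list of float samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def drop_silence(samples, frame_len=4, threshold=0.01):
    """Keep only the frames whose RMS level reaches `threshold`.

    Hypothetical stand-in for a real VAD: every frame is inspected,
    so the step adds a full extra pass over the audio.
    """
    kept = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        if rms(frame) >= threshold:
            kept.extend(frame)
    return kept


if __name__ == "__main__":
    audio = [0.0] * 8 + [0.5, -0.5, 0.5, -0.5] + [0.0] * 4
    print(drop_silence(audio))
```

A production VAD is typically a learned model rather than a fixed threshold, which is likely where most of the added processing time comes from.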