antirez
Pure C speech-to-text inference engine for Mistral Voxtral Realtime 4B
Pure C inference for Mistral AI's Voxtral Realtime 4B speech-to-text model. It targets developers needing a lightweight, dependency-free ASR solution for embedded systems or performance-critical applications, enabling real-time streaming transcription without Python or heavy ML frameworks.
How It Works
The core is a C implementation of the Voxtral 4B pipeline with no external dependencies beyond the C standard library, apart from the acceleration backend: MPS (Apple Silicon GPU), or OpenBLAS on other platforms. It uses a chunked audio encoder with overlapping windows and a rolling KV cache to keep memory use in check for audio input of unlimited length. A streaming C API (vox_stream_t) accepts audio incrementally and returns token strings as they become available, so audio can be piped directly from tools like ffmpeg.
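The overlapping-window idea can be sketched in a few lines of C. This is only an illustration of the general technique, not the project's encoder: the chunk_t type, the make_chunks function, and the sizes used in the usage note are hypothetical, and the real window and overlap lengths are not given here.

```c
#include <assert.h>
#include <stddef.h>

/* A window over the input, in samples: [start, start + len). */
typedef struct {
    size_t start;
    size_t len;
} chunk_t;

/*
 * Split `total` samples into windows of at most `size` samples, where each
 * window re-reads the last `overlap` samples of the previous one.
 * Writes up to `max_out` windows into `out` and returns how many were written.
 * Hypothetical helper for illustration; not part of the project's API.
 */
static size_t make_chunks(size_t total, size_t size, size_t overlap,
                          chunk_t *out, size_t max_out) {
    assert(size > overlap);            /* the window must advance */
    size_t hop = size - overlap;       /* new samples covered per window */
    size_t n = 0;
    for (size_t start = 0; start < total && n < max_out; start += hop) {
        size_t remaining = total - start;
        out[n].start = start;
        out[n].len = remaining < size ? remaining : size;
        n++;
        if (remaining <= size) break;  /* this window reached the end */
    }
    return n;
}
```

For example, make_chunks(10, 4, 1, out, 8) yields three windows covering samples [0, 4), [3, 7), and [6, 10). Each window shares one sample with its predecessor, which gives the encoder some context across the chunk boundary.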
Quick Start & Requirements
Build: make mps (Apple Silicon) or make blas (Linux/Intel Mac with OpenBLAS).
Download the model: ./download_model.sh (~8.9GB).
Run: ./voxtral -d voxtral-model -i audio.wav, or pipe audio via ffmpeg.
Python packages for understanding the model logic: pip install torch safetensors soundfile soxr.
Highlighted Details
Streaming C API: vox_stream_t for incremental audio input and token output, suitable for real-time applications.
Audio can be piped from ffmpeg to stdin.
Maintenance & Community
No specific details on maintainers, community channels, or roadmap were found in the provided README.
Licensing & Compatibility
The model weights are licensed under Apache-2.0. The C code itself is provided under the MIT License, which is permissive for commercial use and integration into closed-source projects.
Limitations & Caveats
The README explicitly states that the project needs "more testing" and may not be "production quality." Further work is needed, particularly stress-testing with very long transcriptions to validate the KV cache handling.
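To make the KV cache concern concrete, here is a minimal sketch of a fixed-window rolling cache implemented as a ring buffer. It assumes a ring-buffer layout, which is one common way to build such a cache; the project's actual layout may differ. The rolling_kv_t type, the kv_* functions, and the tiny KV_WINDOW and KV_DIM sizes are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sizes for illustration only; not taken from the project. */
#define KV_WINDOW 4   /* maximum number of positions kept in the cache */
#define KV_DIM    2   /* floats stored per position */

typedef struct {
    float data[KV_WINDOW][KV_DIM]; /* fixed storage, reused as a ring */
    size_t total;                  /* positions appended so far */
} rolling_kv_t;

/* Reset the cache to empty. */
static void kv_init(rolling_kv_t *kv) {
    memset(kv, 0, sizeof(*kv));
}

/* Append one position, overwriting the oldest once the window is full. */
static void kv_push(rolling_kv_t *kv, const float *vec) {
    memcpy(kv->data[kv->total % KV_WINDOW], vec, sizeof(float) * KV_DIM);
    kv->total++;
}

/* Number of positions currently held (never more than KV_WINDOW). */
static size_t kv_len(const rolling_kv_t *kv) {
    return kv->total < KV_WINDOW ? kv->total : KV_WINDOW;
}

/* Fetch the i-th oldest surviving position, with 0 <= i < kv_len(kv). */
static const float *kv_get(const rolling_kv_t *kv, size_t i) {
    assert(i < kv_len(kv));
    size_t first = kv->total - kv_len(kv); /* absolute index of the oldest */
    return kv->data[(first + i) % KV_WINDOW];
}
```

A long-input stress test for a cache of this shape would push far more than KV_WINDOW positions and check that kv_len never exceeds the window and that kv_get still returns the most recent positions in order after the buffer has wrapped around many times.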