Discover and explore top open-source AI tools and projects—updated daily.
TrevorS: Real-time speech recognition in Rust
Top 50.7% on SourcePulse
Summary
This project provides a pure Rust implementation of Mistral's Voxtral Mini 4B Realtime model, enabling streaming speech recognition both natively and directly in web browsers. It targets engineers and power users who need efficient, client-side transcription, reducing reliance on server-side processing and enabling real-time applications across diverse platforms.
How It Works
The system leverages the Burn ML framework for its core operations, processing audio via a Mel spectrogram, followed by a causal encoder and an autoregressive decoder. It offers two inference paths: a full-precision F32 model using SafeTensors weights (~9 GB) for native execution, and a highly optimized Q4 GGUF quantized model (~2.5 GB) that runs efficiently on both native platforms and in the browser via WebAssembly (WASM) and WebGPU. The browser path utilizes custom WGSL shaders for fused dequantization and matrix multiplication, significantly reducing memory footprint and computational overhead.
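The fused WGSL kernels themselves are not shown in the README, but the dequantization they perform can be sketched in plain Rust. Assuming the weights use the common GGUF Q4_0 block layout (an f16 scale followed by 32 packed 4-bit quants per block; the README does not name the exact quant type, so this is an assumption), each weight is recovered as `d * (q - 8)`:

```rust
/// Convert an IEEE-754 half-precision bit pattern to f32
/// (GGUF Q4_0 stores each block's scale as f16).
fn f16_to_f32(bits: u16) -> f32 {
    let sign = (bits >> 15) & 1;
    let exp = ((bits >> 10) & 0x1f) as u32;
    let frac = (bits & 0x3ff) as u32;
    let f = match exp {
        0 => (frac as f32) * 2f32.powi(-24), // subnormal
        0x1f => {
            if frac == 0 { f32::INFINITY } else { f32::NAN }
        }
        // rebias exponent (15 -> 127) and widen the mantissa (10 -> 23 bits)
        _ => f32::from_bits(((exp + 112) << 23) | (frac << 13)),
    };
    if sign == 1 { -f } else { f }
}

/// Dequantize one Q4_0 block: 16 bytes pack 32 unsigned 4-bit quants,
/// with an implicit zero point of 8. Weight i sits in the low nibble of
/// byte i; weight i + 16 sits in the high nibble.
fn dequant_q4_0(scale_bits: u16, quants: &[u8; 16]) -> [f32; 32] {
    let d = f16_to_f32(scale_bits);
    let mut out = [0f32; 32];
    for (i, &byte) in quants.iter().enumerate() {
        out[i] = d * ((byte & 0x0f) as f32 - 8.0);
        out[i + 16] = d * ((byte >> 4) as f32 - 8.0);
    }
    out
}
```

In the browser path this arithmetic runs inside the matmul shader itself, so the full-precision weights never need to be materialized in memory.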
Quick Start & Requirements
Native CLI installation involves downloading model weights (mistralai/Voxtral-Mini-4B-Realtime-2602) and compiling the Rust project with appropriate features (wgpu, cli, hub). Transcription can be performed on audio files using either the F32 model or the Q4 GGUF quantized version. For browser deployment, the WASM package must be built (wasm-pack), a self-signed certificate generated (openssl), and a development server started (bun serve.mjs). Access is via https://localhost:8443. Prerequisites include Rust, uv, wasm-pack, openssl, and bun. WebGPU support is mandatory for browser functionality. Model weights require ~9 GB (F32) or ~2.5 GB (Q4 GGUF).
Highlighted Details
Maintenance & Community
No specific details regarding maintainers, community channels (e.g., Discord, Slack), or project roadmap are provided within the README.
Licensing & Compatibility
The project is licensed under the Apache-2.0 license, which is permissive and generally suitable for commercial use and integration into closed-source projects.
Limitations & Caveats
Performance benchmarks (WER accuracy and inference speed) are pending. GPU-dependent tests are not run in CI due to runner limitations. Browser deployment requires manually sharding the GGUF file into 512 MB chunks and generating a self-signed certificate to satisfy the browser's secure-context requirement.
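As a rough illustration of the sharding step, a helper like the following (hypothetical; the project may ship its own tooling, and the output naming here is illustrative) splits a weights file into fixed-size chunks using only the standard library:

```rust
use std::fs::File;
use std::io::{self, Read, Write};

/// Split `src` into sequentially numbered chunks of at most `chunk_bytes`
/// each (e.g. 512 MiB for a browser loader), named `<src>.part000`,
/// `<src>.part001`, and so on. Returns the number of parts written.
fn shard_file(src: &str, chunk_bytes: usize) -> io::Result<usize> {
    let mut input = File::open(src)?;
    let mut buf = vec![0u8; chunk_bytes];
    let mut parts = 0;
    loop {
        // Fill the buffer as far as possible before emitting a part;
        // a single read() may return fewer bytes than requested.
        let mut filled = 0;
        while filled < chunk_bytes {
            let n = input.read(&mut buf[filled..])?;
            if n == 0 {
                break;
            }
            filled += n;
        }
        if filled == 0 {
            break; // end of input, nothing left to write
        }
        let mut out = File::create(format!("{src}.part{parts:03}"))?;
        out.write_all(&buf[..filled])?;
        parts += 1;
        if filled < chunk_bytes {
            break; // short final chunk: input is exhausted
        }
    }
    Ok(parts)
}
```

The browser side would then fetch the parts in order and concatenate them before (or while) parsing the GGUF header.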