TheStageAI: Optimized speech-to-text inference for streaming and on-device use
This repository provides optimized Whisper models for efficient speech-to-text inference, focusing on streaming and on-device deployment. It targets developers needing self-hosting, cloud hosting, or edge solutions for real-time captioning and voice interfaces, offering low-latency, low-power, and scalable transcription. The project delivers high performance on NVIDIA GPUs and Apple Silicon through specialized inference engines.
How It Works
The project offers fine-tuned Whisper models supporting flexible chunk sizes (10s, 15s, 20s, 30s), overcoming the original models' fixed 30s window. It leverages high-performance inference engines for NVIDIA GPUs (TheStage AI ElasticModels), claiming up to 220 tokens/sec on an NVIDIA L40S for whisper-large-v3. For macOS and Apple Silicon, it provides CoreML engines optimized for minimal power consumption (~2W) and RAM usage (~2GB). Streaming inference is implemented for both NVIDIA and macOS platforms, enabling real-time transcription.
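To make the flexible chunk sizes concrete, here is a minimal sketch that loads a fine-tuned checkpoint through the standard Hugging Face Transformers pipeline; the model ID "TheStageAI/thewhisper-large-v3" and the 15-second chunk setting are illustrative assumptions rather than values taken from the repository.

```python
# Hypothetical sketch: chunked transcription via Hugging Face Transformers.
# The model ID below is an assumed placeholder; substitute the checkpoint
# actually published by the repository.
import torch
from transformers import pipeline

device = "cuda:0" if torch.cuda.is_available() else "cpu"

asr = pipeline(
    "automatic-speech-recognition",
    model="TheStageAI/thewhisper-large-v3",   # assumed model ID
    torch_dtype=torch.float16 if device.startswith("cuda") else torch.float32,
    device=device,
    chunk_length_s=15,  # the fine-tuned models are said to accept 10/15/20/30 s chunks
)

result = asr("meeting.wav", return_timestamps=True)
print(result["text"])
```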
Quick Start & Requirements
Clone the repository and cd into TheWhisper. Install the platform-specific package: pip install .[apple] or pip install .[nvidia]. For TheStage AI optimized NVIDIA engines, additionally install thestage-elastic-models[nvidia] (this requires pip install thestage and thestage config set --api-token <YOUR_API_TOKEN>). flash_attn==2.8.2 is a required dependency for NVIDIA.
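After installation, a real-time captioning loop could look roughly like the sketch below. It reuses the asr pipeline from the previous snippet and captures microphone audio with the third-party sounddevice package, which is an assumption and not a dependency listed in the README.

```python
# Hypothetical streaming loop: record fixed-size chunks and transcribe each one.
# Assumes the `asr` pipeline defined above and the `sounddevice` package.
import sounddevice as sd

SAMPLE_RATE = 16_000   # Whisper models expect 16 kHz mono audio
CHUNK_SECONDS = 10     # one of the supported chunk sizes

while True:
    # Record one chunk from the default microphone (blocks until complete).
    audio = sd.rec(int(CHUNK_SECONDS * SAMPLE_RATE),
                   samplerate=SAMPLE_RATE, channels=1, dtype="float32")
    sd.wait()
    text = asr({"raw": audio.squeeze(), "sampling_rate": SAMPLE_RATE})["text"]
    print(text, flush=True)
```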
https://github.com/user-attachments/assets/f4d3fe7b-e2c5-42ff-a5d0-fef6afd11684
Placeholder for React frontend example.
Highlighted Details
Maintenance & Community
The README does not provide links to community channels (e.g., Discord, Slack) or a public roadmap. Acknowledgements are made to Silero VAD, OpenAI Whisper, Hugging Face Transformers, and the MLX community. No specific contributors, sponsorships, or partnerships are highlighted.
Licensing & Compatibility
The PyTorch HF Transformers (NVIDIA) and CoreML (macOS) engines are provided free of charge. TheStage AI optimized NVIDIA engines are free for small organizations (≤ 4 GPUs/year); commercial use in larger deployments requires contacting TheStage AI for a service request and explicit licensing.
Limitations & Caveats
Streaming inference is reportedly not supported for whisper-large-v3-turbo on NVIDIA platforms. Word timestamp generation is unavailable for whisper-large-v3 on NVIDIA. The provided link for the React frontend example is a placeholder, and a direct download link for "TheNotes for macOS" is not present. The optimized NVIDIA engines require API token configuration and may necessitate a commercial license for extensive use.