ufal / SimulStreaming: Real-time speech-to-text and LLM translation
Top 62.4% on SourcePulse
SimulStreaming provides a framework for real-time, low-latency speech-to-text (ASR) and text-to-text translation, specifically designed for long-form speech. It targets researchers and power users requiring efficient, multilingual audio stream processing, offering significant speed improvements and enabling production-ready applications by adapting offline models for streaming.
How It Works
The system integrates a Whisper-based ASR component with an LLM-based translation component (currently EuroLLM). It employs novel "simultaneous policies," such as AlignAtt and LocalAgreement, to adapt powerful offline models for streaming input. These policies intelligently manage input chunking and output generation, allowing high-quality foundation models to operate with minimal performance degradation, achieving near real-time processing. The architecture supports direct transcription or a cascade of ASR followed by translation, incorporating flexible prompting and retrieval augmented generation (RAG).
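To make the idea of a simultaneous policy concrete, here is a minimal sketch of a LocalAgreement-style rule: only the longest common prefix of the hypotheses produced on two consecutive (growing) audio chunks is committed as stable output. The class and method names are illustrative assumptions, not the actual SimulStreaming API.

```python
# Hypothetical sketch of a LocalAgreement-style streaming policy.
# Tokens are committed only once two consecutive chunk hypotheses
# agree on them, which keeps the emitted output stable.

def common_prefix(a, b):
    """Longest common prefix of two token lists."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return a[:n]

class LocalAgreementPolicy:
    def __init__(self):
        self.prev_hypothesis = []   # hypothesis from the previous chunk
        self.committed = []         # tokens already emitted (stable output)

    def step(self, hypothesis):
        """Feed the ASR hypothesis for the current (longer) chunk;
        return the newly stable tokens to emit."""
        stable = common_prefix(self.prev_hypothesis, hypothesis)
        new_tokens = stable[len(self.committed):]
        self.committed = stable
        self.prev_hypothesis = hypothesis
        return new_tokens

policy = LocalAgreementPolicy()
policy.step("the cat".split())           # no agreement yet: nothing emitted
policy.step("the cat sat on".split())    # "the cat" is now stable
policy.step("the cat sat on a mat".split())
```

The same skeleton generalizes to other policies: AlignAtt, for instance, decides the cutoff from the model's cross-attention alignment rather than from hypothesis agreement.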
Quick Start & Requirements
Install dependencies with pip install -r requirements_whisper.txt (ASR) and pip install -r requirements_translate.txt (translation). Offline Whisper models are converted for streaming use with the ct2-transformers-converter tool. Live audio input requires arecord (Linux) or ffmpeg and netcat.
Highlighted Details
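As a rough illustration of the audio-capture requirement, a microphone stream can be piped to a local server with arecord and netcat. The port number and the assumption of a locally listening server are illustrative, not taken from the SimulStreaming documentation.

```shell
# Hypothetical pipeline: capture 16 kHz mono 16-bit PCM from the default
# ALSA device and stream the raw bytes over TCP to a local server.
# The port (43007) is an assumed placeholder.
arecord -f S16_LE -c1 -r 16000 -t raw -D default | nc localhost 43007
```

On non-Linux systems, ffmpeg can play the same capture role in place of arecord.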
Maintenance & Community
Developed by authors from Charles University. User feedback is actively sought via a questionnaire to guide future development and features. No specific community channels (e.g., Discord, Slack) or public roadmaps are detailed in the README.
Licensing & Compatibility
Limitations & Caveats