Ikaros-521
Realtime STT/TTS pipeline for cross-network, real-time conversations
Top 69.8% on SourcePulse
This project provides a real-time speech-to-text (STT) system designed for voice assistants and applications requiring fast, low-latency transcription. It integrates with LLM services like OpenAI and ZhipuAI, and TTS engines such as GPT-SOVITS and Edge-TTS, enabling cross-network real-time conversational experiences via a web interface.
How It Works
The system utilizes a multi-component architecture for robust voice processing. Voice Activity Detection (VAD) is handled by WebRTCVAD for initial detection and SileroVAD for verification. Speech-to-text transcription is powered by Faster-Whisper, optimized for GPU acceleration. Wake word detection is implemented using Porcupine. The project also supports streaming LLM and TTS integrations for conversational AI.
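The two-stage VAD idea described above (a cheap first-pass detector whose hits are confirmed by a stricter second detector) can be illustrated with a minimal sketch. This is not the project's code: `fast_check` and `verify` are hypothetical stand-ins based on simple RMS energy, used here in place of WebRTCVAD and SileroVAD so the example runs without extra dependencies, and the thresholds are arbitrary.

```python
from typing import Callable, Iterable, List, Sequence

Frame = Sequence[float]  # one short chunk of audio samples in [-1.0, 1.0]


def rms(frame: Frame) -> float:
    """Root-mean-square energy of a frame (0.0 for an empty frame)."""
    if not frame:
        return 0.0
    return (sum(s * s for s in frame) / len(frame)) ** 0.5


def fast_check(frame: Frame, threshold: float = 0.02) -> bool:
    """Hypothetical stand-in for the cheap first-pass detector."""
    return rms(frame) > threshold


def verify(frame: Frame, threshold: float = 0.1) -> bool:
    """Hypothetical stand-in for the stricter second-pass detector."""
    return rms(frame) > threshold


def two_stage_vad(
    frames: Iterable[Frame],
    first: Callable[[Frame], bool] = fast_check,
    second: Callable[[Frame], bool] = verify,
) -> List[bool]:
    """Mark a frame as speech only if both stages agree.

    `and` short-circuits, so the second (more expensive) check runs
    only on frames that the first stage has already flagged.
    """
    return [first(f) and second(f) for f in frames]


# Three synthetic frames: silence, low-level noise, and a loud signal.
silence = [0.0] * 160
noise = [0.05, -0.05] * 80
speech = [0.5, -0.5] * 80

decisions = two_stage_vad([silence, noise, speech])
print(decisions)
```

The low-level noise frame passes the first check but is rejected by the second, which is the case the verification stage exists for. In a real pipeline the two callables would wrap the actual detectors, and only frames marked as speech would be passed on for transcription.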
Quick Start & Requirements
Install: pip install RealtimeSTT
GPU support (CUDA 11.8 build of PyTorch): pip install torch==2.0.1+cu118 torchaudio==2.0.2 --index-url https://download.pytorch.org/whl/cu118
Prerequisite: ffmpeg (installable via package managers or direct download).
Run: python webui.py
Run: python RealtimeSTT_server2.py and access via index.html.
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
OPENAI_API_KEY).
10 months ago
Inactive
KoljaB