Discover and explore top open-source AI tools and projects—updated daily.
Lex-auAI speech-to-speech assistant enabling natural, multimodal conversations
Top 98.8% on SourcePulse
Vocalis is a sophisticated speech-to-speech AI assistant designed for natural, low-latency conversations. It targets users seeking advanced conversational AI with features like mid-speech interruption, AI-initiated follow-ups, and multi-modal capabilities (including image analysis), offering a highly responsive and customizable experience that can leverage local LLM and TTS services.
How It Works
Vocalis employs a modern React frontend and FastAPI backend architecture to deliver a responsive, low-latency conversational experience. Its core innovation lies in its "barge-in" technology, allowing users to interrupt the AI mid-speech for natural flow. It supports AI-initiated greetings and follow-ups, dynamic visual feedback, and integrates with local LLM/TTS services via OpenAI-compatible endpoints, enabling users to run powerful AI assistants entirely offline. The system utilizes Faster-Whisper for ASR, custom VAD, and streaming TTS for immediate audio playback, with optional CUDA acceleration.
Quick Start & Requirements
setup.bat (Windows) or ./setup.sh (macOS/Linux). Manual setup involves creating Python virtual environments and running pip install -r requirements.txt for the backend, and npm install for the frontend.Highlighted Details
Maintenance & Community
No specific details regarding maintainers, community channels (like Discord/Slack), or roadmap were found in the provided text.
Licensing & Compatibility
Limitations & Caveats
The system's functionality is dependent on the user successfully setting up and running compatible local LLM and TTS services, which can introduce complexity for users unfamiliar with these technologies. While CUDA acceleration is supported, optimal performance may require specific GPU hardware.
6 months ago
Inactive
collabora
Beingpax