fikrikarim/parlor: On-device, real-time multimodal AI for natural conversations
Top 29.9% on SourcePulse
Parlor provides on-device, real-time multimodal AI, enabling natural voice and vision conversations that run entirely locally. Aimed at users who want privacy-focused AI interaction, and particularly useful for language learners, it eliminates server costs, makes advanced AI accessible on personal hardware, and is envisioned for future mobile deployment.
How It Works
The system uses a browser-based frontend that captures microphone and camera input and streams audio (PCM) and video (JPEG frames) to a FastAPI server over WebSockets. The server runs Gemma 3n E2B (via LiteRT-LM on GPU) for speech and vision understanding, and Kokoro TTS (MLX on macOS, ONNX on Linux) for speech synthesis. Browser-side voice activity detection (Silero VAD) enables hands-free operation and barge-in, while sentence-level TTS streaming keeps audio playback latency low. This architecture delivers real-time, natural interaction without relying on external servers.
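The sentence-level TTS streaming described above can be sketched in plain Python: tokens from the streaming LLM are buffered until a sentence boundary appears, and each complete sentence is handed to TTS immediately instead of waiting for the full reply. The function name and the boundary heuristic here are illustrative assumptions, not Parlor's actual implementation.

```python
import re
from typing import Iterable, Iterator

# Sentence-ending punctuation at the end of the buffer. A real system
# would handle abbreviations, numbers, etc.; this is a simple heuristic.
_BOUNDARY = re.compile(r"[.!?]\s*$")

def stream_sentences(tokens: Iterable[str]) -> Iterator[str]:
    """Buffer LLM tokens and yield one sentence at a time, so TTS
    can start speaking before the full reply has been generated."""
    buffer = ""
    for token in tokens:
        buffer += token
        if _BOUNDARY.search(buffer):
            yield buffer.strip()
            buffer = ""
    if buffer.strip():  # flush any trailing partial sentence
        yield buffer.strip()

# Example: tokens arriving from a (hypothetical) streaming LLM call.
tokens = ["Hello", " there", ".", " How", " can", " I", " help", "?"]
print(list(stream_sentences(tokens)))
# prints: ['Hello there.', 'How can I help?']
```

Yielding per sentence rather than per reply is what keeps perceived latency low: the first sentence can be synthesized and playing while the model is still generating the rest.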
Quick Start & Requirements
git clone https://github.com/fikrikarim/parlor.git
cd parlor
# Install uv if needed: curl -LsSf https://astral.sh/uv/install.sh | sh
cd src
uv sync
uv run server.py
Access the application at http://localhost:8000.
Highlighted Details
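Because audio and video share a single WebSocket connection, each binary message needs to identify its payload type. The tagged-frame scheme below illustrates one way to do this; the tag values and layout are assumptions for illustration, not Parlor's actual wire format.

```python
import struct

# Hypothetical one-byte tags; Parlor's real framing may differ.
TAG_AUDIO_PCM = 0x01   # raw 16-bit PCM samples
TAG_VIDEO_JPEG = 0x02  # one JPEG-encoded camera frame

def pack_frame(tag: int, payload: bytes) -> bytes:
    """Prefix the payload with a 1-byte tag and a 4-byte big-endian length."""
    return struct.pack(">BI", tag, len(payload)) + payload

def unpack_frame(frame: bytes) -> tuple[int, bytes]:
    """Recover (tag, payload) from a packed frame."""
    tag, length = struct.unpack(">BI", frame[:5])
    payload = frame[5:5 + length]
    assert len(payload) == length, "truncated frame"
    return tag, payload

pcm = b"\x00\x01" * 160  # fake PCM samples for the round-trip demo
tag, payload = unpack_frame(pack_frame(TAG_AUDIO_PCM, pcm))
print(tag == TAG_AUDIO_PCM, payload == pcm)
# prints: True True
```

The explicit length prefix also lets a receiver detect truncated messages, which matters when large JPEG frames are interleaved with small audio chunks.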
Maintenance & Community
The provided README does not detail specific contributors, sponsorships, partnerships, community channels (e.g., Discord/Slack), or a roadmap.
Licensing & Compatibility
Limitations & Caveats
This project is presented as a "research preview" and an "early experiment," with users advised to expect "rough edges and bugs." While not suitable for tasks like agentic coding, it is highlighted as a valuable tool for language learning.