Open-source base model for full-duplex conversational audio
Hertz-dev is an open-source base model for full-duplex conversational audio, enabling real-time, two-way voice communication. It targets researchers and developers building interactive voice applications, offering a foundational model for advanced audio interaction.
How It Works
Full-duplex means the model can speak and listen at the same time. The repository includes scripts for offline inference, a client-server setup for live interaction, and a browser-based client built on Streamlit and WebRTC for easier access.
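To make the full-duplex idea concrete, here is a minimal sketch using only Python's standard asyncio library. It is not Hertz-dev's code or API: the names speak, listen, and duplex_session are hypothetical, and plain strings stand in for audio frames. It only illustrates that both directions stay active at once rather than taking turns.

```python
import asyncio


async def speak(channel, frames):
    """Send side: push frames onto a channel, yielding between frames."""
    for frame in frames:
        await channel.put(frame)
        await asyncio.sleep(0)  # give the other tasks a chance to run
    await channel.put(None)  # sentinel: this side has finished sending


async def listen(channel, received):
    """Receive side: collect frames until the sentinel arrives."""
    while True:
        frame = await channel.get()
        if frame is None:
            break
        received.append(frame)


async def duplex_session(local_frames, remote_frames):
    """Run both directions concurrently (hypothetical toy session)."""
    to_remote = asyncio.Queue()    # local -> remote direction
    from_remote = asyncio.Queue()  # remote -> local direction
    heard_by_remote, heard_locally = [], []
    # All four tasks run at the same time: neither side waits for the
    # other to finish speaking before it starts listening.
    await asyncio.gather(
        speak(to_remote, local_frames),
        listen(to_remote, heard_by_remote),
        speak(from_remote, remote_frames),
        listen(from_remote, heard_locally),
    )
    return heard_by_remote, heard_locally


if __name__ == "__main__":
    print(asyncio.run(duplex_session(["a1", "a2"], ["b1", "b2", "b3"])))
```

A half-duplex design would instead finish one direction before starting the other; the concurrent gather call is the part that models overlap.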
Quick Start & Requirements
Install PyTorch with CUDA 12.1 support first: pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
Then install the project dependencies: pip install -r requirements.txt
For the WebRTC client: pip install -r requirements_webrtc.txt
Also referenced: the libportaudio system library and the ./ckpt/ directory.
Highlighted Details
Maintenance & Community
No specific community channels or contributor information is detailed in the README.
Licensing & Compatibility
The license is not specified in the provided README.
Limitations & Caveats
Inference is confirmed to work only on Python 3.10 and CUDA 12.1; other versions are less tested. The client-server and WebRTC components are experimental. Hosting the Streamlit client remotely requires HTTPS and may require configuring a STUN server for WebRTC connections.
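Since only one Python/CUDA combination is confirmed, a quick pre-flight check can flag a mismatch before running inference. The sketch below is an illustration and not part of the project: check_environment and detect_cuda_version are hypothetical helper names. It assumes PyTorch is the source of the CUDA version, read from torch.version.cuda, which is None on CPU-only builds.

```python
import sys

# The combination the summary reports as confirmed to work.
CONFIRMED_PYTHON = (3, 10)
CONFIRMED_CUDA = "12.1"


def check_environment(python_version, cuda_version):
    """Return warnings for versions outside the confirmed combination."""
    warnings = []
    major_minor = tuple(python_version[:2])
    if major_minor != CONFIRMED_PYTHON:
        warnings.append(
            f"Python {major_minor[0]}.{major_minor[1]} is less tested; "
            "3.10 is the confirmed version."
        )
    if cuda_version != CONFIRMED_CUDA:
        warnings.append(
            f"CUDA {cuda_version} is less tested; 12.1 is the confirmed version."
        )
    return warnings


def detect_cuda_version():
    """Return the CUDA version PyTorch was built with, or None."""
    try:
        import torch  # optional here: absent PyTorch is reported as None
    except ImportError:
        return None
    return torch.version.cuda


if __name__ == "__main__":
    for message in check_environment(sys.version_info, detect_cuda_version()):
        print("warning:", message)
```

A mismatch only produces a warning rather than an error, because other versions are described as less tested, not as unsupported.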