pipecat-ai
Voice agent framework with NVIDIA open models
Summary
This repository provides sample code for building voice agents using NVIDIA's open-source Nemotron Speech ASR, Nemotron 3 Nano LLM, and Magpie TTS (preview) models. It targets engineers and researchers who want to deploy voice AI, either locally on high-end NVIDIA hardware or on cloud platforms such as Modal and Pipecat Cloud. The project is aimed at rapid prototyping and deployment of real-time voice interaction systems.
How It Works
The system integrates NVIDIA's Nemotron Speech ASR, Nemotron 3 Nano LLM, and Magpie TTS. It supports two primary LLM backends: llama.cpp (optimized for single GPUs with GGUF quantized models) and vLLM (for multi-GPU or cloud deployments with BF16 models). The architecture emphasizes low-latency voice-to-voice interaction through two components in particular: a buffered LLM service designed for 100% KV cache reuse, and adaptive streaming TTS.
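The summary does not describe how the buffered LLM service is implemented. As a general illustration of why KV cache reuse depends on prompt layout, the minimal sketch below measures how much of a previously cached token sequence remains a valid prefix of the next prompt. The function names and token IDs are hypothetical and are not taken from the repository.

```python
from typing import Sequence


def shared_prefix_len(cached: Sequence[int], new: Sequence[int]) -> int:
    """Number of leading tokens that are identical in both sequences."""
    n = 0
    for a, b in zip(cached, new):
        if a != b:
            break
        n += 1
    return n


def cache_reuse_fraction(cached: Sequence[int], new: Sequence[int]) -> float:
    """Fraction of the cached tokens whose KV entries stay valid for `new`."""
    if not cached:
        return 1.0  # nothing was cached, so nothing is wasted
    return shared_prefix_len(cached, new) / len(cached)


# Hypothetical token IDs standing in for a tokenized conversation.
turn_1 = [1, 15, 27, 42]
append_only = turn_1 + [88, 91]           # new turn appended at the end
edited_history = [1, 15, 99, 42, 88, 91]  # an earlier token was changed

print(cache_reuse_fraction(turn_1, append_only))     # 1.0
print(cache_reuse_fraction(turn_1, edited_history))  # 0.5
```

Under this simplified model, full reuse is only possible when each new prompt extends the previous one without altering earlier tokens. Any change to earlier history invalidates the cache from that point onward.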
Quick Start & Requirements
Local:
docker build -f Dockerfile.unified -t nemotron-unified:cuda13 .
./scripts/nemotron.sh start
uv run pipecat_bots/bot_interleaved_streaming.py
Access the client at http://localhost:7860/client.
Cloud (Modal or Pipecat Cloud): install dependencies (uv sync --extra modal --extra bot), authenticate (modal setup or pipecat cloud auth login), deploy services (modal deploy ... or pipecat cloud deploy ...).
Highlighted Details
Maintenance & Community
No specific details regarding maintainers, community channels (e.g., Discord, Slack), sponsorships, or roadmap are provided in the README.
Licensing & Compatibility
The repository's README does not specify a software license. This omission requires clarification for assessing commercial use or derivative works.
Limitations & Caveats