nemotron-january-2026  by pipecat-ai

Voice agent framework with NVIDIA open models

Created 4 months ago
557 stars

Top 57.0% on SourcePulse

Summary

This repository provides sample code for building voice agents using NVIDIA's open-source Nemotron Speech ASR, Nemotron 3 Nano LLM, and Magpie TTS (preview) models. It targets engineers and researchers seeking to deploy advanced voice AI capabilities, offering flexible deployment options on high-end NVIDIA hardware locally or via cloud platforms like Modal and Pipecat Cloud. The project enables rapid prototyping and deployment of sophisticated, real-time voice interaction systems.

How It Works

The system integrates NVIDIA's Nemotron Speech ASR, Nemotron 3 Nano LLM, and Magpie TTS. It supports two primary LLM backends: llama.cpp (optimized for single GPUs with GGUF quantized models) and vLLM (for multi-GPU or cloud deployments with BF16 models). The architecture emphasizes low-latency voice-to-voice interaction through components like a buffered LLM service for 100% KV cache reuse and adaptive streaming TTS.
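The backend split described above can be read as a simple decision rule. The sketch below is illustrative only: `choose_backend` and `BackendChoice` are hypothetical names that do not come from the repository, and the rule encodes nothing beyond the single-GPU versus multi-GPU/cloud distinction stated here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackendChoice:
    backend: str        # "llama.cpp" or "vLLM"
    weight_format: str  # "GGUF (quantized)" or "BF16"


def choose_backend(gpu_count: int, cloud: bool = False) -> BackendChoice:
    """Hypothetical helper: pick an LLM backend from the deployment shape."""
    if gpu_count < 1:
        raise ValueError("at least one GPU is required")
    # Multi-GPU or cloud deployments map to vLLM with BF16 weights.
    if cloud or gpu_count > 1:
        return BackendChoice("vLLM", "BF16")
    # A single local GPU maps to llama.cpp with a quantized GGUF model.
    return BackendChoice("llama.cpp", "GGUF (quantized)")


if __name__ == "__main__":
    print(choose_backend(1))
    print(choose_backend(2))
    print(choose_backend(1, cloud=True))
```

A real deployment would likely also weigh available VRAM and latency targets; this sketch deliberately ignores both.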

Quick Start & Requirements

  • Local Setup: Requires Docker, CUDA 13.1, and high-end NVIDIA hardware (DGX Spark/RTX 5090). The container build takes 2-3 hours. Run these commands in order: (1) docker build -f Dockerfile.unified -t nemotron-unified:cuda13 . (the trailing dot is the build context); (2) ./scripts/nemotron.sh start; (3) uv run pipecat_bots/bot_interleaved_streaming.py. Once the bot is running, open http://localhost:7860/client.
  • Cloud Setup (Modal/Pipecat Cloud): Requires respective cloud accounts. Install dependencies (uv sync --extra modal --extra bot), authenticate (modal setup or pipecat cloud auth login), deploy services (modal deploy ... or pipecat cloud deploy ...).
  • Hardware: Significant VRAM needed, e.g., ~72GB for BF16 LLM with vLLM.
  • Links: Local demo at http://localhost:7860/client.
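For repeatable local runs, the three Local Setup commands could be wrapped in a small driver script. This is a minimal sketch, not something the repository ships: `run_local_setup` and `LOCAL_SETUP_STEPS` are hypothetical names, the commands are copied verbatim from the bullet above, and the script assumes it is started from the repository root. It defaults to a dry run that only prints each command.

```python
import shlex
import subprocess

# Commands copied from the Local Setup bullet, in the order given.
LOCAL_SETUP_STEPS = [
    "docker build -f Dockerfile.unified -t nemotron-unified:cuda13 .",
    "./scripts/nemotron.sh start",
    "uv run pipecat_bots/bot_interleaved_streaming.py",
]


def run_local_setup(dry_run: bool = True) -> list[list[str]]:
    """Print each step and, unless dry_run is set, execute it.

    Execution stops at the first failing step because check=True
    makes subprocess.run raise CalledProcessError.
    """
    argv_list = [shlex.split(step) for step in LOCAL_SETUP_STEPS]
    for argv in argv_list:
        print("$", shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)
    return argv_list


if __name__ == "__main__":
    run_local_setup(dry_run=True)
```

With `dry_run=False`, the final step presumably keeps running for as long as the bot serves the client page, so the call would block until the bot exits.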

Highlighted Details

  • Uses NVIDIA's Nemotron Speech ASR and Nemotron 3 Nano LLM together with Magpie TTS (preview).
  • Offers optimized bot implementations for single-GPU latency and multi-GPU cloud deployment.
  • Supports multiple transport protocols: WebRTC, Daily, Twilio.
  • Features a unified container build compiling PyTorch, NeMo, vLLM, and llama.cpp from source.

Maintenance & Community

No specific details regarding maintainers, community channels (e.g., Discord, Slack), sponsorships, or roadmap are provided in the README.

Licensing & Compatibility

The repository's README does not specify a software license. This omission requires clarification for assessing commercial use or derivative works.

Limitations & Caveats

  • Local deployment demands high-end NVIDIA hardware (DGX Spark/RTX 5090); running the BF16 LLM with vLLM needs ~72GB of VRAM.
  • Initial container build is time-intensive (2-3 hours) due to source compilation.
  • vLLM service startup can take 10-15 minutes.
  • Magpie TTS is in preview.
  • License information is absent, posing a potential adoption blocker.
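
The ~72GB figure quoted above can be sanity-checked with back-of-the-envelope arithmetic: BF16 stores each weight in 2 bytes, so weight memory is roughly parameter count times 2. The sketch below assumes a model of about 30 billion parameters purely for illustration; this summary does not state the parameter count, and `weight_memory_gb` is a hypothetical helper.

```python
def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Memory for model weights alone, in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


# ASSUMPTION: ~30 billion parameters, chosen for illustration only;
# this summary does not state the model's parameter count.
ASSUMED_PARAMS = 30e9
BF16_BYTES = 2  # bfloat16 stores each value in 16 bits

weights = weight_memory_gb(ASSUMED_PARAMS, BF16_BYTES)
headroom = 72 - weights  # what remains of the quoted ~72GB figure

print(f"weights: {weights:.0f} GB, left for KV cache and runtime: {headroom:.0f} GB")
```

Under that assumption the weights alone would account for most of the quoted budget, with the remainder going to KV cache, activations, and runtime overhead. A quantized GGUF model uses fewer bytes per weight, which is consistent with the llama.cpp path targeting a single GPU.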

Health Check

  • Last commit: 2 days ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 6 stars in the last 30 days
