nemotron-january-2026 by pipecat-ai

Voice agent framework with NVIDIA open models

Created 1 week ago

409 stars

Top 71.3% on SourcePulse

Project Summary

This repository provides sample code for building voice agents using NVIDIA's open-source Nemotron Speech ASR, Nemotron 3 Nano LLM, and Magpie TTS (preview) models. It targets engineers and researchers seeking to deploy advanced voice AI capabilities, offering flexible deployment options on high-end NVIDIA hardware locally or via cloud platforms like Modal and Pipecat Cloud. The project enables rapid prototyping and deployment of sophisticated, real-time voice interaction systems.

How It Works

The system integrates NVIDIA's Nemotron Speech ASR, Nemotron 3 Nano LLM, and Magpie TTS. It supports two primary LLM backends: llama.cpp (optimized for single GPUs with GGUF quantized models) and vLLM (for multi-GPU or cloud deployments with BF16 models). The architecture emphasizes low-latency voice-to-voice interaction through components like a buffered LLM service for 100% KV cache reuse and adaptive streaming TTS.
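The buffered-LLM idea can be illustrated with a small Python sketch (a hypothetical toy model, not the repository's actual code; the class name BufferedLLMService is borrowed from the description above): the service only ever appends to the conversation buffer, so each new request shares its entire prefix with the previous one, and a backend such as llama.cpp or vLLM can reuse the cached KV entries for that prefix instead of re-prefilling it.

```python
class BufferedLLMService:
    """Toy model of prefix-based KV cache reuse (illustrative only).

    The conversation is kept as an append-only token buffer. Because
    every request strictly extends the previous prompt, only the newly
    appended tokens need a prefill pass; the rest hits the KV cache.
    """

    def __init__(self):
        self.buffer = []            # append-only conversation tokens
        self.cached_prefix_len = 0  # tokens already in the KV cache

    def generate(self, new_tokens):
        self.buffer.extend(new_tokens)
        # Only tokens beyond the cached prefix must be prefilled.
        to_prefill = len(self.buffer) - self.cached_prefix_len
        reuse_ratio = self.cached_prefix_len / len(self.buffer)
        self.cached_prefix_len = len(self.buffer)
        return to_prefill, reuse_ratio


svc = BufferedLLMService()
svc.generate(list(range(100)))             # first turn: full prefill
prefill, reuse = svc.generate(list(range(10)))
print(prefill, round(reuse, 2))            # only the 10 new tokens prefilled
```

In a real deployment the reuse happens inside the inference engine (e.g. vLLM's automatic prefix caching); the point of the sketch is that an append-only prompt makes the cached prefix match every time.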

Quick Start & Requirements

  • Local Setup: Requires Docker, CUDA 13.1, and high-end NVIDIA hardware (DGX Spark or RTX 5090); the container build takes 2-3 hours. Build with docker build -f Dockerfile.unified -t nemotron-unified:cuda13 . (the trailing dot is the build context), start services with ./scripts/nemotron.sh start, then run uv run pipecat_bots/bot_interleaved_streaming.py and open http://localhost:7860/client.
  • Cloud Setup (Modal/Pipecat Cloud): Requires respective cloud accounts. Install dependencies (uv sync --extra modal --extra bot), authenticate (modal setup or pipecat cloud auth login), deploy services (modal deploy ... or pipecat cloud deploy ...).
  • Hardware: Significant VRAM needed, e.g., ~72GB for BF16 LLM with vLLM.
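The ~72 GB figure is consistent with simple back-of-the-envelope arithmetic: BF16 stores two bytes per parameter, so weights alone for an N-parameter model take roughly 2N bytes, with extra headroom for the KV cache and activations. The parameter count used below is illustrative only, since the exact size of Nemotron 3 Nano is not stated here.

```python
def bf16_weight_gb(n_params):
    """Approximate weight memory in GB for a BF16 model: 2 bytes/param."""
    return n_params * 2 / 1e9

# Illustrative: a 30B-parameter model needs ~60 GB for weights alone;
# KV cache and activations push the total toward the ~72 GB cited above.
print(round(bf16_weight_gb(30e9)))  # → 60
```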

Highlighted Details

  • Leverages NVIDIA's Nemotron models for ASR, LLM, and Magpie TTS (preview).
  • Offers optimized bot implementations for single-GPU latency and multi-GPU cloud deployment.
  • Supports multiple transport protocols: WebRTC, Daily, Twilio.
  • Features a unified container build compiling PyTorch, NeMo, vLLM, and llama.cpp from source.
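The adaptive streaming TTS mentioned under How It Works can be approximated by a simple chunking policy. This is a hypothetical sketch (the function adaptive_chunks and its parameters are invented for illustration, not taken from the project): emit a short first chunk to minimize time-to-first-audio, then grow the chunk size for smoother synthesis, always breaking at word boundaries.

```python
def adaptive_chunks(text, first_len=24, growth=2.0, max_len=120):
    """Hypothetical adaptive chunker for streaming TTS.

    Emits a short first chunk (low time-to-first-audio), then
    progressively longer chunks (smoother prosody), splitting only
    at word boundaries.
    """
    words = text.split()
    chunks, current, limit = [], [], first_len
    for word in words:
        if current and len(" ".join(current + [word])) > limit:
            chunks.append(" ".join(current))
            current = []
            limit = min(int(limit * growth), max_len)
        current.append(word)
    if current:
        chunks.append(" ".join(current))
    return chunks


chunks = adaptive_chunks(
    "Hello there, this is a streaming reply from the LLM that keeps going.",
    first_len=12,
)
print(chunks[0])  # → "Hello there," (short first chunk, fast first audio)
```

A production system would additionally adapt on sentence punctuation and synthesis latency, but the size-ramp idea is the same.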

Maintenance & Community

No specific details regarding maintainers, community channels (e.g., Discord, Slack), sponsorships, or roadmap are provided in the README.

Licensing & Compatibility

The repository's README does not specify a software license; this should be clarified before assuming the project is suitable for commercial use or derivative works.

Limitations & Caveats

  • Local deployment demands substantial NVIDIA hardware (DGX Spark/RTX 5090) with high VRAM (~72GB for BF16 LLM).
  • Initial container build is time-intensive (2-3 hours) due to source compilation.
  • vLLM service startup can take 10-15 minutes.
  • Magpie TTS is in preview.
  • License information is absent, posing a potential adoption blocker.
Health Check

  • Last Commit: 4 days ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 2
  • Issues (30d): 4
  • Star History: 415 stars in the last 10 days

Explore Similar Projects

Starred by Thomas Wolf (Cofounder of Hugging Face), Chip Huyen (Author of "AI Engineering" and "Designing Machine Learning Systems"), and 5 more.

ultravox by fixie-ai

Multimodal LLM for real-time voice interactions
Top 0.1% · 4k stars · Created 1 year ago · Updated 1 month ago