nemotron-january-2026 by pipecat-ai

Voice agent framework with NVIDIA open models

Created 1 week ago

409 stars

Top 71.3% on SourcePulse

Project Summary

This repository provides sample code for building voice agents using NVIDIA's open-source Nemotron Speech ASR, Nemotron 3 Nano LLM, and Magpie TTS (preview) models. It targets engineers and researchers seeking to deploy advanced voice AI capabilities, offering flexible deployment options on high-end NVIDIA hardware locally or via cloud platforms like Modal and Pipecat Cloud. The project enables rapid prototyping and deployment of sophisticated, real-time voice interaction systems.

How It Works

The system integrates NVIDIA's Nemotron Speech ASR, Nemotron 3 Nano LLM, and Magpie TTS. It supports two primary LLM backends: llama.cpp (optimized for single GPUs with GGUF quantized models) and vLLM (for multi-GPU or cloud deployments with BF16 models). The architecture emphasizes low-latency voice-to-voice interaction through components like a buffered LLM service for 100% KV cache reuse and adaptive streaming TTS.
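The buffered-LLM idea can be illustrated with a small Python sketch (a hypothetical toy model, not the repository's actual code; the class name BufferedLLMService is borrowed from the description above): the service only ever appends to the conversation buffer, so each new request shares its entire prefix with the previous one, and a backend such as llama.cpp or vLLM can reuse the cached KV entries for that prefix instead of re-prefilling it.

```python
class BufferedLLMService:
    """Toy model of prefix-based KV cache reuse (illustrative only).

    The conversation is kept as an append-only token buffer. Because
    every request strictly extends the previous prompt, only the newly
    appended tokens need a prefill pass; the rest hits the KV cache.
    """

    def __init__(self):
        self.buffer = []            # append-only conversation tokens
        self.cached_prefix_len = 0  # tokens already in the KV cache

    def generate(self, new_tokens):
        self.buffer.extend(new_tokens)
        # Only tokens beyond the cached prefix must be prefilled.
        to_prefill = len(self.buffer) - self.cached_prefix_len
        reuse_ratio = self.cached_prefix_len / len(self.buffer)
        self.cached_prefix_len = len(self.buffer)
        return to_prefill, reuse_ratio


svc = BufferedLLMService()
svc.generate(list(range(100)))             # first turn: full prefill
prefill, reuse = svc.generate(list(range(10)))
print(prefill, round(reuse, 2))            # only the 10 new tokens prefilled
```

In a real deployment the reuse happens inside the inference engine (e.g. vLLM's automatic prefix caching); the point of the sketch is that an append-only prompt makes the cached prefix match every time.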

Quick Start & Requirements

  • Local Setup: Requires Docker, CUDA 13.1, and high-end NVIDIA hardware (DGX Spark or RTX 5090); the container build takes 2-3 hours. Build with docker build -f Dockerfile.unified -t nemotron-unified:cuda13 . (the trailing dot is the build context), start services with ./scripts/nemotron.sh start, then run uv run pipecat_bots/bot_interleaved_streaming.py and open http://localhost:7860/client.
  • Cloud Setup (Modal/Pipecat Cloud): Requires respective cloud accounts. Install dependencies (uv sync --extra modal --extra bot), authenticate (modal setup or pipecat cloud auth login), deploy services (modal deploy ... or pipecat cloud deploy ...).
  • Hardware: Significant VRAM needed, e.g., ~72GB for BF16 LLM with vLLM.
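The ~72 GB figure is consistent with simple back-of-the-envelope arithmetic: BF16 stores two bytes per parameter, so weights alone for an N-parameter model take roughly 2N bytes, with extra headroom for the KV cache and activations. The parameter count used below is illustrative only, since the exact size of Nemotron 3 Nano is not stated here.

```python
def bf16_weight_gb(n_params):
    """Approximate weight memory in GB for a BF16 model: 2 bytes/param."""
    return n_params * 2 / 1e9

# Illustrative: a 30B-parameter model needs ~60 GB for weights alone;
# KV cache and activations push the total toward the ~72 GB cited above.
print(round(bf16_weight_gb(30e9)))  # → 60
```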

Highlighted Details

  • Leverages NVIDIA's Nemotron models for ASR, LLM, and Magpie TTS (preview).
  • Offers optimized bot implementations for single-GPU latency and multi-GPU cloud deployment.
  • Supports multiple transport protocols: WebRTC, Daily, Twilio.
  • Features a unified container build compiling PyTorch, NeMo, vLLM, and llama.cpp from source.
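The adaptive streaming TTS mentioned under How It Works can be approximated by a simple chunking policy. This is a hypothetical sketch (the function adaptive_chunks and its parameters are invented for illustration, not taken from the project): emit a short first chunk to minimize time-to-first-audio, then grow the chunk size for smoother synthesis, always breaking at word boundaries.

```python
def adaptive_chunks(text, first_len=24, growth=2.0, max_len=120):
    """Hypothetical adaptive chunker for streaming TTS.

    Emits a short first chunk (low time-to-first-audio), then
    progressively longer chunks (smoother prosody), splitting only
    at word boundaries.
    """
    words = text.split()
    chunks, current, limit = [], [], first_len
    for word in words:
        if current and len(" ".join(current + [word])) > limit:
            chunks.append(" ".join(current))
            current = []
            limit = min(int(limit * growth), max_len)
        current.append(word)
    if current:
        chunks.append(" ".join(current))
    return chunks


chunks = adaptive_chunks(
    "Hello there, this is a streaming reply from the LLM that keeps going.",
    first_len=12,
)
print(chunks[0])  # → "Hello there," (short first chunk, fast first audio)
```

A production system would additionally adapt on sentence punctuation and synthesis latency, but the size-ramp idea is the same.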

Maintenance & Community

No specific details regarding maintainers, community channels (e.g., Discord, Slack), sponsorships, or roadmap are provided in the README.

Licensing & Compatibility

The repository's README does not specify a software license; this should be clarified before assuming the project is suitable for commercial use or derivative works.

Limitations & Caveats

  • Local deployment demands substantial NVIDIA hardware (DGX Spark/RTX 5090) with high VRAM (~72GB for BF16 LLM).
  • Initial container build is time-intensive (2-3 hours) due to source compilation.
  • vLLM service startup can take 10-15 minutes.
  • Magpie TTS is in preview.
  • License information is absent, posing a potential adoption blocker.
Health Check

  • Last Commit: 4 days ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 2
  • Issues (30d): 4
  • Star History: 415 stars in the last 10 days

Explore Similar Projects

Starred by Thomas Wolf (Cofounder of Hugging Face), Chip Huyen (Author of "AI Engineering" and "Designing Machine Learning Systems"), and 5 more.

ultravox by fixie-ai

Multimodal LLM for real-time voice interactions
Top 0.1% · 4k stars · Created 1 year ago · Updated 1 month ago