nari-labs: Streaming dialogue TTS for real-time conversational audio
Dia2 is a streaming dialogue Text-to-Speech (TTS) model designed for real-time conversational audio generation. It addresses the need for low-latency TTS that can begin producing audio as input text is received, enabling more natural and interactive dialogue systems. The model is beneficial for researchers and developers building real-time conversational AI, virtual assistants, and speech-to-speech applications.
How It Works
The core approach is a streaming dialogue TTS architecture that processes text incrementally. This allows audio generation to commence immediately upon receiving the initial words, rather than waiting for the complete utterance. A key feature is its ability to condition output on audio inputs, such as speaker voice samples or previous conversational turns, facilitating more natural and contextually relevant speech generation for dynamic interactions.
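The incremental behavior described above can be illustrated with a minimal sketch. This is not Dia2's API: `fake_synthesize` and `stream_tts` are hypothetical names, and the "audio frames" are placeholder integers rather than real audio. The sketch only shows the control flow, in which each word yields output as soon as it arrives instead of after the whole utterance is known. Conditioning on prefix audio is not modeled here.

```python
from typing import Iterable, Iterator, List


def fake_synthesize(word: str) -> List[int]:
    """Hypothetical stand-in for a model step.

    Returns placeholder "audio frames" (plain integers) for one word.
    A real model would return audio samples or codec tokens instead.
    """
    n_frames = max(1, len(word) // 3)
    return [len(word)] * n_frames


def stream_tts(words: Iterable[str]) -> Iterator[List[int]]:
    """Yield frames for each word as soon as that word is received.

    Because this is a generator over a (possibly lazy) iterable, the
    first frames are produced after reading only the first word.
    """
    for word in words:
        yield fake_synthesize(word)


def incoming_text() -> Iterator[str]:
    """Simulates text arriving word by word, e.g. from an upstream LLM."""
    for word in "Hello there how are you".split():
        yield word


if __name__ == "__main__":
    for i, frames in enumerate(stream_tts(incoming_text())):
        print(f"chunk {i}: {len(frames)} frame(s)")
```

A generator is used so that the consumer (for example, an audio player) can start work on the first chunk while later text is still being produced upstream.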
Quick Start & Requirements
Install uv first, then install dependencies with uv sync. Commands are executed via uv run ....
Highlighted Details
Two model variants are referenced (nari-labs/Dia2-1B, nari-labs/Dia2-2B).
A --cuda-graph option is available for performance.
Maintenance & Community
Questions can be directed to the project's Discord server, and issues can be opened on the repository. Compute for training was provided by the TPU Research Cloud program.
Licensing & Compatibility
The project is licensed under Apache 2.0. Third-party assets retain their original licenses. Apache 2.0 is generally permissive for commercial use and integration into closed-source projects.
Limitations & Caveats
Generation is limited to a maximum of 2 minutes per call. Output quality and voice consistency can vary without prefix conditioning or fine-tuning. The project strictly forbids identity misuse, deceptive content generation, and illegal or malicious use. Transcription of prefix audio files using Whisper adds latency to conditional generation.