dia by nari-labs

TTS model for ultra-realistic dialogue generation

created 3 months ago
17,746 stars

Top 2.5% on sourcepulse

Project Summary

Dia is a 1.6B-parameter text-to-speech model that generates ultra-realistic dialogue in a single pass, with fine-grained control over emotion and non-verbal cues. It targets researchers and developers who need advanced TTS for English content, and it supports voice cloning and conditioning on audio prompts.

How It Works

Dia directly generates speech from text, allowing for conditioning on audio inputs to control tone and emotion. It supports specific non-verbal tags (e.g., (laughs), (coughs)) and uses speaker tags [S1] and [S2] to manage dialogue flow. The model leverages the Descript Audio Codec and is built for efficient inference on GPUs.
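As a minimal sketch of the tagging convention described above, the snippet below assembles a two-speaker transcript string using the [S1] and [S2] speaker tags and an inline non-verbal tag. The helper name build_script and its argument format are hypothetical and only illustrate the text format; they are not part of Dia's API, and the exact input Dia expects should be checked against the project's own documentation.

```python
from typing import Iterable, Tuple

# Speaker tags named in the summary above; the helper itself is hypothetical.
SPEAKER_TAGS = ("[S1]", "[S2]")


def build_script(turns: Iterable[Tuple[int, str]]) -> str:
    """Join (speaker_number, line) pairs into one tagged transcript string."""
    parts = []
    for speaker, line in turns:
        if speaker not in (1, 2):
            raise ValueError(f"only speakers 1 and 2 are supported, got {speaker}")
        parts.append(f"{SPEAKER_TAGS[speaker - 1]} {line.strip()}")
    return " ".join(parts)


script = build_script([
    (1, "Did you hear the news?"),
    (2, "I did. (laughs) I still can't believe it."),
])
print(script)
```

Keeping the tags in a plain string like this makes it easy to review a script before sending it to the model, and non-verbal tags such as (laughs) stay inline with the line they belong to.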

Quick Start & Requirements

  • Install via pip: pip install git+https://github.com/nari-labs/dia.git
  • Run Gradio UI: clone the repo, cd dia, then run uv run app.py. Without uv, run python -m venv .venv && source .venv/bin/activate && pip install -e . && python app.py
  • Requirements: PyTorch 2.0+, CUDA 12.6+. The first run also downloads the Descript Audio Codec.
  • Demo: ZeroGPU Space

Highlighted Details

  • Generates realistic dialogue with controllable emotion and non-verbal sounds.
  • Supports voice cloning via audio prompts.
  • Achieves ~2.2x real-time factor on RTX 4090 with float16 precision and torch.compile.
  • Requires ~10GB VRAM for float16 or bfloat16 inference.
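The speed and memory figures above can be turned into rough back-of-the-envelope numbers. The sketch below assumes that "real-time factor" here means seconds of audio produced per second of wall-clock time (so values above 1 are faster than real time), and that float16 and bfloat16 use 2 bytes per parameter. Both function names are hypothetical helpers, not part of any library.

```python
def generation_seconds(audio_seconds: float, realtime_factor: float) -> float:
    """Wall-clock time to generate a clip, assuming factor = audio s per wall s."""
    if realtime_factor <= 0:
        raise ValueError("realtime_factor must be positive")
    return audio_seconds / realtime_factor


def weight_gigabytes(num_params: float, bytes_per_param: int) -> float:
    """Memory taken by the raw model weights alone, in GB (10^9 bytes)."""
    return num_params * bytes_per_param / 1e9


# A 20-second clip at the quoted ~2.2x factor takes about 9 seconds.
print(f"{generation_seconds(20.0, 2.2):.1f} s")

# 1.6B parameters at 2 bytes each is about 3.2 GB of weights.
print(f"{weight_gigabytes(1.6e9, 2):.1f} GB")
```

Under these assumptions the weights account for only about a third of the quoted ~10GB; the remainder presumably covers activations, caches, and the audio codec, though the summary does not break this down.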

Maintenance & Community

  • Active development by a small team.
  • Community support via Discord Server.

Licensing & Compatibility

  • Licensed under Apache License 2.0.
  • Permits commercial use and inclusion in closed-source software.

Limitations & Caveats

The model currently supports English only. Input text should correspond to roughly 5 to 20 seconds of audio; inputs that are much shorter or longer tend to produce unnatural speech. Overusing or misusing non-verbal tags may cause artifacts. Voices vary between runs by default, so keeping a speaker consistent requires an audio prompt or a fixed seed. CPU support is planned but not yet available.
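One way to act on the length guideline is to estimate a script's spoken duration before generating it. The sketch below reads the guideline as a window of roughly 5 to 20 seconds and assumes an average speaking rate of 2.5 words per second (about 150 words per minute). That rate is an assumption for illustration, not a figure from the project, and the function names are hypothetical.

```python
import re
from typing import Optional

ASSUMED_WORDS_PER_SECOND = 2.5  # assumption: ~150 words per minute
MIN_SECONDS, MAX_SECONDS = 5.0, 20.0  # window taken from the guideline above

# Speaker tags like [S1] and non-verbal tags like (laughs) are not counted as words.
_TAG_PATTERN = re.compile(r"\[S[12]\]|\([^)]*\)")


def estimate_seconds(script: str) -> float:
    """Rough spoken duration of a tagged script, ignoring tags."""
    spoken = _TAG_PATTERN.sub(" ", script)
    return len(spoken.split()) / ASSUMED_WORDS_PER_SECOND


def length_warning(script: str) -> Optional[str]:
    """Return a warning if the estimate falls outside the window, else None."""
    seconds = estimate_seconds(script)
    if seconds < MIN_SECONDS:
        return "too short"
    if seconds > MAX_SECONDS:
        return "too long"
    return None


print(length_warning("[S1] Hello there. [S2] Hi. (laughs)"))
```

Because the speaking rate is only an estimate, a check like this is best treated as a coarse filter for inputs that are clearly outside the recommended range.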

Health Check

  • Last commit: 3 weeks ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 5
  • Issues (30d): 7

Star History

  • 3,421 stars in the last 90 days
