TADA by Hume AI

Generative speech modeling framework

Created 1 month ago
959 stars

Top 38.2% on SourcePulse

Project Summary

TADA is a unified speech-language model designed to address the computational inefficiencies and transcript hallucination common in traditional Text-to-Speech (TTS) systems. It targets researchers and developers seeking high-fidelity speech synthesis with a more natural flow and reduced computational overhead. The core benefit lies in its novel 1:1 text-acoustic alignment, enabling a more cohesive and efficient speech generation process.

How It Works

TADA utilizes a unique tokenization schema that aligns each text token with a single speech vector, creating a synchronized stream. Its dynamic autoregression allows the model to generate the entire speech segment for a text token in one step, dynamically controlling duration and prosody. This dual-stream generation approach simultaneously produces text tokens and the speech for preceding tokens, maintaining context while significantly lowering computational costs compared to fixed-frame-rate models.
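The alignment and dual-stream loop described above can be illustrated with a minimal toy sketch. This is not Hume AI's implementation; every name in it (synthesize, wave[...]) is hypothetical, and real speech vectors stand in as placeholder strings:

```python
def synthesize(text_tokens):
    """Toy dual-stream loop: at each step, emit the next text token and
    the speech for the *preceding* token, keeping a 1:1 alignment
    between text tokens and speech entries."""
    stream = []
    prev = None
    for tok in text_tokens:
        if prev is not None:
            # Speech for the previous token is produced alongside the
            # next text token, so the two streams advance together.
            stream.append(("speech", f"wave[{prev}]"))
        stream.append(("text", tok))
        prev = tok
    if prev is not None:
        # Flush speech for the final text token.
        stream.append(("speech", f"wave[{prev}]"))
    return stream
```

Note the 1:1 property: each text token yields exactly one speech entry, and speech for a token is emitted while the next text token is being generated, which is what keeps the two streams synchronized without a fixed frame rate.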

Quick Start & Requirements

Installation is straightforward via pip: pip install hume-tada. Alternatively, clone the repository and install from source with pip install -e . (an editable install). The project offers models such as TADA-1B and TADA-3B-ML. The inference examples require a CUDA-enabled GPU.

Highlighted Details

  • 1:1 Token Alignment: Achieves precise synchronization between text tokens and corresponding speech vectors.
  • Dynamic Duration Synthesis: Generates speech for each text token in a single autoregressive step, adapting duration and prosody.
  • Dual-Stream Generation: Processes text and speech concurrently, maintaining context and improving efficiency.
  • Multilingual Support: Includes language-specific aligners for Arabic, Chinese, German, Spanish, French, Italian, Japanese, Polish, and Portuguese.
  • Speech Continuation: Allows for generating speech beyond an initial prompt.

Maintenance & Community

This project is developed by Hume AI, an "empathic AI research company." For inquiries regarding product or research collaborations, contact hello@hume.ai. The README does not provide links to community channels like Discord or Slack, nor a public roadmap.

Licensing & Compatibility

The README does not state a software license. Prospective adopters should confirm licensing terms before use, particularly for commercial deployment or integration into closed-source applications.

Limitations & Caveats

The built-in Automatic Speech Recognition (ASR) used for prompt encoding is English-only. For non-English prompts, users must supply the corresponding transcript to the encoder; otherwise alignment quality may degrade.

Health Check

Last Commit: 2 weeks ago
Responsiveness: Inactive
Pull Requests (30d): 18
Issues (30d): 6
Star History: 344 stars in the last 30 days

Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), Pietro Schirano (founder of MagicPath), and 2 more.