tango by declare-lab

Diffusion model family for text-to-audio generation

Created 2 years ago

1,231 stars

Top 31.6% on SourcePulse


1 expert loves this project: Jesse Clark, Cofounder of Marqo.

Project Summary

Tango is a family of latent diffusion models for text-to-audio generation that produce realistic sounds from textual prompts. It is aimed at researchers and developers working on audio synthesis and offers state-of-the-art performance with fast generation.

How It Works

Tango uses a latent diffusion model (LDM) architecture: a UNet denoises latent audio representations while conditioned on text embeddings from a frozen, instruction-tuned Flan-T5 LLM. This lets Tango achieve high-quality audio synthesis while training on a significantly smaller dataset than other state-of-the-art models. Tango 2 refines this further by fine-tuning with Direct Preference Optimization (DPO) on Audio-Alpaca, a custom preference dataset, to align model outputs with human preferences.
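As a rough illustration of how a latent diffusion model of this kind is trained, the sketch below shows a standard DDPM-style noise-prediction objective applied to a latent. It is a minimal NumPy sketch under stated assumptions, not Tango's training code: denoiser is a hypothetical stand-in for the text-conditioned UNet, text_emb stands in for the frozen Flan-T5 embeddings, and the linear beta schedule, toy latent shape, and epsilon-prediction target are illustrative choices (the repository's scheduler may use a different parameterization).

```python
import numpy as np


def linear_beta_schedule(num_steps: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> np.ndarray:
    """Linear noise schedule (an assumed DDPM default, not necessarily Tango's)."""
    return np.linspace(beta_start, beta_end, num_steps)


def add_noise(z0: np.ndarray, noise: np.ndarray, t: int, alphas_cumprod: np.ndarray) -> np.ndarray:
    """Sample z_t ~ q(z_t | z_0) = N(sqrt(abar_t) * z_0, (1 - abar_t) * I)."""
    abar = alphas_cumprod[t]
    return np.sqrt(abar) * z0 + np.sqrt(1.0 - abar) * noise


def diffusion_loss(denoiser, z0, text_emb, t, alphas_cumprod, rng) -> float:
    """Noise-prediction MSE for one training step (hypothetical sketch).

    denoiser(z_t, t, text_emb) is a hypothetical stand-in for the UNet. In a
    Tango-like setup, text_emb would come from the frozen text encoder and z0
    would be an audio latent. One timestep t is shared across the batch for brevity.
    """
    noise = rng.standard_normal(z0.shape)          # epsilon ~ N(0, I)
    zt = add_noise(z0, noise, t, alphas_cumprod)   # noised latent at step t
    pred = denoiser(zt, t, text_emb)               # predicted epsilon
    return float(np.mean((pred - noise) ** 2))     # MSE against the true noise
```

A denoiser that always predicts zeros yields a loss near 1 (the variance of the added noise), while an oracle that recovers the exact noise drives it to 0; training pushes the UNet from the former toward the latter.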

Quick Start & Requirements

Install via pip install -r requirements.txt after cloning the repository.
Requires PyTorch 1.13.1+cu117.
On Linux, the soundfile package requires the libsndfile1 system library.
Official Colab notebooks are available for quick testing (e.g., the Tango 2 Colab).

Highlighted Details

Tango 2 achieves state-of-the-art text-to-audio generation, producing 30-second audio clips in under 3 seconds.
Offers multiple pre-trained models on Hugging Face, including Tango, Tango-Full, Tango-2, and specialized variants.
Includes the Audio-Alpaca dataset for preference-based alignment.
Provides comprehensive training and inference scripts.
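Preference alignment of the kind Tango 2 applies can be sketched with a Diffusion-DPO-style objective, in which each Audio-Alpaca example pairs a prompt with a preferred and a less-preferred audio clip. The functions below are a simplified, hypothetical sketch, not the repository's implementation: they assume the per-sample denoising errors of the model being fine-tuned and of a frozen reference model have already been computed for both clips, and beta is a free hyperparameter whose value here is arbitrary.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def diffusion_dpo_loss(
    model_err_w: float,
    model_err_l: float,
    ref_err_w: float,
    ref_err_l: float,
    beta: float,
) -> float:
    """Simplified Diffusion-DPO-style loss for one preference pair (hypothetical sketch).

    Each *_err_* argument is a denoising error ||eps - eps_hat||^2, computed at the
    same timestep and noise, for the preferred (w) or rejected (l) clip, by either
    the model being fine-tuned or a frozen reference copy.
    """
    # How much worse the model denoises the preferred clip than the rejected one...
    model_diff = model_err_w - model_err_l
    # ...compared with the same gap under the frozen reference model.
    ref_diff = ref_err_w - ref_err_l
    # A negative (model_diff - ref_diff) means the model now favours the preferred clip.
    inside = -0.5 * beta * (model_diff - ref_diff)
    return -log_sigmoid(inside)


def batch_dpo_loss(pairs, beta: float) -> float:
    """Average the per-pair loss over (model_w, model_l, ref_w, ref_l) tuples."""
    losses = [diffusion_dpo_loss(*p, beta=beta) for p in pairs]
    return sum(losses) / len(losses)
```

When the fine-tuned model and the reference make identical errors, the loss equals log 2; it falls below that as the model, relative to the reference, denoises the preferred clip better than the rejected one.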

Maintenance & Community

Developed by declare-lab.
Links to demos and papers are provided for further exploration.

Licensing & Compatibility

The repository does not explicitly state a license. Parts of the code are borrowed from AudioLDM.

Limitations & Caveats

Some instances of the original AudioCaps dataset used for training are no longer available, so evaluations cannot cover the complete dataset.
Distribution of the training data is restricted due to copyright issues.

Health Check

Last commit: 7 months ago

Responsiveness: 1 day

Star history: 5 stars in the last 30 days