tango by declare-lab

Diffusion model family for text-to-audio generation

created 2 years ago
1,182 stars

Top 33.7% on sourcepulse

Project Summary

Tango is a family of latent diffusion models for text-to-audio generation that produce realistic sounds from text prompts. It is aimed at researchers and developers working on audio synthesis, and it reports state-of-the-art quality with fast generation.

How It Works

Tango uses a latent diffusion model (LDM) architecture: a UNet generates audio in latent space, conditioned on text embeddings from a frozen Flan-T5 LLM. This design reaches high-quality audio synthesis with a significantly smaller training dataset than other state-of-the-art models. Tango 2 refines it by applying Direct Preference Optimization (DPO) on a custom dataset, Audio-Alpaca, to align model outputs with human preferences.
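To make the preference-alignment step concrete, here is a minimal sketch of the general DPO loss for a single preferred/rejected pair. This page does not give Tango 2's exact objective, and a diffusion model would need a variant that does not rely on exact log-likelihoods, so treat this as the textbook formulation only. The function names, the scalar log-probability inputs, and the default beta are illustrative assumptions, not code from the repository.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(policy_win, policy_lose, ref_win, ref_lose, beta=0.1):
    """Generic DPO loss for one preference pair (hypothetical helper).

    Each argument is a log-probability of the preferred ("win") or
    rejected ("lose") output under the trainable policy or the frozen
    reference model. beta scales how strongly the policy is pushed
    away from the reference.
    """
    # How much more the policy favors the preferred output than the
    # reference does, relative to the rejected output.
    margin = (policy_win - ref_win) - (policy_lose - ref_lose)
    return -log_sigmoid(beta * margin)


if __name__ == "__main__":
    # Policy identical to the reference: margin is 0, loss is log(2).
    print(dpo_loss(-1.0, -1.0, -1.0, -1.0))
    # Policy raises the preferred output and lowers the rejected one.
    print(dpo_loss(-1.0, -3.0, -2.0, -2.0))
```

The loss falls below log(2) once the policy prefers the chosen output more strongly than the reference does, which is the behavior a preference dataset such as Audio-Alpaca is meant to encourage.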

Quick Start & Requirements

  • Clone the repository, then install dependencies with pip install -r requirements.txt.
  • Requires PyTorch 1.13.1+cu117.
  • On Linux, the soundfile package needs the libsndfile1 system library.
  • Official Colab notebooks are available for quick testing, including one for Tango 2.
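As a quick sanity check after installation, the sketch below compares an installed PyTorch version string against the 1.13.1+cu117 build listed above. The helper names are hypothetical, and the page does not say how strictly the version must match, so the result is a hint only.

```python
import importlib.util

# Version string taken from the requirements list above.
EXPECTED_TORCH = "1.13.1+cu117"


def split_build(version):
    """Split '1.13.1+cu117' into ('1.13.1', 'cu117'); the build tag may be empty."""
    release, _, build = version.partition("+")
    return release, build


def check_torch(installed, expected=EXPECTED_TORCH):
    """Return 'ok', 'build-mismatch', or 'version-mismatch' (hypothetical helper)."""
    inst_release, inst_build = split_build(installed)
    exp_release, exp_build = split_build(expected)
    if inst_release != exp_release:
        return "version-mismatch"
    if inst_build != exp_build:
        return "build-mismatch"
    return "ok"


if __name__ == "__main__":
    if importlib.util.find_spec("torch") is None:
        print("torch is not installed; run the pip install step first")
    else:
        import torch
        print(check_torch(torch.__version__))
```

A "build-mismatch" result means the release number matches but the CUDA tag differs, which may still work depending on the GPU and driver.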

Highlighted Details

  • Tango 2 achieves state-of-the-art text-to-audio generation, producing 30-second audio clips in under 3 seconds.
  • Offers multiple pre-trained models on Hugging Face, including Tango, Tango-Full, Tango-2, and specialized variants.
  • Includes the Audio-Alpaca dataset for preference-based alignment.
  • Provides comprehensive training and inference scripts.

Maintenance & Community

  • Developed by declare-lab.
  • Links to demos and papers are provided for further exploration.

Licensing & Compatibility

  • The repository does not explicitly state a license. Code is borrowed from AudioLDM.

Limitations & Caveats

  • Some instances of the original AudioCaps dataset used for training have been removed, so evaluation on the full original set cannot be completely reproduced.
  • Distribution of the training data is restricted due to copyright issues.

Health Check

  • Last commit: 5 days ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0

Star History

  • 24 stars in the last 90 days
