tango  by declare-lab

Diffusion model family for text-to-audio generation

Created 2 years ago
1,195 stars

Top 32.7% on SourcePulse

Project Summary

Tango is a family of latent diffusion models for text-to-audio generation, capable of producing realistic sounds from textual prompts. It targets researchers and developers in audio synthesis, offering state-of-the-art performance with efficient generation.

How It Works

Tango uses a latent diffusion model (LDM) architecture: a UNet generates audio in a latent space, conditioned on text embeddings from a frozen Flan-T5 LLM that serves as the text encoder. This approach allows high-quality audio synthesis with a significantly smaller training dataset than other state-of-the-art models use. Tango 2 refines this further by applying Direct Preference Optimization (DPO) on a custom preference dataset, Audio-Alpaca, to align model outputs with human preferences.
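
The preference-alignment idea can be illustrated with the standard DPO objective. The sketch below is not Tango 2's training code: it uses the original log-likelihood form of DPO on made-up numbers, whereas a diffusion model would need a diffusion-specific variant. The function name, argument names, and the beta value are all assumptions chosen for illustration.

```python
import math


def dpo_loss(policy_logp_win: float, policy_logp_lose: float,
             ref_logp_win: float, ref_logp_lose: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one (preferred, rejected) pair.

    Each argument is a log-likelihood of a sample under either the
    trainable policy or the frozen reference model (hypothetical inputs).
    """
    # How much more the policy favors the preferred sample than the
    # reference does, minus the same quantity for the rejected sample.
    margin = (policy_logp_win - ref_logp_win) - (policy_logp_lose - ref_logp_lose)
    x = beta * margin
    # -log(sigmoid(x)) equals softplus(-x); this form is numerically stable.
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))


# Toy values: the policy already prefers the "winning" sample slightly more
# than the reference does, so the loss falls below log(2).
loss = dpo_loss(policy_logp_win=-1.0, policy_logp_lose=-3.0,
                ref_logp_win=-2.0, ref_logp_lose=-2.0)
print(f"DPO loss: {loss:.4f}")
```

The loss equals log(2) when the policy and the reference agree, and it shrinks as the policy shifts probability toward the preferred sample relative to the reference.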

Quick Start & Requirements

  • After cloning the repository, install the dependencies with: pip install -r requirements.txt
  • Requires PyTorch 1.13.1+cu117.
  • libsndfile1 is needed for soundfile on Linux.
  • Official Colab notebooks are available for quick testing, including one for Tango 2.
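
Once the requirements are installed, inference might look roughly like the sketch below. It assumes the cloned repository exposes a Tango class with a generate method, that the checkpoint is published under the identifier "declare-lab/tango2", and that output audio is sampled at 16 kHz; none of these details are stated in this summary, so check them against the repository before relying on them. prompt_to_filename is a hypothetical helper added for illustration.

```python
import re


def prompt_to_filename(prompt: str, extension: str = ".wav") -> str:
    """Turn a free-text prompt into a filesystem-friendly name (hypothetical helper)."""
    slug = re.sub(r"[^a-z0-9]+", "_", prompt.lower()).strip("_")
    return (slug or "audio") + extension


def generate_to_wav(prompt: str,
                    model_name: str = "declare-lab/tango2",  # assumed checkpoint id
                    sample_rate: int = 16000) -> str:         # assumed sample rate
    """Generate audio for a prompt and write it to a WAV file."""
    # Deferred imports: the helper above stays usable without the repo installed.
    import soundfile as sf
    from tango import Tango  # assumption: class provided by the cloned repository

    model = Tango(model_name)       # assumption: fetches pre-trained weights
    audio = model.generate(prompt)  # assumption: returns a waveform array
    path = prompt_to_filename(prompt)
    sf.write(path, audio, samplerate=sample_rate)
    return path


if __name__ == "__main__":
    print(prompt_to_filename("An audience cheering and clapping"))
```

Calling generate_to_wav("An audience cheering and clapping") from inside the cloned repository would then write the clip to the working directory, provided the assumptions above hold.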

Highlighted Details

  • Tango 2 achieves state-of-the-art text-to-audio generation, producing 30-second audio clips in under 3 seconds.
  • Offers multiple pre-trained models on Hugging Face, including Tango, Tango-Full, Tango-2, and specialized variants.
  • Includes the Audio-Alpaca dataset for preference-based alignment.
  • Provides comprehensive training and inference scripts.

Maintenance & Community

  • Developed by declare-lab.
  • Links to demos and papers are provided for further exploration.

Licensing & Compatibility

  • The repository does not explicitly state a license. Code is borrowed from AudioLDM.

Limitations & Caveats

  • Some instances of the original AudioCaps dataset used for training have since been removed, which limits the completeness of evaluation.
  • Distribution of the training data is restricted due to copyright issues.

Health Check

  • Last commit: 1 month ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 8 stars in the last 30 days
