declare-lab
Diffusion model family for text-to-audio generation
Top 32.3% on SourcePulse
Tango is a family of latent diffusion models for text-to-audio generation, capable of producing realistic sounds from textual prompts. It targets researchers and developers in audio synthesis, offering state-of-the-art performance with efficient generation.
How It Works
Tango uses a latent diffusion model (LDM) architecture: a UNet generates audio in latent space, conditioned on text embeddings from a frozen Flan-T5 LLM. This approach allows high-quality audio synthesis with a significantly smaller training dataset than other state-of-the-art models use. Tango 2 refines this further by applying Direct Preference Optimization (DPO) on a custom dataset, Audio-Alpaca, to align model outputs with human preferences.
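To make the preference-alignment step concrete, here is a minimal sketch of the generic DPO loss for a single preference pair. It assumes per-sample log-likelihoods are available from both the trained model and a frozen reference model. A diffusion model such as Tango 2 would likely substitute a denoising-based quantity for these log-likelihoods, so this is an illustration of the idea and not the repository's training code. The function name `dpo_loss` and the default `beta` are hypothetical.

```python
import math


def dpo_loss(policy_logp_win, policy_logp_lose,
             ref_logp_win, ref_logp_lose, beta=0.1):
    """Generic DPO loss for one (preferred, rejected) pair.

    All names and the default beta are illustrative assumptions,
    not values taken from the Tango codebase.
    """
    # How much more the policy favors each sample than the reference does.
    win_margin = policy_logp_win - ref_logp_win
    lose_margin = policy_logp_lose - ref_logp_lose
    logits = beta * (win_margin - lose_margin)
    # -log(sigmoid(x)), written as a numerically stable softplus(-x).
    if logits > 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))


# A policy identical to the reference has no preference signal yet.
baseline = dpo_loss(-2.0, -2.0, -2.0, -2.0)
# A policy that raises the preferred sample and lowers the rejected one
# should incur a smaller loss.
improved = dpo_loss(-1.0, -3.0, -2.0, -2.0)
print(baseline, improved)
```

The loss falls as the policy assigns relatively more probability to the preferred audio than the reference does, which is the mechanism by which preference pairs like those in Audio-Alpaca can steer the model.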
Quick Start & Requirements
pip install -r requirements.txt after cloning the repository.
libsndfile1 is needed for soundfile on Linux.
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats