Text-to-audio generation with diffusion models
Make-An-Audio provides a PyTorch implementation of a text-to-audio generative model based on conditional diffusion probabilistic models. It allows users to generate high-fidelity audio from text prompts, targeting researchers and developers in the audio generation space. The project offers pre-trained models and a clear implementation, enabling efficient and high-quality audio synthesis.
How It Works
The model uses a prompt-enhanced diffusion approach built on a conditional diffusion probabilistic model. Conditioning the denoising process on the text prompt is what lets it generate high-fidelity audio efficiently. The architecture likely involves a diffusion model that learns to denoise data under the guidance of text embeddings, potentially with a VAE providing the latent space and a vocoder (such as BigVGAN) for waveform synthesis.
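To make the idea of text-conditioned denoising concrete, here is a minimal sketch of deterministic DDIM sampling with classifier-free guidance. It is not the project's code. The denoiser `toy_eps_model` is a hypothetical stand-in that simply treats the conditioning vector as the clean signal, the linear beta schedule is a common default rather than a documented setting, and the assumption that the `--ddim_steps` and `--scale` flags correspond to the step count and guidance scale used here is not confirmed by the README.

```python
import math
import random


def make_alpha_bars(num_train_steps=1000, beta_start=1e-4, beta_end=2e-2):
    """Return cumulative products of (1 - beta_t) for a linear beta schedule."""
    alpha_bars, running = [], 1.0
    for t in range(num_train_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_train_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def toy_eps_model(x, alpha_bar, cond):
    """Hypothetical stand-in for a trained denoiser (assumption, not the real model).

    It assumes the clean signal equals `cond` (all zeros when cond is None)
    and returns the noise that would explain x under that assumption.
    """
    target = cond if cond is not None else [0.0] * len(x)
    a, s = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [(xi - a * ti) / s for xi, ti in zip(x, target)]


def ddim_sample(cond, ddim_steps=100, scale=3.0, num_train_steps=1000, seed=0):
    """Deterministic DDIM sampling (eta = 0) with classifier-free guidance."""
    alpha_bars = make_alpha_bars(num_train_steps)
    stride = num_train_steps // ddim_steps
    timesteps = list(range(0, num_train_steps, stride))[::-1]

    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in cond]  # start from pure noise

    for i, t in enumerate(timesteps):
        ab_t = alpha_bars[t]
        # alpha_bar of the next, less noisy step; 1.0 means fully clean.
        ab_prev = alpha_bars[timesteps[i + 1]] if i + 1 < len(timesteps) else 1.0

        eps_cond = toy_eps_model(x, ab_t, cond)
        eps_uncond = toy_eps_model(x, ab_t, None)
        # Guidance: extrapolate from the unconditional toward the conditional prediction.
        eps = [u + scale * (c - u) for c, u in zip(eps_cond, eps_uncond)]

        # Estimate the clean signal, then re-noise it to the previous noise level.
        s_t, a_t = math.sqrt(1.0 - ab_t), math.sqrt(ab_t)
        x0 = [(xi - s_t * e) / a_t for xi, e in zip(x, eps)]
        a_p, s_p = math.sqrt(ab_prev), math.sqrt(1.0 - ab_prev)
        x = [a_p * x0i + s_p * e for x0i, e in zip(x0, eps)]
    return x
```

In a real system the list `x` would be a latent tensor, the denoiser would be a neural network fed text embeddings, and the final latent would be decoded and passed to a vocoder. The loop structure stays the same.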
Quick Start & Requirements
Pre-trained weights (maa1_full.ckpt, BigVGAN vocoder, CLAP weights) need to be downloaded and placed in ./useful_ckpts.
Example inference command:
python gen_wav.py --prompt "a bird chirps" --ddim_steps 100 --duration 10 --scale 3 --n_samples 1 --save_name "results"
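For scripted or batch use, the inference command can be assembled programmatically. The sketch below only builds the argument list and checks for the main checkpoint; it does not run anything. The helper names `build_gen_command` and `missing_checkpoints` are hypothetical. The flags are copied from the example command. The sketch also assumes that maa1_full.ckpt sits directly inside ./useful_ckpts; the vocoder and CLAP weights are not checked because their file names are not given here.

```python
import shlex
from pathlib import Path

# Directory named in the quick-start notes.
CKPT_DIR = Path("./useful_ckpts")


def build_gen_command(prompt, ddim_steps=100, duration=10, scale=3,
                      n_samples=1, save_name="results"):
    """Build the gen_wav.py invocation as an argument list (no shell quoting needed)."""
    return [
        "python", "gen_wav.py",
        "--prompt", prompt,
        "--ddim_steps", str(ddim_steps),
        "--duration", str(duration),
        "--scale", str(scale),
        "--n_samples", str(n_samples),
        "--save_name", save_name,
    ]


def missing_checkpoints(ckpt_dir=CKPT_DIR, required=("maa1_full.ckpt",)):
    """Return the required checkpoint names not found in ckpt_dir.

    Assumption: the listed files live directly inside ckpt_dir.
    """
    return [name for name in required if not (Path(ckpt_dir) / name).exists()]


cmd = build_gen_command("a bird chirps")
# Print a copy-pasteable shell form of the command.
print(shlex.join(cmd))
```

To actually launch generation, pass the list to `subprocess.run(cmd, check=True)` from the repository root once `missing_checkpoints()` returns an empty list.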
Highlighted Details
Maintenance & Community
The project is associated with ICML'23 and has an arXiv preprint. It references code from CLAP and Stable Diffusion repositories. No specific community links (Discord, Slack) or active maintenance signals are provided in the README.
Licensing & Compatibility
The README does not explicitly state a license. However, the disclaimer warns against using the technology to generate speech without consent, implying potential legal and ethical considerations. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
Dataset download links are not provided due to copyright issues, requiring users to source their own audio data. The disclaimer strongly advises against unauthorized speech generation, highlighting ethical and legal risks. The project's reliance on specific checkpoint files and a potentially complex training pipeline may present adoption challenges.