AudioLDM by haoheliu

Research code for audio generation using latent diffusion

Created 2 years ago
2,740 stars

Top 17.4% on SourcePulse

Project Summary

AudioLDM is a latent diffusion model for generating audio from text prompts, enabling speech, sound effects, and music creation. It also supports audio-to-audio generation and text-guided style transfer, targeting researchers and developers in audio synthesis and AI music.

How It Works

AudioLDM leverages a latent diffusion model architecture, similar to Stable Diffusion for images. It encodes audio into a lower-dimensional latent space, performs diffusion in this latent space conditioned on text embeddings, and then decodes the latent representation back into audio. This approach allows for efficient generation of high-fidelity audio by operating in a compressed latent space.
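The encode, diffuse, decode flow described above can be illustrated with a toy NumPy sketch. This is not AudioLDM's code: the average-pooling "encoder", the upsampling "decoder", the array shapes, and the value of `alpha_bar_t` are all assumptions chosen for illustration, and the true noise stands in for the text-conditioned network that a real model would use to estimate it.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(spec, factor=4):
    """Toy encoder (assumption): average-pool a (time, freq) array by `factor` on both axes."""
    t, f = spec.shape
    return spec.reshape(t // factor, factor, f // factor, factor).mean(axis=(1, 3))

def decode(latent, factor=4):
    """Toy decoder (assumption): nearest-neighbour upsampling back to the input grid."""
    return latent.repeat(factor, axis=0).repeat(factor, axis=1)

def add_noise(z0, alpha_bar_t, eps):
    """Forward diffusion step: z_t = sqrt(a) * z0 + sqrt(1 - a) * eps."""
    return np.sqrt(alpha_bar_t) * z0 + np.sqrt(1.0 - alpha_bar_t) * eps

def predict_clean(zt, alpha_bar_t, eps_hat):
    """Recover an estimate of z0 from z_t given a noise estimate eps_hat."""
    return (zt - np.sqrt(1.0 - alpha_bar_t) * eps_hat) / np.sqrt(alpha_bar_t)

spec = rng.standard_normal((64, 16))   # hypothetical spectrogram-like input
z0 = encode(spec)                      # compressed latent, 16x fewer values
eps = rng.standard_normal(z0.shape)    # Gaussian noise added in latent space
alpha_bar_t = 0.3                      # hypothetical cumulative noise-schedule value
zt = add_noise(z0, alpha_bar_t, eps)

# A real model predicts eps from (zt, t, text embedding); the true eps is used here.
z0_hat = predict_clean(zt, alpha_bar_t, eps)
recon = decode(z0_hat)                 # back to the original grid size
```

The point of the sketch is the data flow: all noising and denoising happens on the small latent `z0`, and only the final result is decoded back to full resolution, which is where the efficiency gain comes from.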

Quick Start & Requirements

  • Install via pip: pip3 install git+https://github.com/haoheliu/AudioLDM.git
  • Requires Python 3.8, GPU with 8GB VRAM, 64-bit OS.
  • Official Hugging Face Diffusers integration available: pip install --upgrade diffusers transformers
  • Web demo available via Gradio: python3 app.py
  • Documentation and examples: Hugging Face Hub
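As a rough sketch of the Diffusers route mentioned above, the following wraps text-to-audio generation in a function. It assumes a CUDA GPU, that `torch`, `diffusers`, and `scipy` are installed, that the checkpoint is published on the Hugging Face Hub as `cvssp/audioldm-s-full-v2` (the `cvssp/` namespace is not stated in this summary), and that the output sample rate is 16 kHz. The helper `duration_in_seconds` and the function name `generate` are hypothetical names introduced here.

```python
SAMPLE_RATE = 16000  # assumed output sample rate in Hz

def duration_in_seconds(num_samples, sample_rate=SAMPLE_RATE):
    """Length of a clip in seconds, given its sample count."""
    return num_samples / sample_rate

def generate(prompt, out_path="output.wav", seconds=5.0, steps=10,
             checkpoint="cvssp/audioldm-s-full-v2"):
    """Generate audio from a text prompt and write it to a WAV file."""
    # Heavy dependencies are imported lazily so the module loads without them.
    import torch
    from diffusers import AudioLDMPipeline
    from scipy.io import wavfile

    # Half precision reduces VRAM use; this requires a CUDA device.
    pipe = AudioLDMPipeline.from_pretrained(checkpoint, torch_dtype=torch.float16)
    pipe = pipe.to("cuda")

    # The pipeline returns a batch of waveforms as NumPy arrays; take the first.
    audio = pipe(prompt, num_inference_steps=steps,
                 audio_length_in_s=seconds).audios[0]

    wavfile.write(out_path, rate=SAMPLE_RATE, data=audio)
    print(f"Wrote {duration_in_seconds(len(audio)):.1f} s of audio to {out_path}")
    return audio
```

Calling `generate("A hammer hitting a wooden surface")` would download the checkpoint on first use and write `output.wav`. Raising `steps` generally trades speed for quality.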

Highlighted Details

  • Supports text-to-audio, audio-to-audio, and text-guided audio style transfer.
  • Offers multiple model checkpoints (e.g., audioldm-s-full-v2, audioldm-m-full) with varying performance characteristics.
  • Integrated into Hugging Face Diffusers library for easier use and experimentation.
  • Command-line interface and Gradio web application for accessibility.

Maintenance & Community

  • Latest noted updates date from April 2023.
  • Project is associated with ICML 2023.
  • Code references Stable Diffusion and CLAP.

Licensing & Compatibility

  • The README does not explicitly state a license. However, the project is shared "based on the UK copyright exception of data for academic research," suggesting potential restrictions for commercial use.

Limitations & Caveats

  • The project's data sharing basis implies it may be restricted to academic research and not suitable for commercial applications without further clarification.
  • Some advanced features like super-resolution and inpainting are listed as TODOs for the Gradio app.

Health Check

  • Last commit: 2 months ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 2
  • Star history: 18 stars in the last 30 days
