AudioLDM by haoheliu

Audio generation research paper using latent diffusion

Created 2 years ago

2,802 stars

Top 16.8% on SourcePulse

2 Experts Love This Project

patrickvonplaten

Patrick von Platen

Author of Hugging Face Diffusers; Research Engineer at Mistral

osanseviero

Omar Sanseviero

DevRel at Google DeepMind

Project Summary

AudioLDM is a latent diffusion model for generating audio from text prompts, enabling speech, sound effects, and music creation. It also supports audio-to-audio generation and text-guided style transfer, targeting researchers and developers in audio synthesis and AI music.

How It Works

AudioLDM uses a latent diffusion architecture, similar to Stable Diffusion for images. It encodes audio into a lower-dimensional latent space, runs the diffusion process in that space conditioned on text embeddings, and then decodes the resulting latent back into audio. Working on compressed latents rather than raw audio keeps generation efficient while preserving fidelity.
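The diffusion step described above can be illustrated with a toy sketch. This is a generic DDPM-style example, not AudioLDM's actual code: the linear noise schedule, the function names, and the 8-value "latent" are all assumptions made for illustration, and the real model predicts the noise with a text-conditioned network rather than being handed it.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Hypothetical linear noise schedule (a common DDPM default, assumed here)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """Running product of (1 - beta_t): the fraction of signal left at step t."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def add_noise(latent, noise, alpha_bar_t):
    """Forward process: z_t = sqrt(a) * z_0 + sqrt(1 - a) * eps."""
    a, s = math.sqrt(alpha_bar_t), math.sqrt(1.0 - alpha_bar_t)
    return [a * z + s * e for z, e in zip(latent, noise)]


def predict_clean_latent(noisy, predicted_noise, alpha_bar_t):
    """Invert the forward process given a noise estimate."""
    a, s = math.sqrt(alpha_bar_t), math.sqrt(1.0 - alpha_bar_t)
    return [(z - s * e) / a for z, e in zip(noisy, predicted_noise)]


if __name__ == "__main__":
    rng = random.Random(0)
    alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
    latent = [rng.gauss(0.0, 1.0) for _ in range(8)]  # stand-in for an encoded audio latent
    noise = [rng.gauss(0.0, 1.0) for _ in range(8)]
    noisy = add_noise(latent, noise, alpha_bars[500])
    # With a perfect noise estimate the clean latent is recovered; a trained
    # model only approximates this, which is why sampling takes many steps.
    recovered = predict_clean_latent(noisy, noise, alpha_bars[500])
    print(max(abs(r - z) for r, z in zip(recovered, latent)))
```

In a real latent diffusion model, the recovered latent would then be passed through a decoder to produce audio; that stage is omitted here.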

Quick Start & Requirements

Install via pip: pip3 install git+https://github.com/haoheliu/AudioLDM.git
Requires Python 3.8, a GPU with 8 GB of VRAM, and a 64-bit OS.
Official Hugging Face Diffusers integration available: pip install --upgrade diffusers transformers
Web demo available via Gradio: python3 app.py
Documentation and examples: Hugging Face Hub
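For the Diffusers route, a minimal text-to-audio sketch might look like the following. It assumes that torch, diffusers, transformers, and scipy are installed, that the checkpoint is published on the Hub as cvssp/audioldm-s-full-v2 (the namespace is an assumption, not stated above), and that the output sample rate is 16 kHz. The function names, prompt, and output path are hypothetical.

```python
SAMPLE_RATE = 16_000  # assumption: AudioLDM checkpoints produce 16 kHz audio


def approx_num_samples(audio_length_in_s, sample_rate=SAMPLE_RATE):
    """Approximate number of samples in a clip of the requested length."""
    return int(round(audio_length_in_s * sample_rate))


def generate(prompt, out_path="output.wav",
             repo_id="cvssp/audioldm-s-full-v2",  # assumed Hub repo id
             audio_length_in_s=5.0, num_inference_steps=10):
    """Generate a clip from a text prompt and write it to a WAV file."""
    # Heavy dependencies are imported lazily so the module loads without them.
    import torch
    from diffusers import AudioLDMPipeline
    from scipy.io import wavfile

    device = "cuda" if torch.cuda.is_available() else "cpu"
    # Half precision saves VRAM on GPU; fall back to float32 on CPU.
    dtype = torch.float16 if device == "cuda" else torch.float32

    pipe = AudioLDMPipeline.from_pretrained(repo_id, torch_dtype=dtype).to(device)
    audio = pipe(
        prompt,
        num_inference_steps=num_inference_steps,
        audio_length_in_s=audio_length_in_s,
    ).audios[0]

    wavfile.write(out_path, rate=SAMPLE_RATE, data=audio)
    return audio


if __name__ == "__main__":
    generate("A hammer hitting a wooden surface")
```

Raising num_inference_steps generally trades speed for quality; the value of 10 here is only a quick-test setting.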

Highlighted Details

Supports text-to-audio, audio-to-audio, and text-guided audio style transfer.
Offers multiple model checkpoints (e.g., audioldm-s-full-v2, audioldm-m-full) with varying performance characteristics.
Integrated into Hugging Face Diffusers library for easier use and experimentation.
Command-line interface and Gradio web application for accessibility.

Maintenance & Community

The most recent updates noted in the README date from April 2023.
The accompanying paper was published at ICML 2023.
Code references Stable Diffusion and CLAP.

Licensing & Compatibility

The README does not explicitly state a license. However, the project is shared "based on the UK copyright exception of data for academic research," suggesting potential restrictions for commercial use.

Limitations & Caveats

With no explicit license and an academic-research data-sharing basis, suitability for commercial applications is unclear without further clarification from the authors.
Super-resolution and inpainting are listed as TODOs for the Gradio app.

Health Check

Last Commit: 6 months ago
Responsiveness: 1 day
Pull Requests (30d): 0
Issues (30d): 1

Star History

19 stars in the last 30 days

Explore Similar Projects

AudioStory by TencentARC

Generate long-form narrative audio using LLMs

Created 5 months ago

Updated 3 months ago

SongGen by LiuZH-19

Text-to-song generation with an auto-regressive transformer

Created 11 months ago

Updated 2 months ago

WavJourney by Audio-AGI

Audio creation pipeline using LLMs for compositional generation

Created 2 years ago

Updated 2 years ago

WavCraft by JinhuaLiang

AI agent for audio creation and editing

Created 1 year ago

Updated 11 months ago

Make-An-Audio by Text-to-Audio

Text-to-audio generation with diffusion models

Created 2 years ago

Updated 1 year ago

Starred by Jesse Clark (Cofounder of Marqo).

tango by declare-lab

Diffusion model family for text-to-audio generation

Created 2 years ago

Updated 5 months ago

Starred by Luis Capelo (Cofounder of Lightning AI).

FunMusic by FunAudioLLM

Toolkit for music, song, and audio generation

Created 1 year ago

Updated 7 months ago

Starred by Omar Sanseviero (DevRel at Google DeepMind).

AudioLDM2 by haoheliu

CLI tool for text-conditional audio/music generation

Created 2 years ago

Updated 1 year ago

Starred by Benjamin Bolte (Cofounder of K-Scale Labs) and Omar Sanseviero (DevRel at Google DeepMind).

audiolm-pytorch by lucidrains

PyTorch implementation of Google's AudioLM for audio generation

Created 3 years ago

Updated 1 year ago

Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Jeff Hammerbacher (Cofounder of Cloudera), and 2 more.

AudioGPT by AIGC-Audio

Audio processing and generation research project

Created 2 years ago

Updated 1 year ago

Starred by Jason Huggins (Creator of Selenium), Travis Fischer (Founder of Agentic), and 1 more.

VibeVoice by microsoft

Frontier Text-to-Speech for long conversations

Created 4 months ago

Updated 3 weeks ago

Starred by Yaowei Zheng (Author of LLaMA-Factory), Jiayi Pan (Author of SWE-Gym; MTS at xAI), and 18 more.

audiocraft by facebookresearch

PyTorch library for audio processing and generation research

Created 2 years ago

Updated 10 months ago
