AudioLDM2 by haoheliu

CLI tool for text-conditional audio/music generation

Created 2 years ago
2,498 stars

Top 18.7% on SourcePulse

1 Expert Loves This Project
Project Summary

AudioLDM 2 is a diffusion model that generates audio, including music, sound effects, and speech, from text prompts. It also supports audio super-resolution and inpainting. The project suits researchers and developers working on generative audio models, and anyone who needs to synthesize audio programmatically.

How It Works

AudioLDM 2 leverages latent diffusion models to generate audio. It employs a self-supervised pre-training approach, enabling it to learn holistic audio representations. The model architecture is optimized for high-fidelity audio generation, with specific checkpoints available for different tasks like music, sound effects, and text-to-speech.
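To make the idea of iterative denoising in a latent space more concrete, here is a toy sketch of a deterministic DDIM-style reverse pass. It is not AudioLDM 2's model or code: the linear noise schedule, the three-element "latent", and the `oracle_for` noise predictor (which cheats by knowing the clean target) are all assumptions chosen so the example runs with the standard library only. In a real latent diffusion system a trained, text-conditioned network would take the oracle's place, and a decoder would turn the final latent into audio.

```python
import math
import random


def alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal-retention factors for a linear beta schedule."""
    bars, running = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / max(1, num_steps - 1)
        running *= 1.0 - beta
        bars.append(running)
    return bars


def ddim_sample(predict_noise, latent, bars):
    """Deterministic reverse pass from the noisiest step down to a clean latent."""
    for t in range(len(bars) - 1, -1, -1):
        a_t = bars[t]
        a_prev = bars[t - 1] if t > 0 else 1.0
        eps = predict_noise(latent, t)
        # Estimate the clean latent, then re-noise it to the previous level.
        clean = [(x - math.sqrt(1 - a_t) * e) / math.sqrt(a_t)
                 for x, e in zip(latent, eps)]
        latent = [math.sqrt(a_prev) * c + math.sqrt(1 - a_prev) * e
                  for c, e in zip(clean, eps)]
    return latent


def oracle_for(target, bars):
    """Hypothetical stand-in for a trained network: it knows the clean target."""
    def predict_noise(latent, t):
        a_t = bars[t]
        return [(x - math.sqrt(a_t) * c) / math.sqrt(1 - a_t)
                for x, c in zip(latent, target)]
    return predict_noise


bars = alpha_bars(50)
target = [0.3, -1.2, 0.8]  # pretend "clean" latent
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in target]
result = ddim_sample(oracle_for(target, bars), noise, bars)
print([round(v, 6) for v in result])
```

Because the oracle predicts the noise exactly, the loop recovers the target latent up to floating-point error; a learned predictor would only approximate it, which is why step count and seed affect real outputs.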

Quick Start & Requirements

Highlighted Details

  • Supports text-to-audio, text-to-music, and text-to-speech generation.
  • Offers high-fidelity audio generation with a 48kHz model checkpoint.
  • Integrates with the Hugging Face Diffusers library, offering up to 3x faster inference and arbitrary audio length generation.
  • Includes a command-line interface for direct usage and a Gradio web app.
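The Diffusers integration listed above can be sketched roughly as follows. This is a minimal sketch, not taken from the project README: it assumes the `diffusers` and `torch` packages are installed, that the `cvssp/audioldm2` checkpoint on the Hugging Face Hub is the one you want, and that its output is 16 kHz mono audio. The helper names `generate_audio`, `to_pcm16`, and `save_wav`, along with the default step count and clip length, are illustrative choices.

```python
import struct
import wave


def generate_audio(prompt, seconds=10.0, steps=200, seed=0,
                   checkpoint="cvssp/audioldm2"):
    """Return one generated waveform (floats in roughly [-1, 1]) for a prompt."""
    # Heavy dependencies are imported lazily so the WAV helpers below
    # remain usable without them.
    import torch
    from diffusers import AudioLDM2Pipeline

    device = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = torch.float16 if device == "cuda" else torch.float32
    pipe = AudioLDM2Pipeline.from_pretrained(checkpoint, torch_dtype=dtype).to(device)
    # A seeded generator makes runs repeatable on the same hardware.
    generator = torch.Generator(device).manual_seed(seed)
    result = pipe(prompt, num_inference_steps=steps,
                  audio_length_in_s=seconds, generator=generator)
    return result.audios[0]


def to_pcm16(samples):
    """Clamp float samples to [-1, 1] and pack them as little-endian 16-bit PCM."""
    pcm = [int(max(-1.0, min(1.0, float(s))) * 32767) for s in samples]
    return struct.pack("<%dh" % len(pcm), *pcm)


def save_wav(path, samples, sample_rate=16000):
    """Write a mono waveform to a WAV file (sample rate is an assumption)."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(to_pcm16(samples))
```

A typical call would be `save_wav("out.wav", generate_audio("rain on a tin roof"))`. The first call downloads the checkpoint, so expect a delay and several gigabytes of disk use.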

Maintenance & Community

The project is maintained by haoheliu and contributors, although the health check below reports the last commit as 11 months ago. The README does not list any community channels.

Licensing & Compatibility

The README does not explicitly state a license. Verify the licensing terms before commercial use or integration into closed-source projects.

Limitations & Caveats

The README mentions that model performance can vary across hardware, suggesting users may need to adjust random seeds. Some features, like style transfer and inpainting for the 48kHz model, are noted as pending or welcomed as community contributions.
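One way to act on the seed advice is to generate a clip per seed and compare the results. The sketch below assumes a hypothetical `generate` callable that you supply, taking a prompt and a `seed` keyword and returning a waveform as a sequence of floats. The RMS check and its `min_rms` threshold are illustrative heuristics for discarding near-silent outputs, not something the README prescribes.

```python
import math


def rms(samples):
    """Root-mean-square level of a waveform; 0.0 for an empty one."""
    values = [float(s) for s in samples]
    if not values:
        return 0.0
    return math.sqrt(sum(v * v for v in values) / len(values))


def sweep_seeds(generate, prompt, seeds, min_rms=1e-3):
    """Generate one clip per seed and keep those that are not near-silent.

    `generate` is a hypothetical callable: generate(prompt, seed=int) -> samples.
    Returns a dict mapping each kept seed to its waveform.
    """
    kept = {}
    for seed in seeds:
        audio = generate(prompt, seed=seed)
        if rms(audio) >= min_rms:
            kept[seed] = audio
    return kept
```

Listening to the kept clips is still the final judge; the level check only filters out obviously degenerate results.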

Health Check

  • Last Commit: 11 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 19 stars in the last 30 days

Explore Similar Projects

Starred by Patrick von Platen (Author of Hugging Face Diffusers; Research Engineer at Mistral) and Omar Sanseviero (DevRel at Google DeepMind).

AudioLDM by haoheliu

0.1%
3k
Audio generation research paper using latent diffusion
Created 2 years ago
Updated 2 months ago