stable-audio-3  by Stability-AI

Audio generation platform for music and sound effects

Created 2 months ago
339 stars

Top 81.2% on SourcePulse

Project Summary

Stable Audio 3 is an open platform for fast, high-quality audio and music generation, with streamlined inference and fine-tuning. It targets researchers and power users who need efficient tools for creating and editing audio, and gives them state-of-the-art models with flexible hardware support.

How It Works

The project is built on a new Semantic-Acoustic Music Encoder (SAME) autoencoder that supports stereo 44.1 kHz audio. It provides three core inference modes: text-to-audio, audio-to-audio editing, and inpainting/continuation. The design enables variable-length generation, efficient VRAM use, and personalization through stackable LoRA fine-tuning, and it aims to balance generative tractability with reconstruction quality.
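As a rough illustration of what variable-length generation and inpainting mean for stereo 44.1 kHz audio, the sketch below computes the raw waveform shape for a requested duration and the frame range of a region to regenerate. Only the sample rate and channel count come from the description above; the function names are hypothetical and are not part of the repository.

```python
SAMPLE_RATE = 44_100  # Hz, from the project description
CHANNELS = 2          # stereo


def num_frames(seconds: float) -> int:
    """Number of sample frames (one frame = one sample per channel)."""
    if seconds < 0:
        raise ValueError("duration must be non-negative")
    return round(seconds * SAMPLE_RATE)


def waveform_shape(seconds: float) -> tuple[int, int]:
    """Shape of the raw waveform as (channels, frames)."""
    return (CHANNELS, num_frames(seconds))


def inpaint_range(start_s: float, end_s: float, total_s: float) -> tuple[int, int]:
    """Half-open [start, end) frame range to regenerate, clamped to the clip."""
    if end_s < start_s:
        raise ValueError("end must not precede start")
    total = num_frames(total_s)
    start = min(num_frames(max(start_s, 0.0)), total)
    end = min(num_frames(max(end_s, 0.0)), total)
    return start, end


if __name__ == "__main__":
    print(waveform_shape(10.0))           # shape of a 10-second stereo clip
    print(inpaint_range(2.0, 4.0, 3.0))   # region is clamped to the 3-second clip
```

A shorter request yields proportionally fewer frames, which is why variable-length generation can save both inference time and memory compared with always generating a fixed-length clip.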

Quick Start & Requirements

  • Primary install/run command: Install dependencies with uv sync. Run Gradio UI with uv run python run_gradio.py --model medium.
  • Non-default prerequisites: Python, uv package manager. CUDA 12.6+ is default for PyTorch; specific versions can be pinned. Flash Attention 2 is required for the medium model.
  • Links: Technical Report, Models, Extra Models, Discord, Demo, Blog Post.
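The prerequisites above can be checked before launching the UI. The following is a minimal sketch, assuming that Flash Attention 2 is importable under the module name `flash_attn` and that `uv` is expected on `PATH`; the helper names `preflight` and `gradio_command` are illustrative and do not exist in the repository. The launch command itself is the one quoted above.

```python
import importlib.util
import shutil


def preflight(model: str = "medium") -> list[str]:
    """Return human-readable problems; an empty list means nothing obvious is missing."""
    problems = []
    if shutil.which("uv") is None:
        problems.append("uv not found on PATH")
    # Assumption: Flash Attention 2 is importable as `flash_attn`.
    if model == "medium" and importlib.util.find_spec("flash_attn") is None:
        problems.append("flash_attn not importable (required for the medium model)")
    return problems


def gradio_command(model: str = "medium") -> list[str]:
    """Build the Gradio launch command quoted in the Quick Start."""
    return ["uv", "run", "python", "run_gradio.py", "--model", model]


if __name__ == "__main__":
    for problem in preflight():
        print("warning:", problem)
    print(" ".join(gradio_command()))
```

Note that the import check reflects the interpreter that runs the script, so it is only meaningful when executed inside the project environment (for example through `uv run python`).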

Highlighted Details

  • Fast, state-of-the-art generation capable of producing minutes of audio in milliseconds.
  • Supports three distinct inference modes: text-to-audio, audio-to-audio editing, and inpainting/continuation.
  • Enables variable-length generation, optimizing inference time and VRAM usage.
  • Offers personalization via stackable LoRA fine-tuning, adaptable at runtime.
  • Broad hardware support includes CPU (Small models), CUDA/TensorRT (Medium), and Apple Silicon (CoreML).
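The "stackable LoRA" idea can be sketched with the generic LoRA formulation, in which each adapter contributes a low-rank update B·A scaled by a strength, and several adapters are summed on top of a frozen base weight. This is the textbook form and not necessarily how this repository implements it; the function names and the per-adapter `scale` value are illustrative assumptions. Plain lists are used so the example runs without dependencies.

```python
Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two matrices stored as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def apply_loras(base: Matrix, adapters: list[tuple[Matrix, Matrix, float]]) -> Matrix:
    """Return base + sum(scale * B @ A) over all adapters, without mutating base."""
    out = [row[:] for row in base]
    for b, a, scale in adapters:
        delta = matmul(b, a)  # low-rank update with the same shape as base
        for i, row in enumerate(delta):
            for j, value in enumerate(row):
                out[i][j] += scale * value
    return out


if __name__ == "__main__":
    base = [[1.0, 0.0], [0.0, 1.0]]
    # Two hypothetical rank-1 adapters, each given as (B, A, scale).
    style = ([[1.0], [0.0]], [[0.0, 2.0]], 0.5)
    timbre = ([[0.0], [1.0]], [[4.0, 0.0]], 0.25)
    print(apply_loras(base, [style, timbre]))
```

Because the base weight is left untouched, changing a scale or dropping an adapter from the list only requires recomputing the sum, which is one way runtime adjustment can work.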

Maintenance & Community

The project is associated with the Harmonai Discord server, which hosts discussions and weekly office hours on AI audio and music. The underfit tool by Dadabots is mentioned as an experimental option for advanced LoRA training.

Licensing & Compatibility

The project is released under the Stability AI Community License. Specific compatibility notes for commercial use or closed-source linking are not detailed in the README.

Limitations & Caveats

The 'Large' model is available only through the API and is not supported by this repository. Stable Audio 3 Medium requires Flash Attention 2; a faulty Flash Attention installation can cause the model to output static or glitch sounds, so verify that installation first when troubleshooting the medium model.

Health Check

  • Last Commit: 6 days ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 22
  • Issues (30d): 9
  • Star History: 339 stars in the last 30 days

Explore Similar Projects

Starred by Patrick von Platen (Author of Hugging Face Diffusers; Research Engineer at Mistral) and Omar Sanseviero (DevRel at Google DeepMind).

AudioLDM by haoheliu

0.0%
3k
Audio generation research paper using latent diffusion
Created 3 years ago
Updated 11 months ago
Starred by Christian Laforte (Distinguished Engineer at NVIDIA; Former CTO at Stability AI), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 1 more.

Amphion by open-mmlab

0.1%
10k
Toolkit for audio, music, and speech generation research
Created 2 years ago
Updated 2 months ago