stable-audio-tools by Stability-AI

Audio generation models training/inference code

Created 2 years ago

3,562 stars

Top 13.6% on SourcePulse

Project Summary

This repository provides the training and inference code for Stability AI's generative audio models, targeting researchers and developers interested in creating and deploying custom audio generation systems. It enables conditional audio generation, allowing users to control the output based on various inputs.

How It Works

The library utilizes PyTorch Lightning for efficient multi-GPU and multi-node training, supporting various model types including autoencoders and different diffusion model architectures. Checkpoints are managed via a "training wrapper" that includes optimizer states and other training-specific components, which can be "unwrapped" for inference or fine-tuning. This separation allows for cleaner model management and flexibility in deployment.
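The wrapper/unwrap split described above can be sketched in a few lines. This is a minimal illustration, not the repository's unwrap_model.py: the checkpoint layout (a "state_dict" entry whose keys carry a "model." prefix, stored next to "optimizer_states") and the function name unwrap_checkpoint are assumptions, and plain Python lists stand in for tensors.

```python
from typing import Any, Dict


def unwrap_checkpoint(wrapped: Dict[str, Any], prefix: str = "model.") -> Dict[str, Any]:
    """Return only the model weights from a training checkpoint.

    Assumed (hypothetical) layout: weights live under wrapped["state_dict"]
    with keys starting with `prefix`; everything else is training-only state.
    """
    state_dict = wrapped["state_dict"]
    return {
        key[len(prefix):]: value
        for key, value in state_dict.items()
        if key.startswith(prefix)
    }


# Toy training checkpoint: model weights plus training-only extras.
wrapped_ckpt = {
    "state_dict": {
        "model.encoder.weight": [0.1, 0.2],
        "model.decoder.weight": [0.3, 0.4],
        "ema.decay": [0.999],  # wrapper-only entry, dropped on unwrap
    },
    "optimizer_states": [{"lr": 1e-4}],
    "global_step": 1000,
}

unwrapped = unwrap_checkpoint(wrapped_ckpt)
print(sorted(unwrapped))  # ['decoder.weight', 'encoder.weight']
```

The unwrapped dictionary holds only what inference or fine-tuning needs, which is why it is smaller and easier to share than the full training checkpoint.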

Quick Start & Requirements

Install via pip: pip install stable-audio-tools
For training/inference code: pip install . after cloning the repository.
Requires PyTorch 2.0+ for Flash Attention.
Development is done in Python 3.8.10.
A Gradio interface is available for testing pre-trained models: python3 ./run_gradio.py --pretrained-name stabilityai/stable-audio-open-1.0
Training requires a Weights & Biases account (wandb login).
Configuration documentation: no direct link is given; the README refers to its own "Configurations" section.

Highlighted Details

Supports multiple model types: autoencoder, diffusion (unconditional, conditional, inpainting), and language models.
Flexible checkpoint management with "training wrapper" and "unwrap_model.py" script.
Gradio interface for easy testing of pre-trained models.
Training configuration via JSON files for models and datasets.
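As a rough illustration of JSON-driven configuration, the sketch below parses a model config and a dataset config and checks that a few top-level keys are present. The key names ("model_type", "sample_rate", "model", "training", "dataset_type", "datasets") and the helper load_config are assumptions made for this example, not the repository's documented schema.

```python
import json
from typing import Any, Dict, Set

# Hypothetical required keys; the real schema may differ.
REQUIRED_MODEL_KEYS = {"model_type", "sample_rate", "model", "training"}
REQUIRED_DATASET_KEYS = {"dataset_type", "datasets"}

MODEL_CONFIG_TEXT = """
{
  "model_type": "autoencoder",
  "sample_rate": 44100,
  "model": {},
  "training": {"learning_rate": 1e-4}
}
"""

DATASET_CONFIG_TEXT = """
{
  "dataset_type": "audio_dir",
  "datasets": [{"id": "my_audio", "path": "/path/to/audio/"}]
}
"""


def load_config(text: str, required: Set[str]) -> Dict[str, Any]:
    """Parse a JSON config and fail early if a required key is missing."""
    config = json.loads(text)
    missing = required - config.keys()
    if missing:
        raise ValueError(f"missing config keys: {sorted(missing)}")
    return config


model_config = load_config(MODEL_CONFIG_TEXT, REQUIRED_MODEL_KEYS)
dataset_config = load_config(DATASET_CONFIG_TEXT, REQUIRED_DATASET_KEYS)
print(model_config["model_type"], len(dataset_config["datasets"]))
```

Keeping the model and dataset descriptions in separate files lets the same model definition be trained on different data without editing it.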

Maintenance & Community

Developed by Stability AI.
No explicit community links (Discord/Slack) or roadmap mentioned in the README.

Licensing & Compatibility

License type is not specified in the README.
Compatibility for commercial use or closed-source linking is not detailed.

Limitations & Caveats

The README's "Todo" list still includes a troubleshooting section and contribution guidelines, so the documentation is incomplete and the code may change. Supported model types are limited to those listed under Highlighted Details.

Health Check

Last Commit: 1 week ago

Responsiveness: 1 week

Star History

35 stars in the last 30 days