stable-audio-tools  by Stability-AI

Audio generation models training/inference code

Created 2 years ago
3,439 stars

Top 14.0% on SourcePulse

Project Summary

This repository provides the training and inference code for Stability AI's generative audio models, targeting researchers and developers interested in creating and deploying custom audio generation systems. It enables conditional audio generation, allowing users to control the output based on various inputs.

How It Works

The library utilizes PyTorch Lightning for efficient multi-GPU and multi-node training, supporting various model types including autoencoders and different diffusion model architectures. Checkpoints are managed via a "training wrapper" that includes optimizer states and other training-specific components, which can be "unwrapped" for inference or fine-tuning. This separation allows for cleaner model management and flexibility in deployment.
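
The wrap/unwrap idea can be illustrated with a minimal sketch. It uses plain dictionaries in place of real tensors, and it is not the repository's unwrap_model.py. The checkpoint layout, the "model." key prefix, and the helper name unwrap_checkpoint are all assumptions made for illustration.

```python
from typing import Any, Dict


def unwrap_checkpoint(wrapped: Dict[str, Any], prefix: str = "model.") -> Dict[str, Any]:
    """Keep only the model weights from a wrapped training checkpoint.

    Assumption: the wrapper stores model weights under keys starting with
    `prefix`, next to training-only entries such as optimizer states.
    """
    unwrapped = {
        key[len(prefix):]: value
        for key, value in wrapped["state_dict"].items()
        if key.startswith(prefix)
    }
    if not unwrapped:
        raise ValueError(f"No keys with prefix {prefix!r} found in checkpoint")
    # Optimizer states, epoch counters, etc. are deliberately left behind.
    return {"state_dict": unwrapped}


# Hypothetical wrapped checkpoint: lists stand in for weight tensors.
wrapped_ckpt = {
    "state_dict": {
        "model.encoder.weight": [0.1, 0.2],
        "model.decoder.weight": [0.3],
        "ema.decay": 0.999,  # training-only entry, dropped on unwrap
    },
    "optimizer_states": [{"lr": 1e-4}],
    "epoch": 3,
}

unwrapped_ckpt = unwrap_checkpoint(wrapped_ckpt)
print(sorted(unwrapped_ckpt["state_dict"]))
```

The unwrapped result carries only what inference or fine-tuning needs, which is why it is smaller and easier to load than the full training checkpoint.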

Quick Start & Requirements

  • Install via pip: pip install stable-audio-tools
  • For training/inference code: pip install . after cloning the repository.
  • Requires PyTorch 2.0+ for Flash Attention.
  • Development is done in Python 3.8.10.
  • A Gradio interface is available for testing pre-trained models: python3 ./run_gradio.py --pretrained-name stabilityai/stable-audio-open-1.0
  • Training requires a Weights & Biases account (wandb login).
  • Configuration documentation: the README refers to a "Configurations" section, but no direct link is provided here.
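
Before installing, it can help to confirm that the local PyTorch version meets the requirement listed above. The following is a small sketch that reads the installed version from package metadata without importing PyTorch; the helper names are hypothetical and not part of stable-audio-tools.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Tuple


def parse_major_minor(version_string: str) -> Tuple[int, int]:
    """Extract (major, minor) from strings such as '2.1.0+cu118'."""
    match = re.match(r"(\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version_string!r}")
    return int(match.group(1)), int(match.group(2))


def meets_minimum(version_string: str, minimum: Tuple[int, int] = (2, 0)) -> bool:
    """Return True if the version is at least `minimum` (default: 2.0)."""
    return parse_major_minor(version_string) >= minimum


try:
    installed = version("torch")  # reads package metadata only
    status = "OK" if meets_minimum(installed) else "older than 2.0"
    print(f"PyTorch {installed}: {status}")
except PackageNotFoundError:
    print("PyTorch is not installed")
```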

Highlighted Details

  • Supports multiple model types: autoencoder, diffusion (unconditional, conditional, inpainting), and language models.
  • Flexible checkpoint management with "training wrapper" and "unwrap_model.py" script.
  • Gradio interface for easy testing of pre-trained models.
  • Training configuration via JSON files for models and datasets.
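
As a rough illustration of JSON-based configuration, the sketch below builds a model config and a dataset config as Python dictionaries, serializes them, and checks for required keys. Every field name and value here is an assumption chosen for illustration; the repository's own configuration documentation defines the real schema.

```python
import json
from typing import Any, Dict, List, Sequence

# Assumed required fields; not taken from the project summary.
REQUIRED_MODEL_KEYS = ("model_type", "sample_rate", "sample_size", "audio_channels")


def missing_keys(config: Dict[str, Any], required: Sequence[str]) -> List[str]:
    """List the required keys that are absent from a config, in order."""
    return [key for key in required if key not in config]


# Hypothetical model config for a conditional diffusion model.
model_config = {
    "model_type": "diffusion_cond",
    "sample_rate": 44100,
    "sample_size": 65536,
    "audio_channels": 2,
}

# Hypothetical dataset config pointing at a local audio directory.
dataset_config = {
    "dataset_type": "audio_dir",
    "datasets": [{"id": "my_audio", "path": "/path/to/audio"}],
}

# Round-trip through JSON text, as the configs would be stored on disk.
model_json = json.dumps(model_config, indent=2)
dataset_json = json.dumps(dataset_config, indent=2)

print(missing_keys(json.loads(model_json), REQUIRED_MODEL_KEYS))  # prints []
```

Keeping model and dataset settings in separate files lets the same dataset definition be reused across several model experiments.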

Maintenance & Community

  • Developed by Stability AI.
  • No explicit community links (Discord/Slack) or roadmap mentioned in the README.

Licensing & Compatibility

  • License type is not specified in the README.
  • Compatibility for commercial use or closed-source linking is not detailed.

Limitations & Caveats

The README indicates that the project is still under development: its "Todo" list includes troubleshooting and contribution guidelines, so documentation may be incomplete and details may change. Supported model types are limited to those listed under Highlighted Details.

Health Check

  • Last commit: 2 months ago
  • Responsiveness: 1 week
  • Pull requests (30d): 0
  • Issues (30d): 1

Star History

  • 44 stars in the last 30 days

Explore Similar Projects

Starred by Christian Laforte (Distinguished Engineer at NVIDIA; Former CTO at Stability AI), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 1 more.

Amphion by open-mmlab

0.2%
9k
Toolkit for audio, music, and speech generation research
Created 1 year ago
Updated 3 months ago