Training and inference code for audio generation models
Top 14.7% on sourcepulse
This repository provides the training and inference code for Stability AI's generative audio models, targeting researchers and developers who want to build and deploy custom audio generation systems. It supports conditional audio generation, letting users steer the output with inputs such as text prompts and timing information.
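For instance, Stable Audio Open conditions generation on a text prompt plus a start offset and total duration in seconds. A conditioning entry looks roughly like the sketch below; the field names follow the upstream README's example and are illustrative, since other model configs may define different conditioners. The full generation call appears under Quick Start below.

```python
# One conditioning dict per sample in the batch (illustrative; field names
# follow the Stable Audio Open example and vary with the model config).
conditioning = [{
    "prompt": "128 BPM tech house drum loop",  # text prompt
    "seconds_start": 0,                        # start offset in seconds
    "seconds_total": 30,                       # total length to generate, in seconds
}]
```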
How It Works
The library uses PyTorch Lightning for efficient multi-GPU and multi-node training and supports several model types, including autoencoders and a range of diffusion architectures. Checkpoints are saved through a "training wrapper" that bundles optimizer states and other training-specific components; a wrapped checkpoint can be "unwrapped" into plain model weights for inference or fine-tuning. This separation keeps model management clean and gives flexibility in deployment.
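In practice, a run is launched with train.py, which writes wrapped checkpoints, and a finished checkpoint is converted with unwrap_model.py before being used for inference or further fine-tuning. The flags below follow the upstream README at the time of writing and may change; the config paths are placeholders:
python3 ./train.py --dataset-config /path/to/dataset/config --model-config /path/to/model/config --name my_training_run
python3 ./unwrap_model.py --model-config /path/to/model/config --ckpt-path /path/to/wrapped/ckpt --name model_unwrap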
Quick Start & Requirements
Install from PyPI:
pip install stable-audio-tools
Or install from source after cloning the repository:
pip install .
Launch the Gradio interface for the pretrained Stable Audio Open model:
python3 ./run_gradio.py --pretrained-name stabilityai/stable-audio-open-1.0
Training logs outputs and demos to Weights & Biases, so sign in before starting a run:
wandb login
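Audio can also be generated programmatically. The sketch below follows the Python example in the upstream README (get_pretrained_model and generate_diffusion_cond); the sampler settings are taken from that example and argument names or defaults may change between releases.

```python
import torch
import torchaudio
from einops import rearrange

from stable_audio_tools import get_pretrained_model
from stable_audio_tools.inference.generation import generate_diffusion_cond

device = "cuda" if torch.cuda.is_available() else "cpu"

# Download the pretrained model and read its native sample rate and length
model, model_config = get_pretrained_model("stabilityai/stable-audio-open-1.0")
sample_rate = model_config["sample_rate"]
sample_size = model_config["sample_size"]
model = model.to(device)

# Text and timing conditioning, one dict per sample in the batch
conditioning = [{
    "prompt": "128 BPM tech house drum loop",
    "seconds_start": 0,
    "seconds_total": 30,
}]

# Run the conditional diffusion sampler
output = generate_diffusion_cond(
    model,
    steps=100,
    cfg_scale=7,
    conditioning=conditioning,
    sample_size=sample_size,
    sigma_min=0.3,
    sigma_max=500,
    sampler_type="dpmpp-3m-sde",
    device=device,
)

# Collapse the batch dimension, peak-normalize, and write a 16-bit WAV file
output = rearrange(output, "b d n -> d (b n)")
output = (
    output.to(torch.float32)
    .div(torch.max(torch.abs(output)))
    .clamp(-1, 1)
    .mul(32767)
    .to(torch.int16)
    .cpu()
)
torchaudio.save("output.wav", output, sample_rate)
```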
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The README notes that the project is still under development; its "Todo" list includes items such as troubleshooting and contribution guidelines, so documentation and features may be incomplete or subject to change. Supported model types are limited to those documented in the repository.