stable-audio-tools  by Stability-AI

Audio generation models training/inference code

created 2 years ago
3,375 stars

Top 14.7% on sourcepulse

Project Summary

This repository provides the training and inference code for Stability AI's generative audio models. It is aimed at researchers and developers who want to build and deploy custom audio generation systems. It supports conditional audio generation, in which conditioning inputs steer the generated output.

How It Works

The library uses PyTorch Lightning for multi-GPU and multi-node training and supports several model types, including autoencoders and multiple diffusion architectures. During training, each model is held inside a "training wrapper", so a saved checkpoint also contains optimizer states and other training-only components. The wrapper can be "unwrapped" to leave just the model weights for inference or fine-tuning, which keeps deployed checkpoints free of training state.
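
The idea of unwrapping can be illustrated with a minimal sketch in plain Python. It is not the repository's unwrap_model.py: the function name unwrap_checkpoint, the "model." key prefix, and the example weights are all hypothetical. Only the top-level keys state_dict, optimizer_states, and global_step follow the usual PyTorch Lightning checkpoint layout.

```python
# Hypothetical sketch of "unwrapping" a training checkpoint.
# Assumption: model weights live under a "model." prefix inside state_dict;
# the real library may organise its wrapper differently.

def unwrap_checkpoint(wrapped: dict, prefix: str = "model.") -> dict:
    """Return only the model weights, with the wrapper prefix stripped."""
    state = wrapped["state_dict"]
    return {
        key[len(prefix):]: value
        for key, value in state.items()
        if key.startswith(prefix)
    }


# A toy checkpoint: lists stand in for tensors so the example needs no GPU
# libraries. The "ema." entry represents a training-only component.
wrapped_ckpt = {
    "state_dict": {
        "model.encoder.weight": [0.1, 0.2],
        "model.decoder.weight": [0.3, 0.4],
        "ema.encoder.weight": [0.1, 0.2],
    },
    "optimizer_states": [{"lr": 1e-4}],
    "global_step": 1000,
}

unwrapped = unwrap_checkpoint(wrapped_ckpt)
print(sorted(unwrapped))
```

The unwrapped dictionary holds only the two model entries, so optimizer state and other training-only data never reach the inference side.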

Quick Start & Requirements

  • Install via pip: pip install stable-audio-tools
  • To run the training and inference scripts: clone the repository, then run pip install . from its root.
  • Requires PyTorch 2.0+ for Flash Attention.
  • Development is done in Python 3.8.10.
  • A Gradio interface is available for testing pre-trained models: python3 ./run_gradio.py --pretrained-name stabilityai/stable-audio-open-1.0
  • Training requires a Weights & Biases account (wandb login).
  • Configuration documentation: no separate link is given; the README refers readers to its own Configurations section.

Highlighted Details

  • Supports multiple model types: autoencoder, diffusion (unconditional, conditional, inpainting), and language models.
  • Flexible checkpoint management with "training wrapper" and "unwrap_model.py" script.
  • Gradio interface for easy testing of pre-trained models.
  • Training configuration via JSON files for models and datasets.
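
Since models are configured through JSON files, a small pre-flight check can catch a malformed config before a training run starts. The sketch below is illustrative only: the required keys, the model-type names, and the check_model_config helper are assumptions made for this example, so verify them against the README's Configurations section before relying on them.

```python
import json

# Assumed schema, for illustration only; not taken from the library.
REQUIRED_KEYS = {"model_type", "sample_rate", "sample_size", "audio_channels", "model"}
KNOWN_MODEL_TYPES = {
    "autoencoder",
    "diffusion_uncond",
    "diffusion_cond",
    "diffusion_cond_inpaint",
    "lm",
}


def check_model_config(config: dict) -> list:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    missing = sorted(REQUIRED_KEYS - config.keys())
    if missing:
        problems.append(f"missing keys: {', '.join(missing)}")
    model_type = config.get("model_type")
    if model_type is not None and model_type not in KNOWN_MODEL_TYPES:
        problems.append(f"unknown model_type: {model_type}")
    return problems


# Example config with made-up values, parsed from a JSON string.
example = json.loads("""
{
    "model_type": "diffusion_cond",
    "sample_rate": 44100,
    "sample_size": 65536,
    "audio_channels": 2,
    "model": {}
}
""")

print(check_model_config(example))
```

In practice the same function could be applied to json.load(open(path)) for each config file passed to a training script.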

Maintenance & Community

  • Developed by Stability AI.
  • No explicit community links (Discord/Slack) or roadmap mentioned in the README.

Licensing & Compatibility

  • License type is not specified in the README.
  • Compatibility for commercial use or closed-source linking is not detailed.

Limitations & Caveats

The README indicates that the project is still under development: its "Todo" list includes troubleshooting and contribution guidelines, so documentation may be incomplete and behavior may change. Only the model types listed above are supported.

Health Check

  • Last commit: 2 weeks ago
  • Responsiveness: 1 week
  • Pull requests (30d): 3
  • Issues (30d): 4
  • Star history: 342 stars in the last 90 days
