Amphion by open-mmlab

Toolkit for audio, music, and speech generation research

created 1 year ago
9,267 stars

Top 5.5% on sourcepulse

Project Summary

Amphion is an open-source toolkit for reproducible research and development in audio, music, and speech generation. It is aimed at junior researchers and engineers, and bundles generation models, vocoders, and evaluation metrics in one platform, with the stated goal of converting any input into audio.

How It Works

Amphion implements models for several audio generation tasks: text-to-speech (TTS), voice conversion (VC), accent conversion (AC), singing voice conversion (SVC), and text-to-audio (TTA). The models draw on diffusion, transformer, VAE, and flow-based architectures. Two features set it apart: visualization tools for classic models, which help expose their internal mechanisms, and support for large-scale dataset preprocessing, exemplified by the Emilia dataset and its Emilia-Pipe pipeline.

Quick Start & Requirements

  • Installation: Clone the repository with git clone and run sh env.sh to install the Python dependencies, or use Docker (docker pull realamphion/amphion, then docker run).
  • Prerequisites: Python 3.9.15 for the source install; Docker, NVIDIA Driver, NVIDIA Container Toolkit, and CUDA for the Docker route.
  • Resources: Requires significant disk space for datasets and GPU resources for training/inference.
  • Docs: https://github.com/open-mmlab/Amphion
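
Before choosing between the two install routes, it can help to see which prerequisites are already present on the machine. The following is a minimal sketch, not part of Amphion: check_prerequisites is a hypothetical helper, and it assumes that finding git, docker, and nvidia-smi on PATH is an acceptable proxy for the tools listed above. It does not verify the NVIDIA Container Toolkit or the CUDA version, and it compares only the major and minor Python version.

```python
import shutil
import sys


def check_prerequisites(python_version=(3, 9), tools=("git", "docker", "nvidia-smi")):
    """Hypothetical helper: map each prerequisite name to True (found) or False."""
    # Compare only major.minor; the summary lists 3.9.15 as the tested version.
    report = {"python": sys.version_info[:2] == tuple(python_version)}
    for tool in tools:
        # shutil.which returns the executable's path, or None if it is not on PATH.
        report[tool] = shutil.which(tool) is not None
    return report


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'found' if ok else 'missing'}")
```

If docker or nvidia-smi is reported missing, the git clone and sh env.sh route is likely the simpler starting point.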

Highlighted Details

  • Supports 10+ TTS architectures, including FastSpeech2, VITS, VALL-E, NaturalSpeech2, MaskGCT, and Vevo-TTS, and 5+ VC models, including Vevo, FACodec, and Noro.
  • Includes a wide array of vocoders (GAN-based, Flow-based, Diffusion-based, Auto-regressive) and comprehensive objective evaluation metrics.
  • Features the Emilia dataset (101k+ hours) and Emilia-Pipe for in-the-wild speech data preprocessing.
  • Offers visualization tools like SingVisio for understanding model mechanisms.

Maintenance & Community

  • Active development with recent releases (Vevo1.5, Metis, MaskGCT, Vevo reproduction).
  • Community engagement via Discord channel.
  • Contributions are welcomed via CONTRIBUTING.md.

Licensing & Compatibility

  • MIT License, permitting free research and commercial use.

Limitations & Caveats

  • Some tasks like Singing Voice Synthesis (SVS) and Text to Music (TTM) are marked as "developing."
  • The Docker setup requires a specific NVIDIA software stack (NVIDIA Driver, NVIDIA Container Toolkit, and CUDA).

Health Check

  • Last commit: 2 months ago
  • Responsiveness: 1 week
  • Pull requests (30d): 3
  • Issues (30d): 5
  • Star history: 294 stars in the last 90 days
