Amphion by open-mmlab

Toolkit for audio, music, and speech generation research

Created 2 years ago

9,696 stars

Top 5.3% on SourcePulse

3 Experts Love This Project

claforte (Christian Laforte): Distinguished Engineer at NVIDIA; Former CTO at Stability AI

chiphuyen: Author of "AI Engineering", "Designing Machine Learning Systems"

aangelopoulos (Anastasios Angelopoulos): Cofounder of LMArena

Project Summary

Amphion is an open-source toolkit for reproducible research and development in audio, music, and speech generation. It targets junior researchers and engineers and bundles support for multiple generation tasks, vocoders, and evaluation metrics, with the stated aim of simplifying the conversion of any input into audio.

How It Works

Amphion provides implementations of state-of-the-art models across audio generation tasks, including text-to-speech (TTS), voice conversion (VC), accent conversion (AC), singing voice conversion (SVC), and text-to-audio (TTA). It leverages architectures such as diffusion, transformer, VAE, and flow-based models. Two differentiators are its visualization tools for classic models, which help users understand internal mechanisms, and its support for large-scale dataset preprocessing, exemplified by the Emilia dataset and Emilia-Pipe.

Quick Start & Requirements

Installation: Clone the repository with git clone and run sh env.sh to install the Python dependencies, or use Docker (docker pull realamphion/amphion, then docker run).
Prerequisites: Python 3.9.15; for the Docker route, Docker, NVIDIA Driver, NVIDIA Container Toolkit, and CUDA.
Resources: Requires significant disk space for datasets and GPU resources for training/inference.
Docs: https://github.com/open-mmlab/Amphion
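
The prerequisites above can be checked before installing. The following is a minimal sketch, not part of Amphion: the helper names (python_matches, missing_tools) are hypothetical, and the executable names (docker, nvidia-smi, nvidia-ctk) are assumptions about a typical Linux setup. It does not check CUDA, and the listing does not say whether a Python version other than 3.9.15 also works.

```python
import shutil
import sys

# Python version named in the prerequisites above.
REQUIRED_PYTHON = (3, 9, 15)

# Label -> executable name. The executable names are assumptions about a
# typical Linux setup, not something the project listing documents.
DOCKER_ROUTE_TOOLS = {
    "Docker": "docker",
    "NVIDIA Driver": "nvidia-smi",
    "NVIDIA Container Toolkit": "nvidia-ctk",
}


def python_matches(version=None, required=REQUIRED_PYTHON):
    """Return True when the interpreter version equals the required version."""
    current = sys.version_info[:3] if version is None else version
    return tuple(current) == tuple(required)


def missing_tools(tools=DOCKER_ROUTE_TOOLS, which=shutil.which):
    """Return labels of tools whose executable cannot be found on PATH."""
    return [label for label, exe in tools.items() if which(exe) is None]


if __name__ == "__main__":
    found = ".".join(str(part) for part in sys.version_info[:3])
    status = "ok" if python_matches() else "differs from 3.9.15"
    print(f"Python {found}: {status}")
    for label in missing_tools():
        print(f"Not found on PATH: {label}")
```

Run as a script, it prints the interpreter version and any Docker-route tool it could not find on PATH. A missing tool only matters if you plan to use the Docker image.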

Highlighted Details

Supports 10+ TTS architectures, including FastSpeech2, VITS, VALL-E, NaturalSpeech2, MaskGCT, and Vevo-TTS, and 5+ VC models, including Vevo, FACodec, and Noro.
Includes a wide array of vocoders (GAN-based, Flow-based, Diffusion-based, Auto-regressive) and comprehensive objective evaluation metrics.
Features the Emilia dataset (101k+ hours) and Emilia-Pipe for in-the-wild speech data preprocessing.
Offers visualization tools like SingVisio for understanding model mechanisms.

Maintenance & Community

Notable releases include Vevo1.5, Metis, MaskGCT, and the Vevo reproduction; the Health Check on this page reports the last commit as 9 months ago.
Community discussion takes place on a Discord channel.
Contributions are welcome; see CONTRIBUTING.md.

Licensing & Compatibility

MIT License, permitting free research and commercial use.

Limitations & Caveats

Some tasks, such as Singing Voice Synthesis (SVS) and Text to Music (TTM), are marked as "developing."
The Docker setup requires the NVIDIA software stack (NVIDIA Driver, NVIDIA Container Toolkit, CUDA).

Health Check

Last Commit: 9 months ago
Responsiveness: 1 week
Pull Requests (30d): 0
Issues (30d): 2

Star History

44 stars in the last 30 days

Explore Similar Projects

awesome-audio-plaza by metame-ai

Curated list of audio research papers, projects, and resources

Created 2 years ago

Updated 3 months ago

FunCodec by modelscope

Speech codec toolkit for audio quantization and downstream tasks

Created 2 years ago

Updated 2 years ago

Starred by Piotr Dąbkowski (Cofounder of ElevenLabs).

assem-vc by maum-ai

PyTorch code for any-to-many voice conversion research

Created 4 years ago

Updated 3 years ago

UniAudio by yangdongchao

Audio foundation model for universal audio generation

Created 2 years ago

Updated 1 year ago

awesome-large-audio-models by EmulationAI

Curated list of Large Language Models in Audio AI

Created 2 years ago

Updated 4 months ago

Starred by Patrick von Platen (Author of Hugging Face Diffusers; Research Engineer at Mistral).

FastDiff by Rongjiehuang

PyTorch implementation for fast, high-fidelity speech synthesis via conditional diffusion

Created 4 years ago

Updated 1 year ago

ultimate-rvc by JackismyShephard

AI-powered audio generation and voice manipulation

Created 1 year ago

Updated 2 months ago

lora-svc by PlayVoice

Singing voice conversion tool using Whisper & BigVGAN

Created 3 years ago

Updated 2 years ago

Easy-Voice-Toolkit by Spr-Aachen

Local AI voice toolkit for audio processing, recognition, transcription, and conversion

Created 3 years ago

Updated 1 week ago

PDF2Audio by lamm-mit

PDF-to-audio conversion tool

Created 1 year ago

Updated 10 months ago

Starred by Benjamin Bolte (Cofounder of K-Scale Labs) and Omar Sanseviero (DevRel at Google DeepMind).

audiolm-pytorch by lucidrains

PyTorch implementation of Google's AudioLM for audio generation

Created 3 years ago

Updated 1 year ago

Starred by Luis Capelo (Cofounder of Lightning AI) and Didier Lopes (Founder of OpenBB).

Zonos by Zyphra

Open-weight text-to-speech model for expressive, high-quality speech generation

Created 1 year ago

Updated 11 months ago
