Large-Audio-Models by liusongxiang

Curated list of large audio models

Created 3 years ago

514 stars

Top 60.2% on SourcePulse

Project Summary

This repository is a curated list of notable large models and research papers in the audio domain, covering both generation and understanding of speech, music, and general sound. It is aimed at researchers and engineers working on audio AI and gives them a single reference point for foundational models and recent work.

How It Works

The list aggregates links to papers and their corresponding code repositories, grouped by application area: spoken language models, prompt-based audio synthesis, audio language models, and self-supervised or unsupervised learning (SSL/UL) models. This grouping gives a structured overview of the rapidly evolving field of large audio models.
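As an illustration only, the grouping described above could be modeled as a small catalog of entries keyed by category. The sketch below is not code from the repository (which ships none). The `Entry` type, the `group_by_category` helper, the category assigned to each model, and the `example.org` URLs are all hypothetical placeholders chosen for this example.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entry:
    """One row of the list: a paper link plus an optional code link."""
    title: str
    category: str
    paper_url: str
    code_url: Optional[str] = None  # some papers have no public code


def group_by_category(entries):
    """Group entries by category, preserving their original order."""
    grouped = defaultdict(list)
    for entry in entries:
        grouped[entry.category].append(entry)
    return dict(grouped)


# Placeholder data: the URLs are NOT real links, and the category
# assignments are assumptions made for illustration.
ENTRIES = [
    Entry("wav2vec 2.0", "SSL/UL models",
          "https://example.org/paper/wav2vec2",
          "https://example.org/code/wav2vec2"),
    Entry("HuBERT", "SSL/UL models",
          "https://example.org/paper/hubert"),
    Entry("AudioLDM", "Prompt-based audio synthesis",
          "https://example.org/paper/audioldm",
          "https://example.org/code/audioldm"),
    Entry("SpeechGPT", "Spoken language models",
          "https://example.org/paper/speechgpt"),
]

if __name__ == "__main__":
    for category, items in group_by_category(ENTRIES).items():
        print(category)
        for item in items:
            code = item.code_url or "no code link"
            print(f"  {item.title}: {item.paper_url} ({code})")
```

Keeping the code link optional reflects that a list of this kind may include papers without a public implementation.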

Quick Start & Requirements

This repository is a curated list, so there is nothing to install or run. For setup steps and requirements, see the individual project repositories it links to.

Highlighted Details

Covers models published from 2019 to late 2024.
Includes foundational SSL/UL models like wav2vec 2.0 and HuBERT.
Features prominent text-to-speech, text-to-audio, and text-to-music generation models (e.g., NaturalSpeech 2, AudioLDM, MusicLM).
Lists recent advancements in speech-language integration (e.g., LLaMA-Omni, SpeechGPT).

Maintenance & Community

The list appears to be actively curated, with recent publications still being added. The README does not mention community channels or contributor details.

Licensing & Compatibility

The repository itself is only a list of links and does not specify a license. Licenses of the linked projects vary and must be checked individually.

Limitations & Caveats

This is a reference list and does not provide any code for direct use or experimentation. Users must navigate to individual project repositories to access code, models, and specific usage instructions.
