awesome-large-audio-models by EmulationAI

Curated list of Large Language Models in Audio AI

Created 2 years ago

733 stars

Top 46.3% on SourcePulse

Project Summary

This repository serves as a curated collection of resources, papers, and models related to the burgeoning field of Large Audio Models (LAMs). It aims to provide researchers and practitioners with a comprehensive overview of state-of-the-art advancements, challenges, and applications of LLMs in audio signal processing, from speech tasks to music generation.

How It Works

The project is structured around a survey paper, "Sparks of Large Audio Models: A Survey and Outlook," which categorizes and analyzes recent developments. It highlights transformer-based architectures and foundation models such as SeamlessM4T, emphasizing that they handle diverse audio sources and tasks without requiring a separate system for each task. The collection links to papers, open-source implementations, and datasets, making it easier to explore and adopt LAMs.

Quick Start & Requirements

This repository is a curated list of resources, not a runnable software package. To use the models and datasets it mentions, follow the individual project links in the repository for installation and usage instructions. Prerequisites vary significantly by model and dataset.

Highlighted Details

Comprehensive coverage of Large Audio Models across various domains: Automatic Speech Recognition (ASR), Neural Speech Synthesis, Speech Translation (ST), Music Generation, and other speech applications.
Extensive list of relevant survey papers and foundational research in LLMs and audio processing.
Detailed categorization of popular Large Audio Models with links to their respective papers and, where available, GitHub repositories.
Curated list of significant audio datasets, including their size and download links.

Maintenance & Community

The repository is associated with the survey paper "Sparks of Large Audio Models: A Survey and Outlook" by Siddique Latif et al. The project aims for regular updates with the latest papers and open-source implementations.

Licensing & Compatibility

The repository itself is a collection of links and information; licensing and compatibility depend entirely on the individual projects and datasets referenced within. Users must consult the licenses of each linked resource.

Limitations & Caveats

Because this repository is a curated list, it provides neither direct access to the listed models nor runnable code for them. Users must independently locate, install, and configure each model and dataset, which may involve significant technical effort and resource requirements.

Explore Similar Projects

awesome-audio-plaza by metame-ai

unified-audio by alibaba

Large-Audio-Models by liusongxiang

UniAudio by yangdongchao

dasheng-lm by xiaomi-research

awesome-ai-voice by wildminder

audio-ai-timeline by archinetai

free-voice-clone by 0xSojalSec

audiolm-pytorch by lucidrains

Kimi-Audio by MoonshotAI

Amphion by open-mmlab

audiocraft by facebookresearch