awesome-large-audio-models  by EmulationAI

Curated list of Large Language Models in Audio AI

Created 2 years ago
692 stars

Top 49.2% on SourcePulse

Project Summary

This repository is a curated collection of resources, papers, and models for Large Audio Models (LAMs). It aims to give researchers and practitioners an overview of state-of-the-art advances, challenges, and applications of LLMs in audio signal processing, from speech tasks to music generation.

How It Works

The project is structured around the survey paper "Sparks of Large Audio Models: A Survey and Outlook," which categorizes and analyzes recent developments. It highlights transformer-based architectures and foundation models such as SeamlessM4T, emphasizing that such models can handle diverse audio sources and tasks without separate task-specific systems. The collection links to papers, open-source implementations, and datasets, making it easier to explore and adopt LAMs.

Quick Start & Requirements

This repository is a curated list of resources, not a runnable software package. To use the models and datasets it mentions, follow the individual project links in the repository for installation and usage instructions. Prerequisites vary significantly by model and dataset.
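
Because the repository is a list of links, a first practical step is often to pull those links out of the list file for triage. The following is a minimal sketch, assuming the list is written in Markdown with inline `[label](url)` links; the sample text, the labels, and the helper names `extract_links` and `group_by_host` are hypothetical and are not taken from the repository.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple
from urllib.parse import urlparse

# Inline Markdown links of the form [label](http...).
# The negative lookbehind skips images, which are written ![alt](url).
LINK_PATTERN = re.compile(r"(?<!!)\[([^\]]+)\]\((https?://[^)\s]+)\)")


def extract_links(markdown_text: str) -> List[Tuple[str, str]]:
    """Return (label, url) pairs for every inline link, in document order."""
    return LINK_PATTERN.findall(markdown_text)


def group_by_host(links: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group link labels by URL host, e.g. to separate papers from code."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for label, url in links:
        grouped[urlparse(url).netloc].append(label)
    return dict(grouped)


# Hypothetical excerpt in the style of a curated list (not real entries).
SAMPLE = """
## Speech
- [Example Model A](https://example.org/paper-a) | [code](https://github.com/example/model-a)
- ![badge](https://example.org/badge.svg)
"""

if __name__ == "__main__":
    for host, labels in group_by_host(extract_links(SAMPLE)).items():
        print(host, labels)
```

Grouping by host is only one way to triage; each linked project still has to be installed and configured according to its own instructions.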

Highlighted Details

  • Comprehensive coverage of Large Audio Models across various domains: Automatic Speech Recognition (ASR), Neural Speech Synthesis, Speech Translation (ST), Music Generation, and other speech applications.
  • Extensive list of relevant survey papers and foundational research in LLMs and audio processing.
  • Detailed categorization of popular Large Audio Models with links to their respective papers and, where available, GitHub repositories.
  • Curated list of significant audio datasets, including their size and download links.

Maintenance & Community

The repository is associated with the survey paper "Sparks of Large Audio Models: A Survey and Outlook" by Siddique Latif et al. The project aims for regular updates with the latest papers and open-source implementations.

Licensing & Compatibility

The repository itself is a collection of links and information; licensing and compatibility depend entirely on the individual projects and datasets referenced within. Users must consult the licenses of each linked resource.

Limitations & Caveats

This repository is a curated list and does not provide direct access to or runnable code for the listed models. Users must independently locate, install, and configure each model and dataset, which may involve significant technical effort and resource requirements.

Health Check

Last Commit: 1 year ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 0
Star History: 3 stars in the last 30 days
