awesome-large-audio-models by EmulationAI

Curated list of Large Language Models in Audio AI

created 1 year ago
685 stars

Top 50.5% on sourcepulse

Project Summary

This repository is a curated collection of resources, papers, and models for Large Audio Models (LAMs). It aims to give researchers and practitioners an overview of state-of-the-art advances, challenges, and applications of LLMs in audio signal processing, from speech tasks to music generation.

How It Works

The project is organized around the survey paper "Sparks of Large Audio Models: A Survey and Outlook," which categorizes and analyzes recent developments. The survey highlights transformer-based architectures and foundation models such as SeamlessM4T, emphasizing that they handle diverse audio sources and tasks without task-specific systems. The collection links to papers, open-source implementations, and datasets to make LAMs easier to explore and adopt.

Quick Start & Requirements

This repository is a curated list of resources, not a runnable software package. To use the models and datasets it mentions, follow the individual project links in the repository for installation and usage instructions. Prerequisites vary significantly by model and dataset.

Highlighted Details

  • Comprehensive coverage of Large Audio Models across various domains: Automatic Speech Recognition (ASR), Neural Speech Synthesis, Speech Translation (ST), Music Generation, and other speech applications.
  • Extensive list of relevant survey papers and foundational research in LLMs and audio processing.
  • Detailed categorization of popular Large Audio Models with links to their respective papers and, where available, GitHub repositories.
  • Curated list of significant audio datasets, including their size and download links.

Maintenance & Community

The repository is associated with the survey paper "Sparks of Large Audio Models: A Survey and Outlook" by Siddique Latif et al. The project aims for regular updates with the latest papers and open-source implementations.

Licensing & Compatibility

The repository itself is a collection of links and information; licensing and compatibility depend entirely on the individual projects and datasets referenced within. Users must consult the licenses of each linked resource.

Limitations & Caveats

Because this repository is a curated list, it provides no direct access to, or runnable code for, the listed models. Users must locate, install, and configure each model and dataset themselves, which may take significant technical effort and resources.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

17 stars in the last 90 days

Explore Similar Projects

Starred by Omar Sanseviero (DevRel at Google DeepMind) and Patrick von Platen (Core Contributor to Hugging Face Transformers and Diffusers).

audio-ai-timeline by archinetai

0%
2k
AI model timeline for audio generation
created 2 years ago
updated 1 year ago
Starred by Omar Sanseviero (DevRel at Google DeepMind) and Patrick von Platen (Core Contributor to Hugging Face Transformers and Diffusers).

voice_datasets by jim-schwoebel

0.1%
2k
Voice dataset list for voice/sound computing
created 6 years ago
updated 1 year ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems) and Jeff Hammerbacher (Cofounder of Cloudera).

AudioGPT by AIGC-Audio

0.1%
10k
Audio processing and generation research project
created 2 years ago
updated 1 year ago