Large-Audio-Models by liusongxiang

Curated list of large audio models

created 2 years ago
491 stars

Top 63.7% on sourcepulse

Project Summary

This repository is a curated list of notable large models and research papers in the audio domain, covering both generation and understanding of speech, music, and general sound. It targets researchers and engineers working with state-of-the-art audio AI, giving them a single reference for foundational models and recent advances.

How It Works

The project aggregates links to papers and their corresponding code repositories, grouped by application area: spoken language models, prompt-based audio synthesis, audio language models, and self-supervised and unsupervised learning (SSL/UL) models. The grouping gives a structured overview of the rapidly evolving landscape of large audio models.

Quick Start & Requirements

This repository is a curated list, so there is nothing to install or run. For setup instructions and requirements, see the individual project repositories it links to.

Highlighted Details

  • Comprehensive coverage of models from 2019 to late 2024.
  • Includes foundational SSL/UL models like wav2vec 2.0 and HuBERT.
  • Features prominent text-to-speech, text-to-audio, and text-to-music generation models (e.g., NaturalSpeech 2, AudioLDM, MusicLM).
  • Lists recent advancements in speech-language integration (e.g., LLaMA-Omni, SpeechGPT).

Maintenance & Community

The list has been curated to include publications through late 2024; the most recent commit was 10 months ago. Specific community channels or contributor details are not provided in the README.

Licensing & Compatibility

The repository is a list of links and specifies no license of its own. The licenses of the linked projects vary and must be checked individually.

Limitations & Caveats

This is a reference list and does not provide any code for direct use or experimentation. Users must navigate to individual project repositories to access code, models, and specific usage instructions.

Health Check

  • Last commit: 10 months ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

15 stars in the last 90 days

Explore Similar Projects

Starred by Omar Sanseviero (DevRel at Google DeepMind) and Patrick von Platen (Core Contributor to Hugging Face Transformers and Diffusers).

audio-ai-timeline by archinetai

0%
2k
AI model timeline for audio generation
created 2 years ago
updated 1 year ago
Starred by Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake).

awesome-diarization by wq2012

0.3%
2k
List of resources for speaker diarization
created 6 years ago
updated 1 week ago