Awesome-Audio-LLM by AudioLLMs

Audio LLM resource list (models, datasets, benchmarks, surveys)

Created 1 year ago
721 stars

Top 47.7% on SourcePulse

Project Summary

This repository is a curated list of research papers, models, datasets, and benchmarks for Large Language Models (LLMs) designed for audio processing. It aims to be a comprehensive resource for researchers and practitioners working on Audio LLMs, making it easier to discover work in this rapidly evolving area and to contribute to it.

How It Works

The repository categorizes advancements in Audio LLMs into several key areas: Models and Methods, Benchmarks, Surveys, Multimodal Studies, Safety research, and Chatbots. It provides links to papers, Hugging Face models, and demos, offering a structured overview of the state-of-the-art and emerging trends. The inclusion of diverse models like OSUM, Step-Audio, and Typhoon2-Audio highlights various approaches to integrating audio capabilities into LLMs.

Quick Start & Requirements

This is a curated list of research and not a runnable software package. To engage with the listed models, users will need to refer to the individual project pages linked within the repository for specific installation and usage instructions.

Highlighted Details

  • Extensive coverage of models from late 2023 to early 2025, indicating a very active research landscape.
  • Inclusion of specialized benchmarks like VoiceBench, MMAU, and AudioBench for evaluating Audio LLM performance.
  • Categorization of research into specific areas such as audio hallucination and voice jailbreak attacks.
  • Inclusion of a "Contributors" section that encourages community involvement.

Maintenance & Community

The project actively encourages community contributions through issues and pull requests. It lists several key contributors and institutions involved in the research, including Meta, Alibaba Group, Tsinghua University, and NTU-Taiwan.

Licensing & Compatibility

The repository itself is a list of links and does not have a specific license. The licenses of the individual models and papers referenced would need to be checked on their respective project pages.

Limitations & Caveats

As a curated list, the repository depends on new research being published and submitted. It does not let users run any of the models directly; implementation requires following the links to external resources.

Health Check

  • Last commit: 2 months ago
  • Responsiveness: 1+ week
  • Pull requests (30d): 0
  • Issues (30d): 0

Star History

53 stars in the last 30 days
