audio-ai-hub by BinWang28

Audio LLM resource list (models, datasets, benchmarks, surveys)

Created 2 years ago

940 stars

Top 38.3% on SourcePulse

Project Summary

This repository is a curated list of research papers, models, datasets, and benchmarks for Large Language Models (LLMs) designed for audio processing. It aims to be a comprehensive resource for researchers and practitioners working on Audio LLMs, making new work easier to find and contributions easier to submit in a rapidly evolving area.

How It Works

The repository categorizes advancements in Audio LLMs into several key areas: Models and Methods, Benchmarks, Surveys, Multimodal Studies, Safety research, and Chatbots. It provides links to papers, Hugging Face models, and demos, offering a structured overview of the state-of-the-art and emerging trends. The inclusion of diverse models like OSUM, Step-Audio, and Typhoon2-Audio highlights various approaches to integrating audio capabilities into LLMs.
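The categorization described above can be pictured as a small data structure. The following is a minimal sketch only, not something the repository ships: the `Entry` class, the `group_by_category` helper, and the `links` field are hypothetical names introduced for illustration, the category label "Safety" abbreviates "Safety research", and placing OSUM, Step-Audio, and Typhoon2-Audio under "Models and Methods" is an assumption based on the summary calling them models. Links are left empty rather than guessed.

```python
from dataclasses import dataclass, field

# Category names as listed in the summary above ("Safety" abbreviates "Safety research").
CATEGORIES = (
    "Models and Methods",
    "Benchmarks",
    "Surveys",
    "Multimodal Studies",
    "Safety",
    "Chatbots",
)


@dataclass
class Entry:
    """Hypothetical record for one item in the curated list."""
    name: str
    category: str
    # Optional links keyed by kind, e.g. "paper", "model", "demo".
    # Left empty here because no URLs are given in the summary.
    links: dict = field(default_factory=dict)

    def __post_init__(self):
        # Reject categories that the list does not use.
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category!r}")


def group_by_category(entries):
    """Return a dict mapping every category to the names filed under it."""
    grouped = {category: [] for category in CATEGORIES}
    for entry in entries:
        grouped[entry.category].append(entry.name)
    return grouped


# Names come from the summary; their category assignment is an assumption.
entries = [
    Entry("OSUM", "Models and Methods"),
    Entry("Step-Audio", "Models and Methods"),
    Entry("Typhoon2-Audio", "Models and Methods"),
    Entry("VoiceBench", "Benchmarks"),
    Entry("MMAU", "Benchmarks"),
    Entry("AudioBench", "Benchmarks"),
]
grouped = group_by_category(entries)
```

Keeping the category set fixed and validating each entry against it mirrors how a curated list stays navigable as it grows: a new item either fits an existing section or prompts an explicit decision to add one.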

Quick Start & Requirements

This is a curated list of research, not a runnable software package. To use any listed model, follow the installation and usage instructions on its own project page, which is linked from the repository.

Highlighted Details

Extensive coverage of models from late 2023 to early 2025, indicating a very active research landscape.
Inclusion of specialized benchmarks like VoiceBench, MMAU, and AudioBench for evaluating Audio LLM performance.
Categorization of research into specific areas such as audio hallucination and voice jailbreak attacks.
A "Contributors" section that encourages community contributions.

Maintenance & Community

The project actively encourages community contributions through issues and pull requests. It lists several key contributors and institutions involved in the research, including Meta, Alibaba Group, Tsinghua University, and NTU-Taiwan.

Licensing & Compatibility

The repository itself is a list of links and does not have a specific license. Check the license of each referenced model or paper on its respective project page.

Limitations & Caveats

As a curated list, the repository's coverage depends on new research being published and submitted to it. It does not provide a way to run any of the models directly; users must go to the linked external resources for implementations.

Explore Similar Projects

AudioBench by AudioLLMs

WavChat by jishengpeng

speech-recognition-uk by egorsmkv

Large-Audio-Models by liusongxiang

dasheng-lm by xiaomi-research

awesome-large-audio-models by EmulationAI

LLaSM by LinkSoul-AI

awesome-ai-voice by wildminder

dia2 by nari-labs

Kimi-Audio by MoonshotAI

StyleTTS2 by yl4579

ChatTTS by 2noise