Audio LLM resource list (models, datasets, benchmarks, surveys)
This repository is a curated list of research papers, models, datasets, and benchmarks on Large Language Models (LLMs) designed for audio processing. It aims to be a comprehensive resource for researchers and practitioners working on Audio LLMs, making it easier to discover work in this rapidly evolving area and to contribute to it.
How It Works
The repository groups work on Audio LLMs into six areas: Models and Methods, Benchmarks, Surveys, Multimodal Studies, Safety research, and Chatbots. Entries link to papers, Hugging Face models, and demos, giving a structured overview of the state of the art and emerging trends. Listed models such as OSUM, Step-Audio, and Typhoon2-Audio illustrate different approaches to integrating audio capabilities into LLMs.
Quick Start & Requirements
This is a curated list of research, not a runnable software package. To use any of the listed models, refer to the individual project pages linked from the repository for installation and usage instructions.
Highlighted Details
Maintenance & Community
The project actively encourages community contributions through issues and pull requests. It lists several key contributors and institutions involved in the research, including Meta, Alibaba Group, Tsinghua University, and NTU-Taiwan.
Licensing & Compatibility
The repository itself is a list of links and does not have a specific license. The licenses of the individual models and papers it references must be checked on their respective project pages.
Limitations & Caveats
As a curated list, the repository's coverage depends on new research being published and submitted. It does not provide a way to run any of the models; users must go to the linked external resources for implementations.