Curated list of audio-visual papers and datasets
This repository is a curated list of papers and datasets on audio-visual processing, modeled on the "awesome-computer-vision" format. It is aimed at machine learning and computer vision researchers and practitioners whose work uses both the audio and visual information in videos. Its main benefit is a single, organized starting point for exploring the state of the art in this interdisciplinary field.
How It Works
The repository categorizes research papers and datasets across a wide spectrum of audio-visual tasks. These include localization, separation, representation learning, action recognition, deepfakes, navigation, speech processing, question answering, stylization, and generation. Each entry typically links to the paper, and often to associated code, project pages, or datasets, providing a comprehensive overview of the research landscape.
Quick Start & Requirements
This is a curated list, not a software package. No installation or execution is required. Users access the information via the README.
Maintenance & Community
The repository is maintained by Kranti Kumar Parida, with an open invitation for pull requests and contributions to add or correct links.
Licensing & Compatibility
The content is released under Creative Commons CC0 (Public Domain Dedication), so it may be used, modified, and redistributed for any purpose without restriction.
Limitations & Caveats
Because this is a curated list, its coverage depends on the maintainer and the community continuing to add recent research. Links may become outdated over time.
1 year ago
Inactive