Survey paper on spoken dialogue models
WavChat is a comprehensive survey of spoken dialogue models, aimed at researchers and developers in AI and speech technology. It systematically organizes and analyzes the rapidly evolving landscape of spoken dialogue systems, covering their architectures, capabilities, and training methodologies, with the aim of supporting progress in human-computer interaction.
How It Works
The survey categorizes spoken dialogue systems into cascaded and end-to-end paradigms, detailing core technologies such as speech representation (semantic vs. acoustic), training paradigms, and streaming/duplex capabilities. It provides a chronological timeline of models and discusses limitations and future research directions for each aspect.
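The difference between the two paradigms can be illustrated with a minimal sketch. It assumes the common reading of "cascaded" as a chain of speech recognition, a text language model, and speech synthesis, and of "end-to-end" as a single model that maps input speech directly to output speech. All class and function names below are hypothetical and do not come from the survey or from any model it covers. The toy components treat audio as UTF-8 bytes purely for illustration.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stand-in type: real systems use waveforms or codec tokens.
Audio = bytes


@dataclass
class CascadedSystem:
    """Chains three separate modules; text is the intermediate interface."""
    asr: Callable[[Audio], str]   # speech -> text
    llm: Callable[[str], str]     # text -> text
    tts: Callable[[str], Audio]   # text -> speech

    def respond(self, speech: Audio) -> Audio:
        text_in = self.asr(speech)
        text_out = self.llm(text_in)
        return self.tts(text_out)


@dataclass
class EndToEndSystem:
    """A single model consumes speech and emits speech directly."""
    model: Callable[[Audio], Audio]

    def respond(self, speech: Audio) -> Audio:
        return self.model(speech)


# Toy components (assumptions for illustration only).
def toy_asr(speech: Audio) -> str:
    return speech.decode("utf-8")


def toy_llm(text: str) -> str:
    return f"echo: {text}"


def toy_tts(text: str) -> Audio:
    return text.encode("utf-8")


cascaded = CascadedSystem(asr=toy_asr, llm=toy_llm, tts=toy_tts)
end_to_end = EndToEndSystem(model=lambda speech: b"echo: " + speech)
```

In the cascaded version every module can be swapped independently, but anything not captured in the intermediate text (such as tone or emotion) is lost between stages. The end-to-end version has no such text bottleneck, which is one reason the choice of speech representation matters there.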
Quick Start & Requirements
This repository is a companion to the WavChat survey paper. To get started, read the survey itself, which is available on arXiv. The survey covers numerous models and datasets; for implementation details and requirements of any specific one, consult its external GitHub repository and research paper.
Highlighted Details
Maintenance & Community
The project is associated with the research paper "WavChat: A Survey of Spoken Dialogue Models" published on arXiv. The primary contributor is Shengpeng Ji. Further community engagement or project updates are not explicitly detailed in the README.
Licensing & Compatibility
The repository itself does not specify a license. The associated survey paper is a pre-print on arXiv. Each model and dataset mentioned in the survey carries its own license; these vary and may restrict commercial use.
Limitations & Caveats
This repository accompanies a survey paper and does not contain executable code for the models discussed. For implementation, setup, and usage, refer to the individual project repositories linked within the survey. Because the field is evolving rapidly, the survey represents a snapshot in time.