Dataset for task-oriented dialogue modeling
This repository provides the MultiWOZ dataset, a large-scale, human-human conversation corpus for task-oriented dialogue systems, along with baseline implementations and evaluation scripts. It is designed for researchers and developers working on dialogue state tracking and response generation.
How It Works
The project is built around the MultiWOZ dataset, which comprises over 10,000 dialogues across multiple domains, annotated with goals, utterances, and belief states. It supports both end-to-end dialogue modeling and dialogue state tracking. Several dataset versions are available (2.0, 2.1, 2.2), with later versions adding corrections and improvements. The code includes preprocessing scripts and baseline models for training and evaluation.
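To make the annotation structure concrete, here is a minimal sketch of reading utterances and belief states from one dialogue. The layout used here (a "goal" entry, a "log" list of turns, and a "metadata" annotation with "semi" slots) is an assumption for illustration, as are the sample dialogue and the helper names belief_state and iter_turns. The actual files may use different field names, and the layout may differ between dataset versions, so check the data before relying on this.

```python
import json

# Hypothetical miniature sample; the real files and field names may differ.
SAMPLE = json.loads("""
{
  "example_dialogue": {
    "goal": {"restaurant": {"info": {"food": "italian", "area": "centre"}}},
    "log": [
      {"text": "I need an Italian restaurant in the centre.", "metadata": {}},
      {"text": "Sure, what price range would you like?",
       "metadata": {"restaurant": {"semi": {"food": "italian",
                                            "area": "centre",
                                            "pricerange": ""}}}}
    ]
  }
}
""")


def belief_state(metadata):
    """Flatten a turn annotation into {(domain, slot): value}, skipping empty slots."""
    state = {}
    for domain, groups in metadata.items():
        for slot, value in groups.get("semi", {}).items():
            if value:
                state[(domain, slot)] = value
    return state


def iter_turns(dialogue):
    """Yield (speaker, text, belief_state) per turn, assuming the user speaks first."""
    for index, turn in enumerate(dialogue["log"]):
        speaker = "user" if index % 2 == 0 else "system"
        yield speaker, turn["text"], belief_state(turn.get("metadata", {}))


turns = list(iter_turns(SAMPLE["example_dialogue"]))
for speaker, text, state in turns:
    print(speaker, "|", text, "|", state)
```

Flattening the state into (domain, slot) keys makes it straightforward to compare a predicted state against an annotated one with a plain dictionary comparison, which is the usual starting point for dialogue state tracking metrics.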
Quick Start & Requirements
Preprocess the data: python create_delex_data.py
Train the baseline: python train.py [--args=value]
Evaluate the baseline: python test.py [--args=value]
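The name create_delex_data.py suggests a delexicalization step, in which concrete slot values in an utterance are replaced by placeholders so that a model learns response templates rather than specific entities. The following is a minimal sketch of that general technique, not the repository's script. The function name delexicalize, the [slot] placeholder format, and the sample values are all assumptions.

```python
import re


def delexicalize(utterance, slot_values):
    """Replace each known slot value in the utterance with a [slot] placeholder."""
    # Replace longer values first so a short value cannot break up a longer one.
    ordered = sorted(slot_values.items(), key=lambda item: -len(item[1]))
    for slot, value in ordered:
        pattern = r"\b" + re.escape(value) + r"\b"
        utterance = re.sub(pattern, "[" + slot + "]", utterance, flags=re.IGNORECASE)
    return utterance


example = delexicalize(
    "Pizza Hut City Centre is a cheap restaurant in the centre.",
    {
        "restaurant_name": "Pizza Hut City Centre",
        "restaurant_pricerange": "cheap",
        "restaurant_area": "centre",
    },
)
print(example)
```

Ordering by value length matters here: if "centre" were replaced before "Pizza Hut City Centre", the restaurant name would no longer match as a whole.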
Highlighted Details
Maintenance & Community
The project was initiated by Paweł Budzianowski of the Cambridge Dialogue Systems Group. Bug reports can be sent to budzianowski@gmail.com or jianguozhang@salesforce.com.
Licensing & Compatibility
Released under the MIT License, allowing for open-source use and modification.
Limitations & Caveats
The baseline code targets Python 2 and an older version of PyTorch (0.4.1), so running it in a modern Python 3 environment may require significant adaptation. Some older benchmark results may not be directly comparable because of inconsistencies between evaluation scripts.