MELD  by declare-lab

Dataset for emotion recognition research

created 6 years ago
932 stars

Top 40.1% on sourcepulse

View on GitHub
Project Summary

MELD is a comprehensive multimodal dataset for emotion recognition in multi-party conversations, derived from the Friends TV series. It provides text, audio, and visual modalities for over 13,000 utterances across more than 1400 dialogues, with each utterance labeled for seven emotions and three sentiment categories. This dataset is valuable for researchers and developers building advanced conversational AI systems, affective computing models, and dialogue generation tools that require nuanced understanding of emotional dynamics in group interactions.

How It Works

MELD extends the EmotionLines dataset by incorporating synchronized audio and visual data alongside text. Utterances are extracted with precise timestamps from TV show subtitles, ensuring alignment across modalities. The dataset's structure facilitates context-aware emotion recognition by capturing conversational flow and speaker interactions, aiming to improve performance over unimodal approaches.
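As a sketch of how the dialogue-structured annotations support context-aware modeling, the snippet below groups utterances by dialogue ID so each utterance can be paired with everything said before it. The column names (`Dialogue_ID`, `Utterance`, `Emotion`) follow the dataset's CSV annotation files, but the inline sample rows here are illustrative, not real data.

```python
import csv
import io
from collections import defaultdict

# Illustrative rows in the shape of MELD's CSV annotations (not real data).
sample_csv = """Dialogue_ID,Utterance_ID,Speaker,Utterance,Emotion,Sentiment
0,0,Joey,How are you?,neutral,neutral
0,1,Chandler,Never better!,joy,positive
1,0,Monica,That is awful.,sadness,negative
"""

def group_by_dialogue(fh):
    """Group utterances by dialogue so each keeps its conversational context."""
    dialogues = defaultdict(list)
    for row in csv.DictReader(fh):
        dialogues[row["Dialogue_ID"]].append((row["Utterance"], row["Emotion"]))
    return dict(dialogues)

dialogues = group_by_dialogue(io.StringIO(sample_csv))
# For the last utterance of dialogue 0, the preceding turns are its context.
context, (utterance, emotion) = dialogues["0"][:-1], dialogues["0"][-1]
print(context, utterance, emotion)
```

A context-aware classifier would then condition on `context` when predicting `emotion` for `utterance`, which is the advantage MELD's dialogue structure offers over treating utterances in isolation.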

Quick Start & Requirements

  • Data Download: wget http://web.eecs.umich.edu/~mihalcea/downloads/MELD.Raw.tar.gz
  • Prerequisites: Python; the pre-extracted features ship as pickle files, so the standard pickle module suffices for loading them. Feature extraction or model training may additionally require deep learning frameworks (TensorFlow, PyTorch) and potentially GPU acceleration.
  • Resources: The raw data is substantial. Pre-extracted features are available for download.
  • Details: MELD Dataset, Paper

Highlighted Details

  • Contains over 1400 dialogues and 13,000 utterances from the Friends TV series.
  • Each utterance is annotated with seven emotions (Anger, Disgust, Sadness, Joy, Neutral, Surprise, Fear) and sentiment (positive, negative, neutral).
  • Includes both a multi-party version and a dyadic subset for specialized research.
  • Provides pre-extracted features for text (GloVe, CNN, bcLSTM) and audio (openSMILE, SVM-based feature selection, bcLSTM).
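The pre-extracted features are distributed as Python pickle files. Below is a minimal loading sketch; the dict-of-per-utterance-vectors layout and the key format are assumptions about the feature files, so the snippet writes a tiny synthetic pickle first to stay self-contained and runnable.

```python
import pickle
import tempfile
from pathlib import Path

# Write a tiny synthetic feature file so the sketch runs end to end;
# the real MELD feature pickles are downloaded separately.
features = {"dia0_utt0": [0.1, 0.2, 0.3], "dia0_utt1": [0.4, 0.5, 0.6]}
path = Path(tempfile.mkdtemp()) / "features.pkl"
with path.open("wb") as f:
    pickle.dump(features, f)

# Loading mirrors how a pickled per-utterance feature dict would be consumed.
with path.open("rb") as f:
    loaded = pickle.load(f)

print(len(loaded), len(loaded["dia0_utt0"]))  # → 2 3
```

Only unpickle files from sources you trust (here, the official download), since `pickle.load` can execute arbitrary code.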

Maintenance & Community

The project is maintained by declare-lab; the raw data download is hosted on a University of Michigan server. Recent updates include new papers and state-of-the-art baselines (COSMIC).

Licensing & Compatibility

The dataset itself is intended for research purposes. Specific licensing details for commercial use are not explicitly stated in the README, but the data is derived from a TV series, implying potential copyright considerations.

Limitations & Caveats

The dataset is based on a specific TV show, which may limit generalizability to other conversational contexts. Some utterances might have missing start/end times due to subtitle inconsistencies, requiring manual correction or omission.
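Because some utterances may lack usable subtitle timestamps, a defensive preprocessing pass can drop (or flag for manual correction) malformed rows before audio/visual clips are cut. The `StartTime`/`EndTime` field names follow the annotation CSVs, and the SRT-style timestamp pattern is an assumption; the sample rows are made up.

```python
import re

# SRT-style timestamp, e.g. 00:03:12,847 (assumed format of MELD's time fields).
TIME = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3}$")

# Made-up annotation rows: one well-formed, one with a missing end time.
rows = [
    {"Utterance": "Hi!", "StartTime": "00:00:01,000", "EndTime": "00:00:02,000"},
    {"Utterance": "Missing end.", "StartTime": "00:00:03,000", "EndTime": ""},
]

def has_valid_times(row):
    """Keep only rows whose start and end timestamps are well-formed."""
    return bool(TIME.match(row["StartTime"])) and bool(TIME.match(row["EndTime"]))

clean = [r for r in rows if has_valid_times(r)]
dropped = len(rows) - len(clean)
print(len(clean), dropped)  # → 1 1
```

Logging the dropped rows instead of silently discarding them makes it easier to manually correct timestamps where the subtitle data is recoverable.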

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 39 stars in the last 90 days

Explore Similar Projects

Starred by Omar Sanseviero (DevRel at Google DeepMind) and Patrick von Platen (Core Contributor to Hugging Face Transformers and Diffusers).

voice_datasets by jim-schwoebel

  • Voice dataset list for voice/sound computing
  • Top 0.1%, 2k stars
  • Created 6 years ago, updated 1 year ago