AffectGPT by zeroQiaoba

Multimodal emotion reasoning for open-vocabulary understanding

Created 3 years ago

406 stars

Top 71.2% on SourcePulse

Project Summary

AffectGPT addresses the limitations of traditional emotion recognition by introducing Open-Vocabulary Multimodal Emotion Recognition (OV-MER). It enables the prediction of any number and category of emotions from multimodal data, advancing emotion AI towards real-world applicability. The project provides the AffectGPT framework, datasets, and a benchmark suite for researchers and practitioners in multimodal LLMs and affective computing.

How It Works

The project defines the OV-MER task, which describes human emotions with free-form labels rather than a fixed label set. AffectGPT is a framework built for this task on top of multimodal large language models (MLLMs). It integrates pre-trained audio and video encoders with a large language model (Qwen2.5-7B-Instruct) to process and reason about emotional content across modalities. The result is a more flexible and nuanced treatment of emotion than fixed-label classifiers allow.
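The integration described above can be pictured with a small sketch. The summary does not say how the encoders are connected to the language model, so the sketch assumes a common MLLM pattern: each encoder's features are linearly projected to the LLM's embedding width and concatenated with the text prompt embeddings. All names, dimensions, and the random weights are hypothetical placeholders, not AffectGPT's actual code.

```python
import random

LLM_DIM = 8  # hypothetical embedding width; real LLMs use thousands of dimensions


def make_projection(d_in, d_out, rng):
    """Random d_out x d_in matrix standing in for a learned projection layer."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(d_in)] for _ in range(d_out)]


def project(features, weight):
    """Map each encoder feature vector to the LLM embedding width."""
    return [[sum(x * w for x, w in zip(vec, row)) for row in weight] for vec in features]


def build_llm_input(audio_feats, video_feats, text_embeds, w_audio, w_video):
    """Concatenate projected audio and video tokens with the prompt embeddings."""
    return project(audio_feats, w_audio) + project(video_feats, w_video) + text_embeds


rng = random.Random(0)
audio_feats = [[rng.random() for _ in range(4)] for _ in range(3)]  # 3 audio frames, dim 4
video_feats = [[rng.random() for _ in range(6)] for _ in range(2)]  # 2 video frames, dim 6
text_embeds = [[rng.random() for _ in range(LLM_DIM)] for _ in range(5)]  # 5 prompt tokens

sequence = build_llm_input(
    audio_feats,
    video_feats,
    text_embeds,
    make_projection(4, LLM_DIM, rng),
    make_projection(6, LLM_DIM, rng),
)
print(len(sequence), len(sequence[0]))  # 10 tokens, each of width LLM_DIM
```

In a real system the language model would consume this combined sequence and generate a free-form emotion description; here the sequence is only built to show how the modalities end up in one input.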

Quick Start & Requirements

Setup involves downloading datasets and models from provided links. Key resources include:

OV-MERD & MER-Caption+ datasets: https://huggingface.co/datasets/MERChallenge/MER2025
MER-UniBench datasets: https://pan.baidu.com/s/1kbfs5pG_hAri0QwvQl-Ecg?pwd=b9vn or https://1024terabox.com/s/1AE7uAU3Ib8aRBSyF1TMpow
AffectGPT models: https://pan.baidu.com/s/1IvC4H7Xt1AzMFocGMBBbHQ?pwd=hzf9

A Python environment with the necessary libraries for MLLMs and data processing is required. Specific hardware requirements (e.g., GPUs) are not detailed but are expected for MLLM operations.
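For the Hugging Face link listed above, the download can be scripted. The sketch below uses `snapshot_download` from the `huggingface_hub` package; the target directory `data/MER2025` and the helper names are assumptions made for illustration, and the repository may require accepting access terms or supplying a token, which the summary does not specify. The Baidu and Terabox links still need a manual download.

```python
from urllib.parse import urlparse

DATASET_URL = "https://huggingface.co/datasets/MERChallenge/MER2025"


def repo_id_from_url(url):
    """Extract 'owner/name' from a Hugging Face dataset URL."""
    parts = urlparse(url).path.strip("/").split("/")
    if len(parts) != 3 or parts[0] != "datasets":
        raise ValueError(f"not a Hugging Face dataset URL: {url}")
    return f"{parts[1]}/{parts[2]}"


def download_dataset(local_dir="data/MER2025"):
    """Download the dataset repo (hypothetical helper).

    Requires `pip install huggingface_hub` and network access.
    """
    from huggingface_hub import snapshot_download  # imported lazily

    return snapshot_download(
        repo_id=repo_id_from_url(DATASET_URL),
        repo_type="dataset",
        local_dir=local_dir,  # assumed target directory, not from the project
    )


print(repo_id_from_url(DATASET_URL))
```

Calling `download_dataset()` returns the local path of the downloaded snapshot. It is deliberately not invoked here because the datasets are large.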

Highlighted Details

AffectGPT and the OV-MER task were accepted as Oral presentations at ICML (top 1%).
Introduces MER-UniBench, a benchmark for evaluating MLLM-based emotion understanding across multiple datasets.
Provides zero-shot baselines for MLLMs on the OV-MER task.
Includes extensive multimodal datasets like OV-MERD and MER-Caption+.
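Because an OV-MER prediction is a set of labels of arbitrary size, single-label accuracy does not apply directly. One plausible way to score such output is set-level precision, recall, and F1, sketched below. This is an illustrative assumption and not necessarily the metric MER-UniBench uses; in particular, exact string matching ignores synonyms such as "anxious" and "worried", which a real benchmark would likely need to handle.

```python
def normalize(labels):
    """Lowercase and strip labels, dropping empty strings."""
    return {label.strip().lower() for label in labels if label.strip()}


def set_scores(predicted, reference):
    """Return (precision, recall, f1) between two open-vocabulary label sets."""
    pred, ref = normalize(predicted), normalize(reference)
    overlap = len(pred & ref)
    precision = overlap / len(pred) if pred else 0.0
    recall = overlap / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f1


# Hypothetical example: the model names three emotions, the annotation has two.
precision, recall, f1 = set_scores(
    ["Anxious", "frustrated", "tired"],
    ["frustrated", "anxious"],
)
print(precision, recall, f1)
```

In this example both reference labels are recovered, so recall is perfect, while the extra "tired" label lowers precision.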

Maintenance & Community

The project acknowledges numerous related works, indicating engagement with the broader research community. No specific community channels (e.g., Discord, Slack) or formal roadmap are detailed in the provided text.

Licensing & Compatibility

The project is released under the Apache 2.0 license but is strictly intended for non-commercial use only. This restriction rules out commercial applications and integration into proprietary systems.

Limitations & Caveats

The primary limitation is the strict non-commercial use restriction. Setup requires manual downloading of large datasets and models from external cloud storage links (Baidu/Terabox), which may present accessibility or reliability challenges. The project is research-focused, providing specific frameworks and datasets rather than a general-purpose library.

Health Check

Last Commit

4 months ago

Responsiveness

Inactive

Star History

11 stars in the last 30 days