Multimodal-Sentiment-Analysis by YeexiaoZheng

Multimodal sentiment analysis using BERT+ResNet50

Created 3 years ago
317 stars

Top 85.2% on SourcePulse

View on GitHub
Project Summary

This repository provides PyTorch implementations for multimodal sentiment analysis, combining text (BERT) and image (ResNet50) features. It targets researchers and students in AI and NLP who need to experiment with various fusion strategies for sentiment classification tasks. The project offers five distinct fusion methods, including naive combinations and attention-based mechanisms, allowing for comparative analysis of their effectiveness.

How It Works

The project leverages Hugging Face's transformers library for BERT text encoding and torchvision for ResNet image feature extraction. It implements five fusion strategies: two naive approaches (concatenation and category-wise combination) and three attention-based methods (CrossModalityAttentionCombine, HiddenStateTransformerEncoderCombine, and OutputTransformerEncoder). These methods aim to effectively integrate information from both modalities, with attention mechanisms designed to capture inter-modal dependencies.
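As an illustration of the simplest of these fusion routes, the sketch below shows what concatenation-style fusion of BERT and ResNet50 features can look like. The class name NaiveCatFusion, the use of bert-base-uncased, the three-way label space, and the hidden sizes are assumptions for illustration, not the repository's exact code:

    import torch
    import torch.nn as nn
    from torchvision.models import resnet50
    from transformers import BertModel

    class NaiveCatFusion(nn.Module):
        # Sketch of concatenation fusion: BERT pooled text features + ResNet50 global image features.
        def __init__(self, text_model_name="bert-base-uncased", num_labels=3):
            super().__init__()
            self.text_encoder = BertModel.from_pretrained(text_model_name)
            resnet = resnet50(pretrained=True)
            # Drop the ImageNet classifier head; keep the 2048-d pooled features.
            self.image_encoder = nn.Sequential(*list(resnet.children())[:-1])
            self.classifier = nn.Linear(self.text_encoder.config.hidden_size + 2048, num_labels)

        def forward(self, input_ids, attention_mask, pixel_values):
            text_feat = self.text_encoder(input_ids=input_ids,
                                          attention_mask=attention_mask).pooler_output  # (B, 768)
            img_feat = self.image_encoder(pixel_values).flatten(1)                      # (B, 2048)
            fused = torch.cat([text_feat, img_feat], dim=-1)                            # naive concatenation
            return self.classifier(fused)

The attention-based variants replace this plain concatenation with cross-modal attention or a transformer encoder over the combined text and image representations, so that inter-modal dependencies are modeled rather than simply appended.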

Quick Start & Requirements

  • Install dependencies: pip install -r requirements.txt
  • Download dataset from the provided Baidu Netdisk link and extract it into the data folder.
  • Training: python main.py --do_train --epoch 10 --text_pretrained_model roberta-base --fuse_model_type OTE
  • Testing: python main.py --do_test --text_pretrained_model roberta-base --fuse_model_type OTE --load_model_path $your_model_path$
  • Requirements: PyTorch 1.8.2, Torchvision 0.9.2, Transformers 4.18.0.
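For reference, the pinned versions above correspond to a requirements file roughly like the sketch below; the repository's actual requirements.txt likely pins additional packages, so treat this as a partial illustration:

    torch==1.8.2
    torchvision==0.9.2
    transformers==4.18.0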

Highlighted Details

  • Implements five fusion methods: NaiveCat, NaiveCombine, CrossModalityAttentionCombine, HiddenStateTransformerEncoder, and OutputTransformerEncoder (see the illustrative attention sketch after this list).
  • Achieves up to 74.625% accuracy with the OutputTransformerEncoder model.
  • Includes ablation studies showing Text Only (71.875%) and Image Only (63%) performance.
  • References related research papers and projects in multimodal sentiment analysis.
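For the attention-based variants, the following is a rough sketch of what cross-modality attention fusion can look like. The module name, head count, and mean-pooling choices are illustrative assumptions; the repository's CrossModalityAttentionCombine implementation may differ in its details:

    import torch
    import torch.nn as nn

    class CrossModalityAttention(nn.Module):
        # Sketch: text token states attend over image region features and vice versa,
        # then the two attended summaries are concatenated for classification.
        def __init__(self, dim=768, num_heads=8, num_labels=3):
            super().__init__()
            self.text_to_image = nn.MultiheadAttention(dim, num_heads)
            self.image_to_text = nn.MultiheadAttention(dim, num_heads)
            self.classifier = nn.Linear(2 * dim, num_labels)

        def forward(self, text_states, image_states):
            # Inputs are (B, T, dim) and (B, R, dim); MultiheadAttention expects (T, B, dim) here.
            t = text_states.transpose(0, 1)
            v = image_states.transpose(0, 1)
            t2i, _ = self.text_to_image(t, v, v)   # text queries attend to image keys/values
            i2t, _ = self.image_to_text(v, t, t)   # image queries attend to text keys/values
            fused = torch.cat([t2i.mean(dim=0), i2t.mean(dim=0)], dim=-1)  # (B, 2*dim)
            return self.classifier(fused)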

Maintenance & Community

The project appears to be a course assignment from "数据学院人工智能课程" (Data Science AI Course). No specific community channels or active maintenance indicators are present in the README.

Licensing & Compatibility

The repository does not explicitly state a license. The presence of code from other GitHub repositories (e.g., guitld/Transfer-Learning-with-Joint-Fine-Tuning-for-Multimodal-Sentiment-Analysis) suggests potential licensing considerations inherited from those sources. Commercial use is not explicitly addressed.

Limitations & Caveats

The project pins specific, now-dated versions of PyTorch (1.8.2) and Torchvision (0.9.2). The dataset is hosted on Baidu Netdisk, which may have regional access limitations. The project's primary purpose appears to be educational, and its robustness for production environments is not documented.

Health Check

  • Last Commit: 2 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 1 star in the last 30 days

Explore Similar Projects

  • lens by ContextualAI: 353 stars (0.3%). Vision-language research paper using LLMs. Created 2 years ago, updated 1 month ago. Starred by Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake), Douwe Kiela (Cofounder of Contextual AI), and 1 more.

  • METER by zdou0830: 373 stars (0%). Multimodal framework for vision-and-language transformer research. Created 3 years ago, updated 2 years ago. Starred by Jiayi Pan (Author of SWE-Gym; MTS at xAI), Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), and 1 more.

  • open_flamingo by mlfoundations: 4k stars (0.1%). Open-source framework for training large multimodal models. Created 2 years ago, updated 1 year ago. Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Wing Lian (Founder of Axolotl AI), and 10 more.

  • NExT-GPT by NExT-GPT: 4k stars (0.1%). Any-to-any multimodal LLM research paper. Created 2 years ago, updated 4 months ago. Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems") and Elvis Saravia (Founder of DAIR.AI).