Multimodal sentiment analysis using BERT+ResNet50
Top 85.2% on SourcePulse
This repository provides PyTorch implementations for multimodal sentiment analysis, combining text (BERT) and image (ResNet50) features. It targets researchers and students in AI and NLP who need to experiment with various fusion strategies for sentiment classification tasks. The project offers five distinct fusion methods, including naive combinations and attention-based mechanisms, allowing for comparative analysis of their effectiveness.
How It Works
The project uses Hugging Face's transformers library for BERT text encoding and torchvision for ResNet image feature extraction. The text encoder is configurable; the Quick Start example uses roberta-base. The project implements five fusion strategies: two naive approaches (concatenation and category-wise combination) and three attention-based methods (CrossModalityAttentionCombine, HiddenStateTransformerEncoderCombine, and OutputTransformerEncoder). The naive methods merge the two modalities' features directly. The attention-based methods are designed to capture dependencies between the text and image features.
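To make the difference between the two families concrete, here is a minimal, framework-free sketch. It contrasts naive concatenation with a single-head scaled dot-product attention step in which a text vector attends over image-region features. Plain Python lists stand in for PyTorch tensors. The function names are hypothetical, and the attention step is an assumed simplification, not the code behind CrossModalityAttentionCombine or the other attention classes.

```python
import math
from typing import List

Vector = List[float]


def concat_fusion(text_feat: Vector, image_feat: Vector) -> Vector:
    """Naive fusion: join the text and image feature vectors end to end."""
    return text_feat + image_feat


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_modal_attention(query: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Single-head scaled dot-product attention: a text query attends over image regions."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    # Weighted sum of the value vectors, computed one output dimension at a time.
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]


def attention_fusion(text_feat: Vector, image_regions: List[Vector]) -> Vector:
    """Hypothetical attention fusion: attend from text to image regions, then concatenate."""
    attended = cross_modal_attention(text_feat, image_regions, image_regions)
    return concat_fusion(text_feat, attended)


if __name__ == "__main__":
    text = [1.0, 0.0]                   # toy sentence embedding
    regions = [[1.0, 0.0], [0.0, 1.0]]  # two toy image-region features
    print(concat_fusion(text, regions[0]))  # [1.0, 0.0, 1.0, 0.0]
    # Text vector followed by an image summary weighted toward the first region.
    print(attention_fusion(text, regions))
```

In a real model, the text and image features have different sizes. For example, bert-base and roberta-base produce 768-dimensional hidden states, while ResNet50 produces 2048-dimensional pooled features. Because the dot product requires both sides to share a dimension, they would first be projected to a common size.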
Quick Start & Requirements
pip install -r requirements.txt
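Download the dataset (hosted on Baidu Netdisk) and place it in the data folder.
Train: python main.py --do_train --epoch 10 --text_pretrained_model roberta-base --fuse_model_type OTE
Test: python main.py --do_test --text_pretrained_model roberta-base --fuse_model_type OTE --load_model_path $your_model_path$
OTE appears to select the OutputTransformerEncoder fusion method.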
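The commands above pass a handful of flags to main.py. The following sketch shows one way a parser accepting exactly those flags could look. It is a hypothetical reconstruction, not the repository's main.py: the defaults, types, and the rule that --do_train and --do_test are mutually exclusive are all assumptions.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical: mirrors only the flags shown in the Quick Start commands.
    # Defaults, types, and train/test exclusivity are assumptions, not main.py.
    parser = argparse.ArgumentParser(description="Multimodal sentiment analysis (sketch)")
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("--do_train", action="store_true", help="train a fusion model")
    mode.add_argument("--do_test", action="store_true", help="predict with a saved model")
    parser.add_argument("--epoch", type=int, default=10, help="number of training epochs")
    parser.add_argument("--text_pretrained_model", default="roberta-base",
                        help="Hugging Face model name for the text encoder")
    parser.add_argument("--fuse_model_type", default="OTE", help="fusion strategy identifier")
    parser.add_argument("--load_model_path", default=None, help="checkpoint path for --do_test")
    return parser


def parse(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse argv (or sys.argv when None), rejecting a test run without a checkpoint."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.do_test and args.load_model_path is None:
        parser.error("--do_test requires --load_model_path")  # exits with status 2
    return args
```

For example, parse(["--do_train", "--epoch", "10", "--fuse_model_type", "OTE"]) returns a namespace with do_train set and epoch equal to 10. Calling parse(["--do_test"]) without a checkpoint path exits with a usage error.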
Maintenance & Community
The project appears to be an assignment for an artificial intelligence course at a School of Data Science ("数据学院人工智能课程"). The README lists no community channels and shows no signs of active maintenance.
Licensing & Compatibility
The repository does not state a license. It includes code from other GitHub repositories (e.g., guitld/Transfer-Learning-with-Joint-Fine-Tuning-for-Multimodal-Sentiment-Analysis), so the licenses of those sources may also apply. Commercial use is not addressed.
Limitations & Caveats
The project depends on specific versions of PyTorch (1.8.2) and Torchvision (0.9.2). These versions predate the PyTorch 2.x releases and may be difficult to install alongside current Python and CUDA versions. The dataset is hosted on Baidu Netdisk, which may be hard to access from some regions. The project appears to be primarily educational, and the README does not address robustness for production use.
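Before running the code, it can help to see how far a local environment is from these version pins. The following standard-library sketch reports installed versions of the two packages named above. The helper name check_pins is hypothetical and not part of the repository. The actual requirements.txt may pin additional packages that this check does not cover.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional, Tuple

# Versions named in this summary; requirements.txt itself may pin more packages.
PINNED = {"torch": "1.8.2", "torchvision": "0.9.2"}


def check_pins(pins: Dict[str, str] = PINNED) -> Dict[str, Tuple[str, Optional[str], bool]]:
    """Map each package name to (wanted version, installed version or None, matches)."""
    report = {}
    for pkg, wanted in pins.items():
        try:
            installed: Optional[str] = version(pkg)
        except PackageNotFoundError:
            installed = None
        # Drop local build tags such as "+cu111" before comparing.
        matches = installed is not None and installed.split("+")[0] == wanted
        report[pkg] = (wanted, installed, matches)
    return report


if __name__ == "__main__":
    for pkg, (wanted, installed, ok) in check_pins().items():
        status = "ok" if ok else "MISMATCH"
        print(f"{pkg}: wanted {wanted}, found {installed or 'not installed'} [{status}]")
```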
2 years ago
Inactive