Multimodal-Sentiment-Analysis by YeexiaoZheng

Multimodal sentiment analysis using BERT+ResNet50

Created 3 years ago
317 stars

Top 85.2% on SourcePulse

View on GitHub
Project Summary

This repository provides PyTorch implementations for multimodal sentiment analysis, combining text (BERT) and image (ResNet50) features. It targets researchers and students in AI and NLP who need to experiment with various fusion strategies for sentiment classification tasks. The project offers five distinct fusion methods, including naive combinations and attention-based mechanisms, allowing for comparative analysis of their effectiveness.

How It Works

The project leverages Hugging Face's transformers library for BERT text encoding and torchvision for ResNet image feature extraction. It implements five fusion strategies: two naive approaches (concatenation and category-wise combination) and three attention-based methods (CrossModalityAttentionCombine, HiddenStateTransformerEncoderCombine, and OutputTransformerEncoder). These methods aim to effectively integrate information from both modalities, with attention mechanisms designed to capture inter-modal dependencies.
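As an illustration of the simplest of these fusion routes, the sketch below shows what concatenation-style fusion of BERT and ResNet50 features can look like. The class name NaiveCatFusion, the use of bert-base-uncased, the three-way label space, and the hidden sizes are assumptions for illustration, not the repository's exact code:

    import torch
    import torch.nn as nn
    from torchvision.models import resnet50
    from transformers import BertModel

    class NaiveCatFusion(nn.Module):
        # Sketch of concatenation fusion: BERT pooled text features + ResNet50 global image features.
        def __init__(self, text_model_name="bert-base-uncased", num_labels=3):
            super().__init__()
            self.text_encoder = BertModel.from_pretrained(text_model_name)
            resnet = resnet50(pretrained=True)
            # Drop the ImageNet classifier head; keep the 2048-d pooled features.
            self.image_encoder = nn.Sequential(*list(resnet.children())[:-1])
            self.classifier = nn.Linear(self.text_encoder.config.hidden_size + 2048, num_labels)

        def forward(self, input_ids, attention_mask, pixel_values):
            text_feat = self.text_encoder(input_ids=input_ids,
                                          attention_mask=attention_mask).pooler_output  # (B, 768)
            img_feat = self.image_encoder(pixel_values).flatten(1)                      # (B, 2048)
            fused = torch.cat([text_feat, img_feat], dim=-1)                            # naive concatenation
            return self.classifier(fused)

The attention-based variants replace this plain concatenation with cross-modal attention or a transformer encoder over the combined text and image representations, so that inter-modal dependencies are modeled rather than simply appended.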

Quick Start & Requirements

  • Install dependencies: pip install -r requirements.txt
  • Download dataset from the provided Baidu Netdisk link and extract it into the data folder.
  • Training: python main.py --do_train --epoch 10 --text_pretrained_model roberta-base --fuse_model_type OTE
  • Testing: python main.py --do_test --text_pretrained_model roberta-base --fuse_model_type OTE --load_model_path $your_model_path$
  • Requirements: PyTorch 1.8.2, Torchvision 0.9.2, Transformers 4.18.0.
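For reference, the pinned versions above correspond to a requirements file roughly like the sketch below; the repository's actual requirements.txt likely pins additional packages, so treat this as a partial illustration:

    torch==1.8.2
    torchvision==0.9.2
    transformers==4.18.0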

Highlighted Details

  • Implements five fusion methods: NaiveCat, NaiveCombine, CrossModalityAttentionCombine, HiddenStateTransformerEncoder, and OutputTransformerEncoder (see the illustrative attention sketch after this list).
  • Achieves up to 74.625% accuracy with the OutputTransformerEncoder model.
  • Includes ablation studies showing Text Only (71.875%) and Image Only (63%) performance.
  • References related research papers and projects in multimodal sentiment analysis.
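For the attention-based variants, the following is a rough sketch of what cross-modality attention fusion can look like. The module name, head count, and mean-pooling choices are illustrative assumptions; the repository's CrossModalityAttentionCombine implementation may differ in its details:

    import torch
    import torch.nn as nn

    class CrossModalityAttention(nn.Module):
        # Sketch: text token states attend over image region features and vice versa,
        # then the two attended summaries are concatenated for classification.
        def __init__(self, dim=768, num_heads=8, num_labels=3):
            super().__init__()
            self.text_to_image = nn.MultiheadAttention(dim, num_heads)
            self.image_to_text = nn.MultiheadAttention(dim, num_heads)
            self.classifier = nn.Linear(2 * dim, num_labels)

        def forward(self, text_states, image_states):
            # Inputs are (B, T, dim) and (B, R, dim); MultiheadAttention expects (T, B, dim) here.
            t = text_states.transpose(0, 1)
            v = image_states.transpose(0, 1)
            t2i, _ = self.text_to_image(t, v, v)   # text queries attend to image keys/values
            i2t, _ = self.image_to_text(v, t, t)   # image queries attend to text keys/values
            fused = torch.cat([t2i.mean(dim=0), i2t.mean(dim=0)], dim=-1)  # (B, 2*dim)
            return self.classifier(fused)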

Maintenance & Community

The project appears to be a course assignment from "数据学院人工智能课程" (Data Science AI Course). No specific community channels or active maintenance indicators are present in the README.

Licensing & Compatibility

The repository does not explicitly state a license. The presence of code from other GitHub repositories (e.g., guitld/Transfer-Learning-with-Joint-Fine-Tuning-for-Multimodal-Sentiment-Analysis) suggests potential licensing considerations inherited from those sources. Commercial use is not explicitly addressed.

Limitations & Caveats

The project pins specific, now-dated versions of PyTorch (1.8.2) and Torchvision (0.9.2). The dataset is hosted on Baidu Netdisk, which may have regional access limitations. The project's primary purpose appears to be educational, and its robustness for production environments is not documented.

Health Check

  • Last Commit: 2 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 1 star in the last 30 days

Explore Similar Projects

  • lens by ContextualAI: 353 stars (0.3%). Vision-language research paper using LLMs. Created 2 years ago, updated 1 month ago. Starred by Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake), Douwe Kiela (Cofounder of Contextual AI), and 1 more.

  • METER by zdou0830: 373 stars (0%). Multimodal framework for vision-and-language transformer research. Created 3 years ago, updated 2 years ago. Starred by Jiayi Pan (Author of SWE-Gym; MTS at xAI), Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), and 1 more.

  • open_flamingo by mlfoundations: 4k stars (0.1%). Open-source framework for training large multimodal models. Created 2 years ago, updated 1 year ago. Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Wing Lian (Founder of Axolotl AI), and 10 more.

  • NExT-GPT by NExT-GPT: 4k stars (0.1%). Any-to-any multimodal LLM research paper. Created 2 years ago, updated 4 months ago. Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems") and Elvis Saravia (Founder of DAIR.AI).