Sentiment-Analysis-Twitter  by ayushoriginal

NLP research paper for Twitter sentiment analysis

created 8 years ago
776 stars

Top 45.9% on sourcepulse

Project Summary

This repository provides a research-oriented implementation for Twitter sentiment analysis, exploring various feature sets and machine learning classifiers to identify optimal combinations. It is targeted at NLP researchers and practitioners interested in microblogging sentiment analysis. The project offers a modular approach to experiment with different preprocessing, stemming, and classification techniques.

How It Works

The project uses a modular architecture with separate feature-extraction and classification components. Preprocessing handles hashtags, mentions, URLs, emoticons, punctuation, and repeated characters, followed by stemming with the Porter stemmer. The feature sets explored are unigrams, bigrams, trigrams, and negation detection. The classifiers tested are Naive Bayes and Maximum Entropy, and the experiments compare single-step classification (direct labeling) with two-step classification (subjective vs. objective first, then positive vs. negative).
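The preprocessing and feature-extraction steps described above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the repository's actual code: the placeholder tokens `URL` and `USER`, the `NOT_` prefix, the negation word list, and all function names are assumptions made for the example, and stemming and emoticon handling are left out to keep it dependency-free.

```python
import re

# All patterns and names below are illustrative assumptions, not the repo's API.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#(\w+)")
REPEAT_RE = re.compile(r"(.)\1{2,}")          # same character three or more times
TOKEN_RE = re.compile(r"[A-Za-z']+|[.,!?;:]+")
NEGATIONS = {"not", "no", "never", "cannot"}  # assumed, deliberately small list


def preprocess(tweet):
    """Normalize a tweet and split it into word and punctuation tokens."""
    text = tweet.lower()
    text = URL_RE.sub(" URL ", text)        # replace links with a placeholder
    text = MENTION_RE.sub(" USER ", text)   # replace @mentions with a placeholder
    text = HASHTAG_RE.sub(r" \1 ", text)    # keep the hashtag word, drop '#'
    text = REPEAT_RE.sub(r"\1\1", text)     # "sooooo" -> "soo"
    return TOKEN_RE.findall(text)


def mark_negation(tokens):
    """Prefix tokens that follow a negation word, up to the next punctuation."""
    out, negated = [], False
    for tok in tokens:
        if tok[0] in ".,!?;:":
            negated = False                 # punctuation ends the negation scope
            continue                        # and is dropped from the features
        out.append("NOT_" + tok if negated else tok)
        if tok in NEGATIONS or tok.endswith("n't"):
            negated = True
    return out


def ngrams(tokens, n):
    """Return all contiguous n-grams as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_features(tweet, max_n=3):
    """Unigrams up to max_n-grams over negation-marked tokens."""
    tokens = mark_negation(preprocess(tweet))
    feats = []
    for n in range(1, max_n + 1):
        feats.extend(ngrams(tokens, n))
    return feats
```

For example, `extract_features("I don't like it. Great!", max_n=2)` yields unigrams such as `NOT_like` alongside bigrams such as `don't NOT_like`, so a classifier can distinguish "like" from "like" under negation.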

Quick Start & Requirements

  • Install/Run: Not detailed in the README. A Python environment with standard NLP libraries is assumed.
  • Prerequisites: Python, and likely NLTK or a similar NLP library. No specific versions or hardware requirements are mentioned.
  • Resources: The README mentions two datasets (Twitter Sentiment Corpus and Stanford Twitter Corpus), which presumably must be downloaded separately; large-scale analysis may need significant storage.
  • Links:
    • Video: a video about this work (linked from the README)
    • Presentation: an introductory presentation given during a rudimentary stage of this project (linked from the README)
    • Commercial API: https://www.onepanel.io/algorithms/twitter-sentiment-analyzer.html

Highlighted Details

  • Achieved a best accuracy of 86.68% using Naive Bayes with Unigrams + Bigrams + Trigrams.
  • Demonstrates that negation detection and higher-order n-grams improve accuracy.
  • Compares single-step and two-step classification, finding that single-step generally performs better.
  • Explores preprocessing techniques specific to Twitter's unique language and structure.
  • Investigates the impact of feature sets on Naive Bayes and Maximum Entropy classifiers.
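To make the single-step versus two-step comparison concrete, here is a minimal sketch of both setups built on a small multinomial Naive Bayes with add-one smoothing. It is an illustration under stated assumptions, not the repository's implementation: the class `NaiveBayes`, the helper `fit_two_step`, the label names (`positive`, `negative`, `neutral`), the treatment of `neutral` as the objective class, and the toy data are all hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Tiny multinomial Naive Bayes over lists of string features (illustrative)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive (Laplace) smoothing constant

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.feat_counts = defaultdict(Counter)
        self.vocab = set()
        for feats, label in zip(docs, labels):
            self.feat_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, feats):
        total = sum(self.class_counts.values())
        best, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / total)  # log prior
            denom = (sum(self.feat_counts[label].values())
                     + self.alpha * len(self.vocab))
            for f in feats:
                if f in self.vocab:          # features unseen in training are skipped
                    score += math.log((self.feat_counts[label][f] + self.alpha) / denom)
            if score > best_score:
                best, best_score = label, score
        return best


def fit_two_step(docs, labels):
    """Step 1: subjective vs. objective. Step 2: positive vs. negative."""
    # Assumption: "neutral" tweets are the objective class.
    subj_labels = ["objective" if y == "neutral" else "subjective" for y in labels]
    subjectivity = NaiveBayes().fit(docs, subj_labels)
    polar = [(d, y) for d, y in zip(docs, labels) if y != "neutral"]
    polarity = NaiveBayes().fit([d for d, _ in polar], [y for _, y in polar])

    def predict(feats):
        if subjectivity.predict(feats) == "objective":
            return "neutral"
        return polarity.predict(feats)

    return predict


# Toy data, invented purely for illustration.
docs = [["good", "great"], ["love", "good"], ["bad", "awful"],
        ["hate", "bad"], ["meeting", "today"], ["train", "today"]]
labels = ["positive", "positive", "negative", "negative", "neutral", "neutral"]

single = NaiveBayes().fit(docs, labels)   # single-step: three classes at once
two_step = fit_two_step(docs, labels)     # two-step: cascade of two binary models
```

In the two-step setup, an error in the subjectivity stage cannot be corrected by the polarity stage, which is one plausible reason a single-step classifier can come out ahead; the toy data here is far too small to show that effect.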

Maintenance & Community

The project author, Ayush Pareek, has sold the project to OnePanel Inc., which offers it as a commercial API. The code remains publicly hosted for the open-source community. No specific community channels (Discord, Slack) or active development signals are mentioned.

Licensing & Compatibility

The README does not explicitly state a license. Public hosting alone does not grant reuse rights: without a stated license, default copyright applies, so permission for reuse, and for commercial use in particular, should be verified with the rights holder.

Limitations & Caveats

The README does not list specific limitations or known bugs. The project is research-focused and its setup instructions are sparse, so running the code and reproducing the results may take significant effort. The best reported accuracy of 86.68% leaves room for improvement relative to state-of-the-art models.

Health Check

  • Last commit: 4 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 5 stars in the last 90 days
