vaderSentiment  by cjhutto

Sentiment analysis tool attuned to social media texts

created 10 years ago
4,775 stars

Top 10.6% on sourcepulse

Project Summary

VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon and rule-based sentiment analysis tool specifically designed for social media text, but effective across domains. It provides a compound score for overall sentiment intensity and separate positive, neutral, and negative proportions, making it suitable for researchers and developers needing nuanced sentiment analysis.

How It Works

VADER uses a "gold-standard" sentiment lexicon of over 7,500 features, each empirically validated by human raters for polarity and intensity. Its rules account for negations, intensifiers (e.g., "very"), punctuation, capitalization, slang, emoticons, and emojis, all of which matter when interpreting social media language. This makes the analysis more sensitive to context than a simple bag-of-words model. The time complexity of the implementation has been improved to O(N).
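
The lexicon-plus-rules idea can be illustrated with a toy scorer. This is a minimal sketch, not VADER's implementation: the mini-lexicon, the boost and negation constants, and the function name toy_score are all hypothetical and chosen only for illustration. It looks up each token once and inspects at most two preceding tokens, so it runs in a single pass over the text.

```python
import math

# Hypothetical mini-lexicon: valence values are illustrative, not VADER's.
LEXICON = {"good": 2.0, "great": 3.0, "bad": -2.5, "awful": -3.0}
# Illustrative rule constants (assumptions, not taken from the library).
INTENSIFIERS = {"very": 0.3, "extremely": 0.3}
NEGATIONS = {"not", "never", "no"}
NEGATION_SCALE = -0.75  # flip the sign and dampen the magnitude
EXCLAIM_BOOST = 0.3     # added per "!" ...
MAX_EXCLAIMS = 3        # ... up to this many


def toy_score(text):
    """Sum rule-adjusted valences over the tokens of `text` in one pass."""
    tokens = [t.strip(".,!?").lower() for t in text.split()]
    total = 0.0
    for i, word in enumerate(tokens):
        valence = LEXICON.get(word)
        if valence is None:
            continue  # token carries no sentiment in this toy lexicon
        prev = tokens[i - 1] if i > 0 else ""
        prev2 = tokens[i - 2] if i > 1 else ""
        if prev in INTENSIFIERS:
            # Push the valence further from zero, in its own direction.
            valence += math.copysign(INTENSIFIERS[prev], valence)
        if prev in NEGATIONS or (prev in INTENSIFIERS and prev2 in NEGATIONS):
            valence *= NEGATION_SCALE
        total += valence
    # Exclamation marks amplify whatever polarity is already present.
    emphasis = min(text.count("!"), MAX_EXCLAIMS) * EXCLAIM_BOOST
    if total > 0:
        total += emphasis
    elif total < 0:
        total -= emphasis
    return total
```

With these made-up values, "not good" scores below zero while "very good" scores above "good", which is the kind of distinction a plain bag-of-words count would miss.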

Quick Start & Requirements

  • Install via pip: pip install vaderSentiment
  • Requires Python 3.
  • Full demo requires NLTK and requests.
  • Official documentation and demo: GitHub Repository
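
After installation, basic usage looks roughly like the sketch below. SentimentIntensityAnalyzer and polarity_scores come from the vaderSentiment package; the helper score_texts and its analyzer parameter are hypothetical conveniences added here, and the import is done lazily so the helper can also be tried with a stand-in object.

```python
def score_texts(texts, analyzer=None):
    """Return a {text: scores} mapping using VADER's polarity_scores().

    `analyzer` is a hypothetical hook: any object with a
    polarity_scores(text) method works. When omitted, the real
    analyzer from the vaderSentiment package is created.
    """
    if analyzer is None:
        # Lazy import: only needed when no analyzer is supplied.
        from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
        analyzer = SentimentIntensityAnalyzer()
    return {text: analyzer.polarity_scores(text) for text in texts}
```

Calling score_texts(["VADER is smart, handsome, and funny!"]) with the package installed should return one dictionary per input text with the keys neg, neu, pos, and compound.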

Highlighted Details

  • Specifically attuned to social media sentiment, handling slang, emoticons, and capitalization.
  • Rule-based system accounts for negations, intensifiers, and punctuation for nuanced scoring.
  • Provides a normalized compound score (-1 to +1) and proportion scores (pos, neu, neg).
  • Includes extensive datasets and ground truth for various text domains (tweets, news editorials, movie reviews, Amazon reviews).
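
To make the normalized compound score concrete, here is a minimal sketch of how an unbounded valence sum can be squashed into the range -1 to +1 and then turned into a label. It assumes a normalization of the form x / sqrt(x² + alpha) with alpha = 15 and classification cutoffs of ±0.05; these values are stated here as assumptions and should be checked against the library before relying on them. The function names are hypothetical.

```python
import math


def normalize(raw_sum, alpha=15.0):
    """Map an unbounded valence sum into [-1, 1] (assumed form, alpha assumed)."""
    score = raw_sum / math.sqrt(raw_sum * raw_sum + alpha)
    # Clamp defensively in case of floating-point overshoot.
    return max(-1.0, min(1.0, score))


def label(compound, threshold=0.05):
    """Turn a compound score into a coarse label (threshold is an assumption)."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"
```

For example, a raw sum of 1.0 normalizes to 0.25 under these assumptions, and label(0.25) returns "positive". Larger sums approach but never exceed 1, so very emotional texts remain comparable with mild ones.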

Maintenance & Community

The project has received contributions from George Berry, Ewan Klein, and Pierpaolo Pantone, and it is integrated into NLTK. Ports are available for Java, JavaScript, PHP, Scala, C#, Rust, Go, and R.

Licensing & Compatibility

Licensed under the MIT License, allowing for broad use and commercial compatibility.

Limitations & Caveats

The demo's non-English text analysis relies on an external translation service with usage limits. The pos, neu, and neg scores reflect only the raw categorization of lexical items; they do not account for the rule-based enhancements.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 107 stars in the last 90 days
