vaderSentiment by cjhutto

Sentiment analysis tool attuned to social media texts

Created 11 years ago

4,940 stars

Top 10.0% on SourcePulse

3 Experts Love This Project

Travis Fischer

Founder of Agentic

Luis Capelo

Cofounder of Lightning AI

Jason Liu

Author of Instructor

Project Summary

VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon and rule-based sentiment analysis tool specifically designed for social media text, but effective across domains. It provides a compound score for overall sentiment intensity and separate positive, neutral, and negative proportions, making it suitable for researchers and developers needing nuanced sentiment analysis.

How It Works

VADER employs a "gold-standard" sentiment lexicon of over 7,500 features, each empirically validated by human raters for polarity and intensity. Its rules account for linguistic cues such as negations, intensifiers (e.g., "very"), punctuation, capitalization, slang, emoticons, and emojis, all of which matter for interpreting social media language accurately. This allows a more sophisticated analysis than simple bag-of-words models. Separately, the scoring engine's time complexity has been improved to O(N).
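A toy sketch of how a lexicon-plus-rules scorer can work in a single left-to-right pass. The mini-lexicon, the modifier weights, and the function name `toy_valence` are all hypothetical values chosen for illustration; they are not VADER's lexicon, constants, or implementation.

```python
# Toy illustration only: hypothetical lexicon and weights, not VADER's own.
TOY_LEXICON = {"good": 2.0, "great": 3.0, "bad": -2.5, "awful": -3.0}
TOY_BOOSTERS = {"very": 0.3, "extremely": 0.4, "slightly": -0.3}
TOY_NEGATIONS = {"not", "never", "no"}
CAPS_BONUS = 0.7          # extra intensity for an ALL-CAPS sentiment word
NEGATION_SCALAR = -0.75   # flips and dampens a negated word
PUNCTUATION = ".,!?"


def toy_valence(text):
    """Sum per-word valences, adjusted by three simple rules."""
    tokens = text.split()
    total = 0.0
    for i, raw in enumerate(tokens):
        stripped = raw.strip(PUNCTUATION)
        word = stripped.lower()
        if word not in TOY_LEXICON:
            continue  # words outside the lexicon contribute nothing
        valence = TOY_LEXICON[word]
        sign = 1 if valence > 0 else -1

        # Rule 1: capitalization pushes the score further from zero.
        if stripped.isupper():
            valence += sign * CAPS_BONUS

        # Rule 2: a modifier directly before the word amplifies or dampens it.
        prev = tokens[i - 1].strip(PUNCTUATION).lower() if i > 0 else ""
        if prev in TOY_BOOSTERS:
            valence += sign * TOY_BOOSTERS[prev]

        # Rule 3: a negation within the three preceding tokens flips polarity.
        window = [t.strip(PUNCTUATION).lower() for t in tokens[max(0, i - 3):i]]
        if any(t in TOY_NEGATIONS for t in window):
            valence *= NEGATION_SCALAR

        total += valence
    return total


if __name__ == "__main__":
    for sample in ["good", "very good", "not good", "GOOD!", "slightly bad"]:
        print(sample, "->", toy_valence(sample))
```

Each token is visited once and only a fixed-size window of preceding tokens is inspected, so the work grows linearly with the number of tokens.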

Quick Start & Requirements

Install via pip: pip install vaderSentiment
Requires Python 3.
Full demo requires NLTK and requests.
Official documentation and demo: see the project's GitHub repository (cjhutto/vaderSentiment).
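A minimal usage sketch, assuming the package has been installed with pip. `SentimentIntensityAnalyzer` and its `polarity_scores` method are the library's public entry points; the helper `score_texts`, the `main` wrapper, and the sample sentences are hypothetical additions for illustration.

```python
def score_texts(analyzer, texts):
    """Return (text, scores) pairs.

    `analyzer` is any object exposing polarity_scores(text) that returns a
    dict with the keys 'neg', 'neu', 'pos', and 'compound'.
    """
    return [(text, analyzer.polarity_scores(text)) for text in texts]


def main():
    # Imported lazily so the helper above stays usable without the package.
    try:
        from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
    except ImportError:
        print("vaderSentiment is not installed; run: pip install vaderSentiment")
        return

    analyzer = SentimentIntensityAnalyzer()
    samples = [
        "VADER is smart, handsome, and funny!",
        "The plot was not good at all.",
        "The package arrived on Tuesday.",
    ]
    for text, scores in score_texts(analyzer, samples):
        print(f"{text!r}: {scores}")


if __name__ == "__main__":
    main()
```

One analyzer instance can be reused across many inputs, which avoids reloading the lexicon for every call.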

Highlighted Details

Specifically attuned to social media sentiment, handling slang, emoticons, and capitalization.
Rule-based system accounts for negations, intensifiers, and punctuation for nuanced scoring.
Provides a normalized compound score (-1 to +1) and proportion scores (pos, neu, neg).
Includes extensive datasets and ground truth for various text domains (tweets, news editorials, movie reviews, Amazon reviews).
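The mapping of a summed valence onto the -1 to +1 compound range can be sketched as below. The formula (sum divided by the square root of sum squared plus alpha, with alpha = 15) and the ±0.05 classification cut-offs follow what the upstream source and README describe, but treat both constants as assumptions and confirm them against the version you install. The function names `normalize_valence` and `label_from_compound` are hypothetical.

```python
import math


def normalize_valence(total, alpha=15.0):
    """Squash an unbounded valence sum into the range [-1, 1].

    alpha is assumed to be 15, matching the upstream default as we
    understand it; larger alpha values compress scores toward zero.
    """
    score = total / math.sqrt(total * total + alpha)
    return max(-1.0, min(1.0, score))


def label_from_compound(compound, threshold=0.05):
    """Turn a compound score into a coarse label.

    The 0.05 threshold is the commonly cited convention; adjust it for
    your own data if needed.
    """
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for total in (-6.0, -0.1, 0.0, 0.1, 6.0):
        compound = normalize_valence(total)
        print(total, round(compound, 4), label_from_compound(compound))
```

Because the denominator always exceeds the magnitude of the numerator, the result approaches but never reaches ±1, so long texts with many sentiment-bearing words saturate smoothly instead of growing without bound.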

Maintenance & Community

The project has seen contributions from George Berry, Ewan Klein, and Pierpaolo Pantone. It is integrated into NLTK. Ports to Java, JavaScript, PHP, Scala, C#, Rust, Go, and R are available.

Licensing & Compatibility

Licensed under the MIT License, allowing for broad use and commercial compatibility.

Limitations & Caveats

The demo's non-English text analysis relies on an external translation service with usage limits. The pos, neu, and neg scores do not account for the rule-based enhancements; they reflect only the raw categorization of each lexical item, so the compound score is the one that incorporates the rules.

Health Check

Last Commit

1 year ago

Responsiveness

1+ week

Star History

20 stars in the last 30 days