Sentiment analysis
A tutorial by Bing Liu on sentiment analysis and other aspects of opinion mining
- Tasks
- Distinguishing subjective and objective language
- Determining whether an opinion is positive, negative, or neutral
- Determining whether the writer believes a proposition
- (Determining whether a proposition is true)
- Some datasets
- A subjective text
- Components of an opinion: opinion holder, object, opinion itself
- Object
- Set of features F, each with an associated synonym set (synset) W
- F and W may be unknown
- Structure of the text: opinion holder selects subset of features, refers to each with a word/phrase from the associated synset, expresses a positive, negative, or neutral opinion of each
- Unsupervised document-level sentiment classification (Turney)
- Extract two-word phrases containing adjectives or adverbs from POS-tagged corpus
- Calculate semantic orientation of phrases
- Based on pointwise mutual information of phrase and the words "excellent" and "poor"
- PMI(w1, w2) = log2 [p(w1 & w2) / (p(w1) p(w2))]
- SO(phrase) = PMI(phrase, "excellent") - PMI(phrase, "poor")
- PMI estimated from hit counts returned by AltaVista NEAR (within 10 words) queries:
SO(phrase) = log2 [(hits(phrase NEAR "excellent") hits("poor")) / (hits(phrase NEAR "poor") hits("excellent"))]
- SO examples: online service: 2.78, small part: 0.053, inconveniently located: -1.54, unethical practices: -8.48
- Review is positive ("recommended") if average SO of phrases in it is positive, negative ("not recommended") otherwise
- Average accuracy of 74% across four review domains: cars, banks, movies (lowest), travel destinations
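
The hit-count form of SO and the averaging rule can be sketched in Python as follows. This is a minimal illustration, not Turney's code: the function names, the argument names, and the default smoothing constant are assumptions made for the sketch, and the hit counts would have to come from whatever search index is available.

```python
import math


def semantic_orientation(near_excellent, near_poor,
                         hits_excellent, hits_poor, smoothing=0.01):
    """SO(phrase) as a log2 ratio of hit counts.

    near_excellent / near_poor: hits for phrase NEAR "excellent" / "poor".
    hits_excellent / hits_poor: hits for the bare words.
    smoothing: small constant added to the NEAR counts so that a zero
    count does not cause a division by zero (the value is an assumption).
    """
    numerator = (near_excellent + smoothing) * hits_poor
    denominator = (near_poor + smoothing) * hits_excellent
    return math.log2(numerator / denominator)


def classify_review(phrase_scores):
    """Label a review from the average SO of its extracted phrases."""
    average = sum(phrase_scores) / len(phrase_scores) if phrase_scores else 0.0
    return "recommended" if average > 0 else "not recommended"
```

A review whose phrases score 2.0 and -0.5 averages 0.75 and is therefore labelled "recommended".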
- Supervised document-level sentiment classification (Pang, Lee, and Vaithyanathan)
- Starred movie reviews from IMDb
- Document represented as bag of features
- Three classifiers tried: naive Bayes, maximum entropy, support vector machines
- Features: unigrams (with NOT tag added where negation found), bigrams
- Some results
- SVM superior (83% for unigrams)
- Unigrams superior to bigrams
- Worse results with feature frequency than with feature presence alone
- No improvement with POS attached, restriction to adjectives, positional indication
- Difficulty of "thwarted expectations" narrative
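
The unigram features with NOT tagging, fed to a naive Bayes classifier, might look roughly like the sketch below. It is a toy version under stated assumptions: the negation word list, the regular expression tokenizer, the add-one smoothing, and all function names are illustrative choices, not the setup used in the paper. Features are recorded by presence rather than frequency.

```python
import math
import re
from collections import Counter

# Hypothetical negation list; words ending in "n't" are also treated as negations.
NEGATIONS = {"not", "no", "never"}
PUNCTUATION = set(".,;:!?")


def tokenize_with_negation(text):
    """Prefix NOT_ to every word between a negation and the next punctuation mark."""
    tokens = re.findall(r"[a-z']+|[.,;:!?]", text.lower())
    out, negated = [], False
    for tok in tokens:
        if tok in PUNCTUATION:
            negated = False
            continue
        out.append("NOT_" + tok if negated else tok)
        if tok in NEGATIONS or tok.endswith("n't"):
            negated = True
    return out


def train_nb(docs):
    """docs: list of (text, label) pairs. Returns a simple model tuple."""
    label_counts, feat_counts, vocab = Counter(), {}, set()
    for text, label in docs:
        feats = set(tokenize_with_negation(text))  # presence, not frequency
        label_counts[label] += 1
        feat_counts.setdefault(label, Counter()).update(feats)
        vocab |= feats
    return label_counts, feat_counts, vocab


def predict_nb(model, text):
    """Return the label with the highest smoothed log-probability."""
    label_counts, feat_counts, vocab = model
    total_docs = sum(label_counts.values())
    feats = set(tokenize_with_negation(text)) & vocab  # ignore unseen words
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        total = sum(feat_counts[label].values())
        score = math.log(label_counts[label] / total_docs)
        for f in feats:
            score += math.log((feat_counts[label][f] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

With NOT tagging, "great" and "NOT_great" are distinct features, so "not great" can be learned as negative evidence even when "great" alone is positive.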
- Sentence-level sentiment classification (Ding and Liu)
- Assign positive, negative, or neutral evaluation to an object feature
- Context-independent "opinion words": poor, great, etc., each with an assigned orientation (1, -1, 0)
- Split sentences at BUT words
- In each sentence, segment out the portion mentioning the feature; sum opinion word orientations, each weighted inversely by its distance from the feature word
- Context-dependent opinion words, e.g. "long" (the battery life is very long)
- Infer orientation based on co-occurrence with context-independent opinion words: this camera takes great pictures and has a long battery life
- Unless there is a BUT word, in which case the opposite orientation is inferred:
this camera takes poor pictures but has a long battery life
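
The context-independent part of this procedure (split at BUT words, then sum distance-weighted orientations around the feature) can be sketched as below. The lexicon, the BUT word list, the restriction to single-word features, and the function names are all assumptions made for illustration; inferring orientations for context-dependent words such as "long" is not covered.

```python
import re

# Hypothetical context-independent lexicon: word -> orientation.
OPINION_WORDS = {"great": 1, "good": 1, "poor": -1, "bad": -1}
BUT_WORDS = {"but", "however"}


def split_at_but(tokens):
    """Split a token list into segments at each BUT word."""
    segments, current = [], []
    for tok in tokens:
        if tok in BUT_WORDS:
            segments.append(current)
            current = []
        else:
            current.append(tok)
    segments.append(current)
    return segments


def feature_orientation(sentence, feature):
    """Return 1, -1, or 0 for a single-word feature mentioned in the sentence."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    for segment in split_at_but(tokens):
        if feature not in segment:
            continue
        pos = segment.index(feature)
        # Each opinion word contributes its orientation divided by its
        # token distance from the feature word.
        score = sum(
            OPINION_WORDS[w] / abs(i - pos)
            for i, w in enumerate(segment)
            if w in OPINION_WORDS and i != pos
        )
        return (score > 0) - (score < 0)
    return 0  # feature not mentioned
```

For "this camera takes great pictures but has a poor lens", the feature "pictures" is scored only against the first segment and "lens" only against the second, so the two features receive opposite orientations.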