Word sense disambiguation
The problem
- An example
- For some four hundred years, suits of matching coat, trousers, and waistcoat have been in and out of fashion.
- The differences between European decks are mostly in the number of cards in each suit.
- However, courts typically have some power to separate out claims and parties into separate suits if it is more efficient to do so.
- What's a sense?
- Lexical sample and all-words tasks
Supervised methods
- Corpora for lexical sample and all-words tasks
- Extracting feature vectors
- Collocational features
- Bag-of-words features
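The two feature types can be sketched as follows. Collocational features record the word at each fixed position around the target. Bag-of-words features ignore position and count which vocabulary words occur nearby. The window sizes, the `<pad>` token, the feature names such as `w-1`, and the small vocabulary are hypothetical choices made for illustration, and this sketch uses only the surrounding words themselves.

```python
from collections import Counter

def collocational_features(tokens, i, window=2):
    """Position-specific features: the word at each offset from the target at index i."""
    feats = {}
    for offset in range(-window, window + 1):
        if offset == 0:
            continue  # skip the target word itself
        j = i + offset
        # Positions outside the sentence get a hypothetical "<pad>" token.
        feats[f"w{offset:+d}"] = tokens[j].lower() if 0 <= j < len(tokens) else "<pad>"
    return feats

def bag_of_words_features(tokens, i, vocabulary, window=10):
    """Position-free features: counts of each vocabulary word within +/- window of the target."""
    lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
    counts = Counter(tokens[k].lower() for k in range(lo, hi) if k != i)
    return [counts[v] for v in vocabulary]

tokens = "She filed the suit against the landlord".split()
print(collocational_features(tokens, 3))
# {'w-2': 'filed', 'w-1': 'the', 'w+1': 'against', 'w+2': 'the'}
print(bag_of_words_features(tokens, 3, ["court", "filed", "landlord", "trousers"]))
# [0, 1, 1, 0]
```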
Classifiers
- Naive Bayes
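- Choose the sense with the highest posterior probability given the feature vector $\vec{f}$, and rewrite it with Bayes' rule:
$\hat{s} = \arg\max_{s \in S} P(s \mid \vec{f}) = \arg\max_{s \in S} \frac{P(\vec{f} \mid s)\, P(s)}{P(\vec{f})}$
- $P(\vec{f})$ is the same for every sense, so it can be dropped from the $\arg\max$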
- Assume the features are conditionally independent given the sense, so that $P(\vec{f} \mid s) \approx \prod_{j} P(f_j \mid s)$:
$\hat{s} = \arg\max_{s \in S} P(s) \prod_{j} P(f_j \mid s)$
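The Naive Bayes rule can be sketched in a few lines. This is a minimal sketch, assuming each example is a list of feature strings (for instance bag-of-words tokens) and assuming add-one (Laplace) smoothing, which the outline does not specify. The function names and the toy training data are hypothetical. Scores are summed in log space so that the product of many small probabilities does not underflow.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(examples):
    """examples: list of (features, sense) pairs; features is a list of strings."""
    sense_counts = Counter(sense for _, sense in examples)
    feature_counts = defaultdict(Counter)  # sense -> counts of each feature
    vocab = set()
    for feats, sense in examples:
        feature_counts[sense].update(feats)
        vocab.update(feats)
    return sense_counts, feature_counts, vocab

def classify_naive_bayes(model, feats):
    """Return argmax_s [log P(s) + sum_j log P(f_j | s)], with add-one smoothing."""
    sense_counts, feature_counts, vocab = model
    total = sum(sense_counts.values())
    best_sense, best_score = None, -math.inf
    for sense, count in sense_counts.items():
        score = math.log(count / total)  # log prior P(s)
        denom = sum(feature_counts[sense].values()) + len(vocab)
        for f in feats:
            score += math.log((feature_counts[sense][f] + 1) / denom)  # log P(f_j | s)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense

# Hypothetical toy training data for the noun "suit".
TRAIN = [
    (["garment", "jacket", "wore"], "clothing"),
    (["jacket", "trousers", "tailor"], "clothing"),
    (["court", "filed", "landlord"], "lawsuit"),
    (["court", "judge", "filed"], "lawsuit"),
]
model = train_naive_bayes(TRAIN)
print(classify_naive_bayes(model, ["filed", "court"]))  # lawsuit
```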
- Decision list
- An ordered list of tests, one for each feature
- The discriminability of a feature $f$: the absolute value of the log-likelihood ratio of two senses given that feature
$\left| \log \frac{P(s_1 \mid f)}{P(s_2 \mid f)} \right|$
- Tests are ordered by decreasing discriminability; the first test that matches the input assigns the sense
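A two-sense decision list can be sketched like this. Since $P(s \mid f) = \mathrm{count}(f, s) / \mathrm{count}(f)$, the ratio of the two posteriors reduces to a ratio of counts. The additive smoothing constant `alpha = 0.1`, the majority-sense fallback, and the toy data are assumptions for illustration; the sketch only handles exactly two senses.

```python
import math
from collections import Counter, defaultdict

def train_decision_list(examples, alpha=0.1):
    """Two-sense decision list; examples: list of (features, sense) pairs."""
    senses = sorted({s for _, s in examples})
    assert len(senses) == 2, "this sketch handles exactly two senses"
    s1, s2 = senses
    counts = defaultdict(Counter)  # feature -> Counter(sense -> count)
    for feats, sense in examples:
        for f in set(feats):
            counts[f][sense] += 1
    rules = []
    for f, c in counts.items():
        # Smoothed log(P(s1|f) / P(s2|f)); alpha avoids division by zero.
        llr = math.log((c[s1] + alpha) / (c[s2] + alpha))
        rules.append((abs(llr), f, s1 if llr > 0 else s2))
    rules.sort(key=lambda r: r[0], reverse=True)  # most discriminating test first
    majority = Counter(s for _, s in examples).most_common(1)[0][0]
    return rules, majority

def classify_decision_list(model, feats):
    """Apply tests in order; the first feature present in the input decides."""
    rules, default = model
    present = set(feats)
    for _, f, sense in rules:
        if f in present:
            return sense
    return default  # no test matched: fall back to the majority sense

# Hypothetical toy training data for the noun "suit".
TRAIN = [
    (["garment", "jacket", "wore"], "clothing"),
    (["jacket", "trousers", "tailor"], "clothing"),
    (["court", "filed", "landlord"], "lawsuit"),
    (["court", "judge", "filed"], "lawsuit"),
]
model = train_decision_list(TRAIN)
print(classify_decision_list(model, ["court", "tailor"]))  # lawsuit
```

Here "court" (seen twice, always with the legal sense) outranks "tailor" (seen once), so it fires first even though both features are present.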
Dictionary and thesaurus methods
- The Lesk algorithm
- Choose the sense whose "signature" (for example, the words of its dictionary gloss and example sentences) shares the most words with the target word's neighborhood, ignoring words on a stop list
- Example: WordNet glosses and examples for three senses of suit:
- a set of garments (usually including a jacket and trousers or skirt) for outerwear all of the same fabric and color; "they buried him in his best suit"
- a comprehensive term for any proceeding in a court of law whereby an individual seeks a legal remedy; "the family brought suit against the landlord"
- playing card in any of four sets of 13 cards in a pack; each set has its own symbol and color; "a flush is five cards in the same suit"; "in bridge you must follow suit"; "what suit is trumps?"
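A simplified Lesk classifier over the three WordNet signatures quoted above might look like the following. This is a sketch, assuming a tiny hypothetical stop list, a regex tokenizer, and no stemming; the sense labels `clothing`, `lawsuit`, and `cards` are hypothetical names for the three senses.

```python
import re

# A tiny illustrative stop list (an assumption; real systems use a larger one).
STOP_WORDS = {"a", "all", "an", "and", "any", "are", "each", "for", "has", "him",
              "his", "in", "is", "its", "must", "of", "or", "own", "same", "the",
              "they", "what", "you"}

# Signatures: the WordNet glosses and examples for three senses of "suit".
SIGNATURES = {
    "clothing": 'a set of garments (usually including a jacket and trousers or skirt) '
                'for outerwear all of the same fabric and color; '
                '"they buried him in his best suit"',
    "lawsuit": 'a comprehensive term for any proceeding in a court of law whereby an '
               'individual seeks a legal remedy; '
               '"the family brought suit against the landlord"',
    "cards": 'playing card in any of four sets of 13 cards in a pack; each set has its '
             'own symbol and color; "a flush is five cards in the same suit"; '
             '"in bridge you must follow suit"; "what suit is trumps?"',
}

def content_words(text):
    """Lowercased alphabetic tokens, minus stop words, as a set."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS}

def simplified_lesk(context, signatures):
    """Return the sense whose signature shares the most content words with the context."""
    context_words = content_words(context)
    best_sense, best_overlap = None, -1
    for sense, signature in signatures.items():
        overlap = len(context_words & content_words(signature))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

print(simplified_lesk("The differences between European decks are mostly "
                      "in the number of cards in each suit.", SIGNATURES))  # cards
```

Because this sketch does no stemming, "courts" and "suits" in the third example sentence of the problem statement match nothing in any signature, so it cannot resolve that sentence; lemmatizing both the context and the signatures is the usual remedy.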
- Corpus variant of the Lesk algorithm: given a sense-tagged corpus, add to each sense's signature all the words from corpus sentences in which the target word has that sense, and weight each overlapping word by its inverse document frequency (IDF)
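The corpus variant can be sketched as below. The sense-tagged sentences are a hypothetical toy corpus, the stop list is an assumption, and treating each tagged sentence as one "document" when computing $\mathrm{idf}(w) = \log(N / \mathrm{df}(w))$ is also an assumption made for illustration.

```python
import math
import re
from collections import defaultdict

STOP_WORDS = {"a", "an", "and", "as", "he", "her", "his", "in", "is", "of",
              "she", "the", "this", "to"}  # hypothetical stop list

def content_words(text):
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS}

def build_signatures(tagged_sentences):
    """Signature of each sense = all content words of sentences tagged with that sense."""
    signatures = defaultdict(set)
    for sentence, sense in tagged_sentences:
        signatures[sense] |= content_words(sentence)
    return signatures

def idf_weights(documents):
    """idf(w) = log(N / df(w)), where df(w) = number of documents containing w."""
    df = defaultdict(int)
    for doc in documents:
        for w in content_words(doc):
            df[w] += 1
    return {w: math.log(len(documents) / n) for w, n in df.items()}

def corpus_lesk(context, signatures, idf):
    """Pick the sense whose overlap with the context has the largest total IDF weight."""
    context_words = content_words(context)
    def score(sense):
        return sum(idf.get(w, 0.0) for w in context_words & signatures[sense])
    return max(signatures, key=score)

# Hypothetical sense-tagged corpus for "suit".
TAGGED = [
    ("He wore a grey suit and a silk tie to the wedding.", "clothing"),
    ("The tailor altered the suit jacket and trousers.", "clothing"),
    ("She filed a suit against her employer in federal court.", "lawsuit"),
    ("The judge dismissed the suit.", "lawsuit"),
    ("He led a card of the same suit as his partner.", "cards"),
    ("Hearts is the highest suit in this card game.", "cards"),
]
signatures = build_signatures(TAGGED)
idf = idf_weights([sentence for sentence, _ in TAGGED])
print(corpus_lesk("The court dismissed the case.", signatures, idf))  # lawsuit
```

Since "suit" occurs in every sentence, its IDF is $\log 1 = 0$, so it adds nothing to any score; this is exactly the kind of uninformative overlap the IDF weighting is meant to suppress.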
- Selectional association (Resnik)
- Using selectional restrictions to choose the sense of a verb's argument
- She washed the suit.
- She filed the suit.
- She remembered the suit.
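- Selectional preference strength: the amount of information a predicate provides about the semantic class of its arguments
- Measured as the difference between the prior distribution $P(c)$ over argument classes $c$ and the distribution $P(c \mid v)$ of classes of the arguments of verb $v$
- Uses the relative entropy (Kullback-Leibler divergence) between the two distributions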
$S_R(v) = \sum_{c} P(c \mid v) \log \frac{P(c \mid v)}{P(c)}$
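The preference strength is just a relative entropy, which can be computed directly from the two distributions. This sketch assumes both distributions are already given as dictionaries (in practice they would be estimated from a parsed corpus); the class names and probabilities are hypothetical, and the natural log is used, which rescales but does not reorder the values.

```python
import math

def preference_strength(p_c_given_v, p_c):
    """S_R(v) = sum_c P(c|v) log(P(c|v) / P(c)), the relative entropy D(P(c|v) || P(c))."""
    # Classes with P(c|v) = 0 contribute nothing (the limit of p log p as p -> 0 is 0).
    return sum(p * math.log(p / p_c[c]) for c, p in p_c_given_v.items() if p > 0)

# Hypothetical prior over argument classes, and two hypothetical verbs.
P_C = {"artifact": 0.5, "person": 0.25, "act": 0.25}
print(preference_strength(P_C, P_C))                            # 0.0: no preference
print(preference_strength({"artifact": 0.9, "act": 0.1}, P_C))  # > 0: selective verb
```

A verb whose argument distribution matches the prior tells us nothing about its arguments and gets strength 0; the further $P(c \mid v)$ moves from $P(c)$, the larger $S_R(v)$.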
- Selectional association: the share of the verb's preference strength contributed by class $c$
$A_R(v, c) = \frac{1}{S_R(v)} P(c \mid v) \log \frac{P(c \mid v)}{P(c)}$
- Choose the sense of the verb's argument that has the highest selectional association between one of its hypernyms and the verb.
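Selectional association and the sense-selection rule can be sketched together. The class names, hypernym lists, and probabilities below are hypothetical stand-ins, not real WordNet classes or corpus statistics; the guard that returns 0 when a class is unseen with the verb, or when the verb has no preference at all, is a choice made for this sketch.

```python
import math

def preference_strength(p_c_given_v, p_c):
    """S_R(v): relative entropy between P(c|v) and the prior P(c)."""
    return sum(p * math.log(p / p_c[c]) for c, p in p_c_given_v.items() if p > 0)

def selectional_association(c, p_c_given_v, p_c):
    """A_R(v, c) = P(c|v) log(P(c|v) / P(c)) / S_R(v)."""
    p = p_c_given_v.get(c, 0.0)
    strength = preference_strength(p_c_given_v, p_c)
    if p == 0 or strength == 0:
        return 0.0  # class never seen with the verb, or verb has no preference
    return p * math.log(p / p_c[c]) / strength

def choose_sense(sense_hypernyms, p_c_given_v, p_c):
    """Pick the sense whose best hypernym class has the highest association with the verb."""
    def best_association(sense):
        return max(selectional_association(c, p_c_given_v, p_c)
                   for c in sense_hypernyms[sense])
    return max(sense_hypernyms, key=best_association)

# Hypothetical classes and probabilities (not real WordNet statistics).
P_C = {"clothing": 0.25, "legal_action": 0.25, "card_set": 0.25, "person": 0.25}
SUIT_HYPERNYMS = {
    "garment": ["clothing", "artifact"],
    "lawsuit": ["legal_action", "act"],
    "cards": ["card_set", "collection"],
}
WASH = {"clothing": 0.75, "person": 0.25}    # hypothetical P(c | wash)
FILE = {"legal_action": 0.5, "person": 0.5}  # hypothetical P(c | file)
print(choose_sense(SUIT_HYPERNYMS, WASH, P_C))  # garment
print(choose_sense(SUIT_HYPERNYMS, FILE, P_C))  # lawsuit
```

With these made-up numbers, "She washed the suit" picks the garment sense and "She filed the suit" picks the legal sense, mirroring the example sentences above.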
Bootstrapping (Yarowsky algorithm)
- Task: learn a sense classifier for a word, given a small set of labeled examples and a large unlabeled corpus
- Train a decision-list classifier on the labeled set.
- Use the classifier to label the unlabeled corpus.
- Move the examples the classifier labels most confidently from the unlabeled corpus to the labeled set, and retrain.
- Repeat until a stopping criterion is met, for example when no remaining example is labeled above the confidence threshold.
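- Heuristics for generating the initial training set: One Sense per Collocation (a word tends to keep the same sense near a given collocate) and One Sense per Discourse (a word tends to keep the same sense throughout a document)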
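The bootstrapping loop can be sketched on top of a two-sense decision list, using the strength of the first matching rule as the classifier's confidence. This is a minimal sketch in the spirit of the steps above, not Yarowsky's exact procedure: the smoothing constant, the confidence `threshold = 2.0`, `max_iters`, and the toy seeds and corpus are all hypothetical, and One Sense per Discourse is not implemented.

```python
import math
from collections import Counter, defaultdict

def train_decision_list(labeled, alpha=0.1):
    """Two-sense decision list: rules (strength, feature, sense), strongest first."""
    s1, s2 = sorted({sense for _, sense in labeled})  # assumes exactly two senses
    counts = defaultdict(Counter)  # feature -> Counter(sense -> count)
    for feats, sense in labeled:
        for f in set(feats):
            counts[f][sense] += 1
    rules = []
    for f, c in counts.items():
        llr = math.log((c[s1] + alpha) / (c[s2] + alpha))  # smoothed log-likelihood ratio
        rules.append((abs(llr), f, s1 if llr > 0 else s2))
    return sorted(rules, reverse=True)

def predict(rules, feats):
    """Return (sense, confidence) from the first matching rule, or (None, 0.0)."""
    present = set(feats)
    for strength, f, sense in rules:
        if f in present:
            return sense, strength
    return None, 0.0

def yarowsky(seeds, unlabeled, threshold=2.0, max_iters=10):
    """Grow the labeled set from seeds by repeatedly adding confident predictions."""
    labeled, remaining = list(seeds), list(unlabeled)
    for _ in range(max_iters):
        rules = train_decision_list(labeled)
        confident, rest = [], []
        for feats in remaining:
            sense, conf = predict(rules, feats)
            if sense is not None and conf >= threshold:
                confident.append((feats, sense))
            else:
                rest.append(feats)
        if not confident:
            break  # stopping criterion: nothing new passed the threshold
        labeled.extend(confident)
        remaining = rest
    return train_decision_list(labeled), labeled, remaining

# Hypothetical seeds, as if labeled by One Sense per Collocation ("tailor", "court").
SEEDS = [(["tailor", "jacket"], "clothing"), (["court", "filed"], "lawsuit")]
UNLABELED = [["jacket", "trousers"], ["filed", "landlord"],
             ["trousers", "wool"], ["landlord", "damages"], ["weather"]]
rules, labeled, remaining = yarowsky(SEEDS, UNLABELED)
print(len(labeled), remaining)  # 6 [['weather']]
```

The second round is where bootstrapping pays off: "trousers" and "landlord" were not in any seed, but once the first round labels examples containing them, they become rules that label the rest. The example with no known feature is never labeled.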