Word sense disambiguation and semantic distance
- WSD: the problem
- An example
- For some four hundred years, suits of matching coat, trousers, and waistcoat have been in and out of fashion.
- The differences between European decks are mostly in the number of cards in each suit.
- However, courts typically have some power to separate out claims and parties into separate suits if it is more efficient to do so.
- SENSEVAL and SemEval competitions
- Lexical sample and all-words tasks
- SemCor: a sense-tagged corpus
- What senses? WordNet synsets?
- Supervised methods
- Corpora for lexical sample and all-words tasks
- Extracting feature vectors
- Collocational features
- Bag-of-words features
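The two feature types above can be sketched as follows. This is a minimal illustration, assuming a whitespace-tokenized sentence; the function names, window sizes, padding token, and the small vocabulary are hypothetical choices, not part of the original notes.

```python
def collocational_features(tokens, target_index, window=2):
    """Words at specific positions relative to the target word."""
    features = {}
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        position = target_index + offset
        # Use a placeholder when the window runs off the sentence (assumption).
        word = tokens[position] if 0 <= position < len(tokens) else "<pad>"
        features[f"w{offset:+d}"] = word
    return features


def bag_of_words_features(tokens, target_index, vocabulary, window=5):
    """Binary indicators: does each vocabulary word occur near the target?"""
    lo = max(0, target_index - window)
    hi = target_index + window + 1
    neighborhood = set(tokens[lo:target_index] + tokens[target_index + 1:hi])
    return [1 if word in neighborhood else 0 for word in vocabulary]


tokens = "the family brought suit against the landlord".split()
target = tokens.index("suit")
colloc = collocational_features(tokens, target)
bow = bag_of_words_features(tokens, target, ["court", "landlord", "cards"])
```

Collocational features keep position information (the word one to the left, two to the right, and so on), while the bag-of-words vector only records whether a vocabulary word appears anywhere in the window.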
- Classifiers
- Naive Bayes
- `hat s = text(argmax)_(s in S) P(s|f)`
`hat s = text(argmax)_(s in S) (P(f|s) P(s)) / (P(f))`
- Assumption of conditional independence of features given sense
- `hat s = text(argmax)_(s in S) P(s) prod_j P(f_j|s)`
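A minimal sketch of the naive Bayes decision rule above, working in log space. The training pairs are invented for illustration, and the add-one smoothing is an assumption added here so that unseen features do not zero out a sense; the notes do not specify a smoothing method.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(examples):
    """examples: list of (sense, list_of_features) pairs."""
    sense_counts = Counter()
    feature_counts = defaultdict(Counter)
    vocabulary = set()
    for sense, features in examples:
        sense_counts[sense] += 1
        for feature in features:
            feature_counts[sense][feature] += 1
            vocabulary.add(feature)
    return sense_counts, feature_counts, vocabulary


def classify(features, model):
    """Return the sense maximizing log P(s) + sum_j log P(f_j | s)."""
    sense_counts, feature_counts, vocabulary = model
    total = sum(sense_counts.values())
    best_sense, best_score = None, -math.inf
    for sense, count in sense_counts.items():
        score = math.log(count / total)  # log prior P(s)
        # Add-one smoothing (an assumption, not from the notes).
        denominator = sum(feature_counts[sense].values()) + len(vocabulary)
        for feature in features:
            score += math.log((feature_counts[sense][feature] + 1) / denominator)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense


# Hypothetical toy training data for three senses of "suit".
training = [
    ("clothing", ["coat", "trousers", "fashion"]),
    ("clothing", ["jacket", "fabric"]),
    ("cards", ["deck", "cards"]),
    ("legal", ["court", "claims", "parties"]),
]
model = train_naive_bayes(training)
predicted = classify(["court", "claims"], model)
```

The denominator `P(f)` is dropped because it is the same for every sense and does not affect the argmax.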
- Decision list
- An ordered list of tests, one for each feature
- The discriminability of a feature: absolute value of the log of the ratio of the probabilities of the two senses given the feature
`| log((P(s_1|f)) / (P(s_2|f))) |`
- Tests are ordered by discriminability
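The decision list described above can be sketched for a two-sense word as follows. The smoothing constant `alpha`, the function names, and the toy training data are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def build_decision_list(examples, sense1, sense2, alpha=0.1):
    """examples: (sense, features) pairs for a word with two senses."""
    counts = defaultdict(Counter)
    for sense, features in examples:
        for feature in set(features):
            counts[feature][sense] += 1
    rules = []
    for feature, by_sense in counts.items():
        # Smoothed counts (alpha is an assumption) avoid division by zero.
        c1 = by_sense[sense1] + alpha
        c2 = by_sense[sense2] + alpha
        # For a fixed feature, P(s1|f) / P(s2|f) reduces to a ratio of counts.
        score = abs(math.log(c1 / c2))
        rules.append((score, feature, sense1 if c1 > c2 else sense2))
    # Most discriminating tests come first.
    rules.sort(key=lambda rule: rule[0], reverse=True)
    return rules


def apply_decision_list(features, rules, default):
    present = set(features)
    for _score, feature, sense in rules:
        if feature in present:
            return sense  # the first matching test decides
    return default


# Hypothetical toy data; "number" occurs with both senses.
training = [
    ("cards", ["deck", "cards", "number"]),
    ("cards", ["cards", "trumps"]),
    ("legal", ["court", "claims", "number"]),
    ("legal", ["court", "parties"]),
]
rules = build_decision_list(training, "cards", "legal")
```

A feature that occurs equally often with both senses gets a score of zero and sinks to the bottom of the list, so it only decides when nothing more informative is present.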
- Semi-supervised dictionary and thesaurus methods
- The Lesk algorithm
- Choose the sense whose "signature" (for instance, its dictionary gloss and example sentences) shares the most words with the target word's neighborhood (ignoring words on a stop list)
- Example: WordNet glosses and examples for three senses of suit:
- a set of garments (usually including a jacket and trousers or skirt) for outerwear all of the same fabric and color; "they buried him in his best suit"
- a comprehensive term for any proceeding in a court of law whereby an individual seeks a legal remedy; "the family brought suit against the landlord"
- playing card in any of four sets of 13 cards in a pack; each set has its own symbol and color; "a flush is five cards in the same suit"; "in bridge you must follow suit"; "what suit is trumps?"
- Corpus variant of Lesk algorithm: given a sense-tagged corpus, add all words from the corpus in sentences with the sense to the sense's signature, weighting the words by their inverse document frequency
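A minimal sketch of the basic (non-corpus) Lesk overlap count, using the three WordNet signatures for suit quoted above and the example sentences from the start of these notes. The stop list, the sense labels, and the crude plural stripping are assumptions made for illustration; a real system would use a proper tokenizer and stemmer.

```python
import re

# Small hand-picked stop list (an assumption).
STOP_WORDS = {
    "a", "an", "the", "of", "in", "and", "or", "for", "to", "is", "are",
    "it", "its", "his", "him", "has", "have", "been", "any", "all", "each",
    "own", "what", "you", "must", "they", "if", "so", "do", "some", "out",
    "into", "more", "same",
}

SIGNATURES = {
    "clothing": "a set of garments (usually including a jacket and trousers "
                "or skirt) for outerwear all of the same fabric and color; "
                "they buried him in his best suit",
    "legal": "a comprehensive term for any proceeding in a court of law "
             "whereby an individual seeks a legal remedy; "
             "the family brought suit against the landlord",
    "cards": "playing card in any of four sets of 13 cards in a pack; "
             "each set has its own symbol and color; "
             "a flush is five cards in the same suit; "
             "in bridge you must follow suit; what suit is trumps?",
}


def content_words(text):
    words = re.findall(r"[a-z0-9]+", text.lower())
    kept = [w for w in words if w not in STOP_WORDS]
    # Crude plural stripping (an assumption, not a real stemmer).
    return {
        w[:-1] if len(w) > 3 and w.endswith("s") and not w.endswith("ss") else w
        for w in kept
    }


def overlap_scores(context, target="suit"):
    neighborhood = content_words(context) - {target}
    return {
        sense: len(neighborhood & content_words(signature))
        for sense, signature in SIGNATURES.items()
    }


def simplified_lesk(context, target="suit"):
    scores = overlap_scores(context, target)
    return max(scores, key=scores.get)


fashion = ("For some four hundred years, suits of matching coat, trousers, "
           "and waistcoat have been in and out of fashion.")
decks = ("The differences between European decks are mostly in the number "
         "of cards in each suit.")
courts = ("However, courts typically have some power to separate out claims "
          "and parties into separate suits if it is more efficient to do so.")
```

Under these assumptions the second sentence is assigned the cards sense and the third the legal sense, each on a single overlapping word. The first sentence ties between clothing ("trousers") and cards ("four"), which shows how sparse the overlaps are with short signatures.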
- Thesaurus distance measures
- Naive distance
- Information content of nodes
- Resnik similarity: information content of the least common subsumer (LCS) of the two nodes
`text(sim)_(text(res))(c_1, c_2) = - log P(LCS(c_1, c_2))`
- Lin similarity: "ratio between the amount of information needed to state the commonality of A and B and the information needed to fully describe what A and B are"
`text(sim)_(text(lin))(c_1, c_2) = (2 log P(LCS(c_1, c_2))) / (log P(c_1) + log P(c_2))`
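The Resnik and Lin measures above can be worked through on a small example. This sketch assumes a hypothetical toy taxonomy and invented corpus counts; `P(c)` is estimated as the share of counts falling on a concept or any of its descendants.

```python
import math

# Hypothetical toy taxonomy: concept -> parent (the root has parent None).
PARENT = {
    "entity": None,
    "artifact": "entity",
    "event": "entity",
    "garment": "artifact",
    "card": "artifact",
    "suit": "garment",
    "coat": "garment",
}
# Hypothetical counts of corpus words mapped directly to each concept.
OWN_COUNT = {
    "entity": 0, "artifact": 10, "event": 20, "garment": 10,
    "card": 10, "suit": 30, "coat": 20,
}


def ancestors(concept):
    """The concept itself followed by its ancestors up to the root."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path


def probability(concept):
    """P(c): share of counts on the concept or any of its descendants."""
    total = sum(OWN_COUNT.values())
    subtree = sum(n for c, n in OWN_COUNT.items() if concept in ancestors(c))
    return subtree / total


def lcs(c1, c2):
    """Least common subsumer: the lowest node that dominates both concepts."""
    above_c1 = set(ancestors(c1))
    for candidate in ancestors(c2):
        if candidate in above_c1:
            return candidate


def sim_resnik(c1, c2):
    return -math.log(probability(lcs(c1, c2)))


def sim_lin(c1, c2):
    shared = 2 * math.log(probability(lcs(c1, c2)))
    return shared / (math.log(probability(c1)) + math.log(probability(c2)))
```

With these counts, `lcs("suit", "coat")` is `garment` with `P = 0.6`, while `lcs("suit", "card")` is the more general `artifact` with `P = 0.8`, so both measures rate suit as closer to coat than to card. The Lin measure of a concept with itself is 1.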
Evaluation
- Four kinds of events, comparing the behavior of the System and the Gold standard
- True positive: System identifies something that agrees with Gold
- True negative: System fails to identify something, agreeing with Gold
- False positive: System identifies something that Gold does not
- False negative: System fails to identify something that Gold does
- Four evaluation measures
- Accuracy: how well System performs in terms of all possible events
`A = (TP + TN) / (TP + TN + FP + FN)`
- Precision: how well System performs in terms of its own actions
`P = (TP) / (TP + FP)`
- Recall: how well System performs in terms of the actions it should have taken (Gold's actions)
`R = (TP) / (TP + FN)`
- F-measure: a weighted combination of precision and recall
`F_beta = ((beta^2 + 1)PR)/(beta^2 P + R)`
- Two tasks
- Segmentation
- Word sense disambiguation
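The four evaluation measures above can be computed directly from the four event counts. In this sketch the System and Gold decisions are represented as equal-length lists of booleans (an assumption about the data format), and a measure with a zero denominator is returned as 0.0 by convention.

```python
def evaluate(system, gold, beta=1.0):
    """system, gold: equal-length lists of booleans (item identified or not)."""
    pairs = list(zip(system, gold))
    tp = sum(1 for s, g in pairs if s and g)
    tn = sum(1 for s, g in pairs if not s and not g)
    fp = sum(1 for s, g in pairs if s and not g)
    fn = sum(1 for s, g in pairs if not s and g)

    accuracy = (tp + tn) / (tp + tn + fp + fn)
    # Guard against empty denominators (returning 0.0 is an assumption).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denominator = beta ** 2 * precision + recall
    f_beta = (beta ** 2 + 1) * precision * recall / denominator if denominator else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_beta": f_beta,
    }


# Hypothetical decisions for six items.
system = [True, True, False, False, True, False]
gold = [True, False, False, True, True, False]
scores = evaluate(system, gold)
```

Here TP = 2, TN = 2, FP = 1, and FN = 1, so accuracy, precision, recall, and F1 all come out to 2/3. Setting `beta` above 1 weights recall more heavily, and below 1 weights precision more heavily.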
Some concepts in McCarthy et al. (2007)
- Supervised and unsupervised word sense disambiguation
- Resources required
- Sense frequency and the first sense heuristic
- The inadequacy of SemCor
- "Distributional similarity" and "semantic similarity"
- "Features"
- Mutual information between a word w and a feature f
`I(w,f) = log [(P(f|w))/(P(f))]`
- Similarity measures: Lin, Lesk, Jiang and Conrath
- Effect of domain
- Differences between nouns, verbs, and adverbs
- Data sparsity and tagging projects
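The mutual information formula `I(w,f)` listed above can be estimated from co-occurrence counts. This is a minimal sketch using maximum-likelihood estimates; the (word, feature) observations and the string format of the features are invented for illustration.

```python
import math
from collections import Counter


def mutual_information(pairs):
    """pairs: list of (word, feature) co-occurrence observations."""
    pair_counts = Counter(pairs)
    word_counts = Counter(word for word, _ in pairs)
    feature_counts = Counter(feature for _, feature in pairs)
    total = len(pairs)
    table = {}
    for (word, feature), n in pair_counts.items():
        p_f_given_w = n / word_counts[word]       # P(f|w)
        p_f = feature_counts[feature] / total     # P(f)
        table[(word, feature)] = math.log(p_f_given_w / p_f)
    return table


# Hypothetical observations: features are grammatical contexts of the word.
observations = [
    ("suit", "object-of:wear"), ("suit", "object-of:wear"),
    ("suit", "object-of:file"),
    ("coat", "object-of:wear"), ("coat", "object-of:wear"),
    ("claim", "object-of:file"), ("claim", "object-of:file"),
    ("claim", "object-of:make"),
]
table = mutual_information(observations)
```

A positive value means the feature occurs with the word more often than its overall frequency would predict; a negative value means it occurs less often.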
Some concepts in Turney (2006)
- Attributional and relational similarity
- Lexicon-based and corpus-based similarity measures
- Structure Mapping Theory
- Metaphors
- Noun-modifier pairs (noun-noun compounds)
- Vector space model in IR
- Entropy transformation of matrix
- Singular value decomposition
- Domain-specific and domain-general performance, data sparsity
Some concepts in Tsang and Stevenson (2010)
- Distributional and ontological methods revisited
- Ontologies
- Classifiers: support vector machines, etc.
- Statistical tests
- Minimum cost flow
- Semantic profiles
- Tasks
- Verb subcategorization frames
- Name disambiguation
- Document classification
- Profile coherence
- Noise and relevance
- Profile density
- Overall evaluation