Statistics and Syntax
Tasks
- Given a sequence of words, say, a sentence, what is the most
likely sequence of syntactic categories to be associated with the
words?
Independence assumptions:
- The probability of a syntactic category in a particular position
is dependent only on the n previous syntactic categories
(n usually 2 or 3).
- The probability of a word, given a syntactic category in a
particular position, is independent of the probability of words in
neighboring positions.
- Given a sentence, what are the probabilities of different
syntactic categories for each of the words?
Independence assumptions: same for 1.
- Given a sentence, what is the most likely parse?
(This makes use of 2.) Using this information, more efficient and
more accurate (best-first) parsing algorithms can be developed.
Independence assumptions:
- Simplest version: the probability of a given grammar rule
is independent of the probability of the rules used for its
constituents.
- Better version: the probability of a given grammar rule
depends on its head (or its first constituent).
Statistical Solutions
- Tasks 1 and 2
- Given the independence assumptions, 1 and 2 translate into
a concern with, for each position, the product of the probability of
the word, given a particular category, and the probability of that
category, given one (or more) previous category
- The string of words is seen as generated by a hidden Markov
model in which there
is a category for each word.
In the model there are transition probabilities associating one
category with the next and output probabilities associating categories
with words.
- For 1, the problem is to find the model which is most likely to
generate the string of words.
For 2, the problem is to find the total of the probabilities of all of
the models up to a particular word for a given category.
- The Viterbi algorithm permits the calculation of 1
by considering only the best path into each category+position node
(for example, V in the second position) for the given output string.
- A variant of the Viterbi algorithm permits the calculation of 2
by considering the total probability of all paths into each
category+position node.
- Task 3
- Parsing with probabilities: the probability associated with a
given rule is the product of the
probability of the rule, given the left-hand side
(and the head or first constituent), and the probabilities of the
constituents.
Michael Gasser
Last modified: Wed Nov 12 00:18:11 EST