Statistical and knowledge-based methods (1)
- What are statistical methods?
- The knowledge in the system is based on large amounts of data.
- The system learns the knowledge.
- The knowledge may be procedural, as well as declarative.
- What is learned is relatively unstructured.
- What are knowledge-based methods?
- The knowledge in the system is based on a theory of language or language processing.
- The system is programmed to have the knowledge.
- The knowledge is almost always in a declarative form.
- The knowledge may be highly structured.
Statistical and knowledge-based methods (2)
- What makes something difficult or inappropriate for statistical methods?
- It's not really a statistical phenomenon; the rules are all-or-none and inferrable from a small number of examples.
- There's not enough data to learn it.
- What makes something difficult or inappropriate for knowledge-based methods?
- It's only really a tendency, not all-or-none.
- People aren't aware of it, or it's difficult to put into words.
- It's very complex, involving many exceptions and context-specific rules.
Statistical methods
- n-gram models
- Predict the next unit (usually word) in a sequence from the previous n-1 units;
assign a probability to a sequence of units.
- Language model: assigns a probability to a sequence of words in a particular language.
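As a minimal sketch of the n-gram idea with n = 2 (a bigram model): each word is predicted from the one word before it, and the probability of a sequence is the product of those conditional probabilities. The toy corpus, the `<s>`/`</s>` boundary markers, and all function names are assumptions made for illustration; estimates are unsmoothed maximum likelihood, so any unseen bigram gives probability 0.

```python
from collections import Counter

def train_bigram(sentences):
    """Count histories and bigrams over a list of tokenized sentences."""
    history_counts = Counter()
    bigram_counts = Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]  # hypothetical boundary markers
        for prev, cur in zip(padded, padded[1:]):
            history_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    return history_counts, bigram_counts

def bigram_prob(prev, cur, history_counts, bigram_counts):
    """Maximum-likelihood estimate of p(cur | prev); 0.0 if prev was never seen."""
    if history_counts[prev] == 0:
        return 0.0
    return bigram_counts[(prev, cur)] / history_counts[prev]

def sequence_prob(tokens, history_counts, bigram_counts):
    """Probability of a whole sentence as a product of bigram probabilities."""
    padded = ["<s>"] + tokens + ["</s>"]
    p = 1.0
    for prev, cur in zip(padded, padded[1:]):
        p *= bigram_prob(prev, cur, history_counts, bigram_counts)
    return p

# Toy corpus, invented for this example.
corpus = [
    ["the", "dog", "barks"],
    ["the", "cat", "sleeps"],
    ["the", "dog", "sleeps"],
]
hc, bc = train_bigram(corpus)
print(bigram_prob("the", "dog", hc, bc))
print(sequence_prob(["the", "dog", "barks"], hc, bc))
print(sequence_prob(["the", "cat", "barks"], hc, bc))
```

The last call returns 0 because "cat barks" never occurs in the toy corpus, which is the usual motivation for smoothing in practical language models.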
- Classifiers
- Given some input (values for a set of features), assign it to one of a finite set of classes (perhaps with a probability).
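- Naive Bayes classifiers: by Bayes' theorem, the probability of a particular class given a vector of features
is a function of the probability of the class, the probability of the features, and the
probability of the features given the class; the "naive" assumption is that the features are conditionally independent given the class.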
`p(c|f_1, f_2, ..., f_n) = (p(f_1, f_2, ..., f_n|c) p(c)) / (p(f_1, f_2, ..., f_n))`
- Decision list classifiers: a sequence of tests is applied to an input; when a test succeeds,
a class is returned; otherwise the remaining tests in the list are applied in sequence.
- Support vector machines: construct hyperplanes that maximally separate classes in high-dimensional feature space
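To make the Naive Bayes classifier concrete, here is a small sketch under stated assumptions: features are context words, the task is choosing a sense of "bank", and add-one smoothing is used so unseen feature/class pairs do not zero out the product. The data, labels, and function names are invented for illustration. Since `p(f_1, ..., f_n)` is the same for every class, it is dropped when picking the best class, and `p(f_1, ..., f_n|c)` is computed as a product of per-feature probabilities (in log space).

```python
import math
from collections import Counter, defaultdict

def train_nb(examples):
    """examples: list of (feature_list, class_label) pairs."""
    class_counts = Counter()
    feature_counts = defaultdict(Counter)
    vocab = set()
    for features, label in examples:
        class_counts[label] += 1
        for f in features:
            feature_counts[label][f] += 1
            vocab.add(f)
    return class_counts, feature_counts, vocab

def predict_nb(features, class_counts, feature_counts, vocab):
    """Return argmax_c of log p(c) + sum_i log p(f_i | c)."""
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        score = math.log(n / total)  # log prior p(c)
        denom = sum(feature_counts[label].values()) + len(vocab)
        for f in features:
            if f not in vocab:
                continue  # ignore features never seen in training
            # add-one smoothed estimate of p(f | c)
            score += math.log((feature_counts[label][f] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy training data, invented for this example.
train = [
    (["money", "deposit"], "finance"),
    (["loan", "money"], "finance"),
    (["river", "water"], "river"),
    (["fishing", "river"], "river"),
]
model = train_nb(train)
print(predict_nb(["money", "loan"], *model))
print(predict_nb(["water"], *model))
```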
- Clustering (usually unsupervised, unlike classifiers)
- Probabilistic grammars: combine linguistic and statistical methods.
- Hidden Markov models: a sequence of observed units is viewed as resulting from a sequence of hidden causal states
- Noisy channel models
- An observed event is viewed as a noisy version of a hidden
event; separate the process into two factors (using Bayes' theorem)
- `hat H = text{argmax}_H\ p(O|H)\ p(H)`
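A small sketch of noisy channel decoding, framed as spelling correction: the observed event `O` is a typed string, the hidden event `H` is the intended word, `p(H)` plays the role of a language model, and `p(O|H)` is the channel (error) model. The probability tables below are made-up numbers and the function name is hypothetical; a real system would estimate both factors from data.

```python
def noisy_channel_decode(observed, prior, channel):
    """Return argmax_H p(O|H) p(H) over the hypotheses listed in `prior`.

    prior:   dict mapping hypothesis H -> p(H)
    channel: dict mapping (O, H) -> p(O|H); missing pairs count as 0
    """
    return max(prior, key=lambda h: channel.get((observed, h), 0.0) * prior[h])

# Made-up probabilities for illustration only.
prior = {"the": 0.7, "thee": 0.05, "then": 0.25}
channel = {
    ("teh", "the"): 0.3,
    ("teh", "thee"): 0.1,
    ("teh", "then"): 0.05,
    ("thn", "the"): 0.05,
    ("thn", "then"): 0.4,
}

print(noisy_channel_decode("teh", prior, channel))
print(noisy_channel_decode("thn", prior, channel))
```

For "thn" the channel model outweighs the prior: "the" is the more frequent word, but "then" is a much likelier source of that particular string.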
- Log-linear models
- The probability of the hidden event (given the observed event)
is viewed as a function of a log-linear combination of weighted values for a set of feature functions of the hidden
and observed events
- `hat H = text{argmax}_H\ text{exp} [sum_(m=1)^M lambda_m h_m(H,O)]`
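The log-linear formula can be sketched in a few lines. In this illustrative setup (the tables, feature functions, and names are all assumptions), there are two feature functions: `h_1 = log p(O|H)` and `h_2 = log p(H)`. With both weights set to 1 the score equals `p(O|H) p(H)`, so the noisy channel model falls out as a special case; changing the weights changes how much each feature counts.

```python
import math

def loglinear_score(hypothesis, observed, feature_functions, weights):
    """exp of the weighted sum of feature values h_m(H, O)."""
    total = sum(w * h(hypothesis, observed)
                for h, w in zip(feature_functions, weights))
    return math.exp(total)

def loglinear_decode(observed, hypotheses, feature_functions, weights):
    """Return the hypothesis with the highest log-linear score."""
    return max(
        hypotheses,
        key=lambda H: loglinear_score(H, observed, feature_functions, weights),
    )

# Made-up probability tables for illustration only.
prior = {"the": 0.75, "then": 0.25}
channel = {("thn", "the"): 0.05, ("thn", "then"): 0.4}

def h_channel(H, O):
    return math.log(channel[(O, H)])  # log p(O|H)

def h_prior(H, O):
    return math.log(prior[H])  # log p(H); ignores the observation

features = [h_channel, h_prior]
hypotheses = ["the", "then"]

# Equal weights: same ranking as the noisy channel model.
print(loglinear_decode("thn", hypotheses, features, [1.0, 1.0]))
# Zero weight on the channel feature: only the prior matters.
print(loglinear_decode("thn", hypotheses, features, [0.0, 1.0]))
```

In practice the weights `lambda_m` are tuned on held-out data, and the feature set is not limited to log probabilities.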
Linguistic methods (knowledge)
- Rich lexicons
- Grammars
- (Relative position of word with respect to other words or POS tags)
- Phrase-structure grammars
- Dependency grammars
- Models of discourse
Some tasks and kinds of knowledge
- Word sense disambiguation
- Morphological analysis and generation, including morphophonology
- Parsing
- Generation
- POS tagging
- Spell checking
- Word or phrase translation "equivalents"
- Anaphora resolution
- Segmentation (speech, text)
- Learner modeling in intelligent tutoring
- Sentiment analysis
- Collocations
- Alignment of bitext corpora