Measures of the "goodness" of a sentence (or other text)
Is it grammatical? Can it be parsed?
Does it make sense? Does it obey the selectional restrictions of the words?
Does it conform to a model representing the knowledge of a particular domain?
Is it relatively likely?
The probability of a sequence of words
N-grams
The joint probability of a sentence (using the chain rule):
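Written out, together with the bigram approximation that an N-gram model substitutes for the full history (a standard formulation, supplied here as an assumption about the intended notation w_1 ... w_n):

```latex
P(w_1, w_2, \ldots, w_n)
  = \prod_{k=1}^{n} P(w_k \mid w_1, \ldots, w_{k-1})
  \approx \prod_{k=1}^{n} P(w_k \mid w_{k-1})
```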
Smoothing: moving some probability mass from the frequent to the infrequent N-grams
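Discount approach: subtracting a small fixed amount from each nonzero count, with counts of 0 and 1 handled separately (a zero count cannot be reduced, and a count of 1 is often given its own discount)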
Back-off: basing the estimate for a non-occurring sequence on the estimate for the next shorter sequence (and even shorter sequences if necessary):
basing P(w_n | w_{n-2}, w_{n-1}) on estimates of P(w_n | w_{n-1}) or P(w_n)
Interpolation: basing the estimate for each sequence on the weighted sum of the estimates for the sequence and shorter subsequences:
basing P(w_n | w_{n-2}, w_{n-1}) on a weighted sum of estimates of P(w_n | w_{n-2}, w_{n-1}), P(w_n | w_{n-1}) and P(w_n)
Kneser-Ney discounting: the estimate of P(w_n | w_{n-1}) is the discounted estimate of P(w_n | w_{n-1}) plus a term proportional to the number of different contexts that w_n occurs in (a better measure than the overall count of w_n)
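The Kneser-Ney idea above can be sketched for bigrams as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the function name train_kn_bigram, the discount d = 0.75, the toy corpus, and the fallback to the continuation probability for an unseen context are all choices made here for the example.

```python
from collections import Counter, defaultdict

def train_kn_bigram(tokens, d=0.75):
    """Return prob(w, prev): an interpolated Kneser-Ney bigram estimate.

    d is the fixed discount (0.75 is an assumed, commonly used value).
    """
    bigrams = Counter(zip(tokens, tokens[1:]))   # c(prev, w)
    context_count = Counter(tokens[:-1])         # c(prev) as a bigram context
    followers = defaultdict(set)                 # distinct words seen after prev
    preceders = defaultdict(set)                 # distinct words seen before w
    for prev, w in bigrams:
        followers[prev].add(w)
        preceders[w].add(prev)
    n_bigram_types = len(bigrams)

    def prob(w, prev):
        # Continuation probability: share of bigram types that end in w,
        # i.e. how many different contexts w occurs in.
        p_cont = len(preceders.get(w, ())) / n_bigram_types
        c_prev = context_count[prev]
        if c_prev == 0:
            # Unseen context: fall back to the continuation probability
            # (a simplification assumed for this sketch).
            return p_cont
        discounted = max(bigrams[(prev, w)] - d, 0) / c_prev
        # Weight that redistributes the mass removed by discounting.
        lam = d * len(followers.get(prev, ())) / c_prev
        return discounted + lam * p_cont

    return prob

# Toy corpus, made up for illustration.
toy_tokens = "the cat sat on the mat the cat ate".split()
kn_prob = train_kn_bigram(toy_tokens)
vocab = set(toy_tokens)
total_after_the = sum(kn_prob(w, "the") for w in vocab)
unseen_bigram = kn_prob("mat", "cat")   # "cat mat" never occurs in the corpus
```

For a context that was seen in training, the estimates over the vocabulary sum to 1, and a bigram that never occurred still receives nonzero probability as long as its second word followed some other word.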
Using log probabilities
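Multiplying many small probabilities underflows floating-point arithmetic, so the product is usually replaced by a sum of logs. A minimal sketch of the effect (sentence_logprob is a hypothetical helper name, and the probability values are made up for illustration):

```python
import math

def sentence_logprob(word_probs):
    """Sum of natural-log probabilities; equals the log of their product."""
    return sum(math.log(p) for p in word_probs)

# 100 words, each with conditional probability 1e-5 (illustrative values).
word_probs = [1e-5] * 100

# The direct product is 1e-500, which is below the smallest positive
# double and therefore underflows.
product = 1.0
for p in word_probs:
    product *= p

# The log-space sum stays well within floating-point range.
log_p = sentence_logprob(word_probs)
```

Comparing two candidate sentences only requires comparing their log probabilities, so the result rarely needs to be converted back with exp.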
Evaluating language models
Different language models trained on a training set
Perplexity of a test set W for a given trained language model:
PP(W) = P(w_1, w_2, ..., w_N)^(-1/N)
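The perplexity formula can be computed in log space to avoid underflow on long test sets. A minimal sketch, assuming the model has already supplied the conditional probability of each of the N test words (the function name perplexity and the example values are hypothetical):

```python
import math

def perplexity(word_probs):
    """PP(W) = P(w_1, ..., w_N)^(-1/N), computed via log probabilities.

    word_probs holds the model's conditional probability for each of the
    N words in the test set, so their product is P(w_1, ..., w_N).
    """
    n = len(word_probs)
    log_p = sum(math.log(p) for p in word_probs)
    return math.exp(-log_p / n)

# A model that assigns every word probability 1/10 behaves like a uniform
# choice among 10 words at each position (illustrative values).
uniform_pp = perplexity([0.1] * 10)

# A model that assigns higher probabilities to the same test words
# gets a lower perplexity.
better_pp = perplexity([0.5, 0.25, 0.5, 0.25])
```

Lower perplexity on the same test set indicates the better model; perplexities computed on different test sets or with different vocabularies are not directly comparable.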
Language models for morphologically complex languages
Morphological parsing and stemming
Multiple, overlapping morphological N-grams for each word sequence