- Assume each source (S) word is a translation of exactly one target (T) word, or of the NULL word.
-
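Given a word alignment $A = (a_1, \dots, a_J)$, where $a_j$ is the index of the T word aligned to $s_j$ (0 for NULL), and word translation probabilities $\phi(s_j \mid t_i)$:
$P(S \mid T, A) = \prod_{j=1}^{J} \phi(s_j \mid t_{a_j})$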
-
The probability of choosing a length J for S (with probability ε) and then choosing one of the $(I + 1)^J$ possible alignments to a T of length I (plus NULL), assuming all alignments are equally likely:
$P(A \mid T) = \dfrac{\epsilon}{(I + 1)^J}$
-
The probability of generating S via a particular alignment:
$P(S, A \mid T) = P(S \mid T, A) \times P(A \mid T) = \dfrac{\epsilon}{(I + 1)^J} \prod_{j=1}^{J} \phi(s_j \mid t_{a_j})$
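The joint probability above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the table `PHI`, its values, the example sentence pair, and the function names are all made up for the example, and placing NULL at index 0 of the target list is an assumed convention.

```python
# Toy translation table: PHI[(s, t)] stands for phi(s | t).
# All values are invented for illustration only.
PHI = {
    ("la", "NULL"): 0.10, ("la", "the"): 0.70, ("la", "house"): 0.05,
    ("maison", "NULL"): 0.05, ("maison", "the"): 0.10, ("maison", "house"): 0.80,
}


def p_s_given_t_a(source, target, alignment, phi):
    """P(S|T,A): product over source positions j of phi(s_j | t_{a_j})."""
    prob = 1.0
    for s_word, a_j in zip(source, alignment):
        prob *= phi.get((s_word, target[a_j]), 0.0)
    return prob


def p_a_given_t(source_len, target_len, epsilon):
    """P(A|T) = epsilon / (I + 1)^J; target_len is I and excludes NULL."""
    return epsilon / (target_len + 1) ** source_len


def p_s_a_given_t(source, target, alignment, phi, epsilon=1.0):
    """P(S,A|T) = P(S|T,A) * P(A|T); target[0] is assumed to be NULL."""
    target_len = len(target) - 1
    return (p_s_given_t_a(source, target, alignment, phi)
            * p_a_given_t(len(source), target_len, epsilon))


target = ["NULL", "the", "house"]   # I = 2
source = ["la", "maison"]           # J = 2
alignment = [1, 2]                  # la -> the, maison -> house
joint = p_s_a_given_t(source, target, alignment, PHI)
print(joint)  # 0.7 * 0.8 / 3**2, roughly 0.0622
```

Here ε is set to 1.0 purely for convenience; it is a constant factor and does not change which alignment scores highest.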
-
Summing over all alignments, we get the total probability of generating S:
$P(S \mid T) = \sum_{A} P(S \mid T, A) \times P(A \mid T) = \sum_{A} \dfrac{\epsilon}{(I + 1)^J} \prod_{j=1}^{J} \phi(s_j \mid t_{a_j})$
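The sum ranges over $(I + 1)^J$ alignments, but because each $a_j$ is chosen independently it can be rearranged into a product of per-word sums, $\prod_{j} \sum_{i=0}^{I} \phi(s_j \mid t_i)$. The sketch below checks this numerically on a toy example; the table `PHI`, its values, and the function names are assumptions made for illustration, with NULL placed at index 0 of the target list by convention.

```python
from itertools import product
from math import isclose, prod

# Toy translation table: PHI[(s, t)] stands for phi(s | t). Values are invented.
PHI = {
    ("la", "NULL"): 0.10, ("la", "the"): 0.70, ("la", "house"): 0.05,
    ("maison", "NULL"): 0.05, ("maison", "the"): 0.10, ("maison", "house"): 0.80,
}


def total_prob_bruteforce(source, target, phi, epsilon=1.0):
    """P(S|T) by enumerating all (I + 1)^J alignments explicitly."""
    num_positions = len(target)          # I + 1, since target[0] is NULL
    total = 0.0
    for alignment in product(range(num_positions), repeat=len(source)):
        total += prod(phi.get((s, target[a]), 0.0)
                      for s, a in zip(source, alignment))
    return epsilon / num_positions ** len(source) * total


def total_prob_factorized(source, target, phi, epsilon=1.0):
    """Same quantity as a product over j of sums over i: O(I * J) work."""
    num_positions = len(target)
    per_word_sums = (sum(phi.get((s, t), 0.0) for t in target) for s in source)
    return epsilon / num_positions ** len(source) * prod(per_word_sums)


target = ["NULL", "the", "house"]
source = ["la", "maison"]
brute = total_prob_bruteforce(source, target, PHI)
fact = total_prob_factorized(source, target, PHI)
print(brute, fact, isclose(brute, fact))
```

The factorized form is what makes the model tractable in practice: enumeration grows exponentially in J, while the product of sums grows only linearly.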
-
Best alignment Â:
 = argmaxA P(S,A|T) = argmaxaj φ(sj|taj), 1 < j < J
- Training
- Other IBM models
- Fertility: number of source words corresponding to a target word
- Parameters:
- word-word translation probabilities (as before)
- n(2|the): probability that the word "the" has fertility 2
- d(2,4,6,7): probability that target word 2 aligns with source word 4 when the lengths of target and source sentences are 6 and 7
- p1: probability of generating a spurious source word (aligned to NULL) each time a source word is generated from an actual target word