In the last chapter, we derived the Viterbi algorithm as a way of implementing the principle of maximum likelihood for the parsing problem for HMMs and FSAs. In this chapter, we turn to the Bayesian formulation of this problem. In the Bayesian setting we wish to give some representation of the entire posterior distribution over sequences of categories given the observed sentence, rather than finding a single point estimate of the category sequence.

Recall the definition of conditional probability from the chapter Independence, Marginalization, and Conditioning.

\[\Pr(H=h \mid D=d)=\frac{\Pr(D=d, H=h)}{\sum_{h'} \Pr(D=d, H=h')}=\frac{\Pr(D=d, H=h)}{\Pr(D=d)}\]
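To make the definition concrete, here is a minimal sketch in Python: conditioning amounts to restricting the joint distribution to the observed value of $D$ and renormalizing. The joint table and its numbers are purely illustrative, not drawn from the text.

```python
# Hypothetical joint table Pr(D, H) over a coin flip D and a coin type H.
joint = {
    # (d, h): Pr(D = d, H = h)  -- illustrative numbers only
    ("heads", "fair"):   0.25,
    ("tails", "fair"):   0.25,
    ("heads", "biased"): 0.45,
    ("tails", "biased"): 0.05,
}

def posterior(d, h):
    """Pr(H = h | D = d) = Pr(D = d, H = h) / sum_h' Pr(D = d, H = h')."""
    evidence = sum(p for (d_, _), p in joint.items() if d_ == d)
    return joint[(d, h)] / evidence

print(posterior("heads", "biased"))  # 0.45 / 0.70 = 0.642...
```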

In the case of the HMM/FSA this becomes:

\[\Pr(c^{(1)},\cdots,c^{(k)} \mid w^{(1)}, \cdots, w^{(k)})=\frac{\Pr(w^{(1)}, \cdots, w^{(k)}, c^{(1)},\cdots,c^{(k)})}{\sum_{\bar{c}^{(1)},\cdots,\bar{c}^{(k)}} \Pr(w^{(1)},\cdots, w^{(k)}, \bar{c}^{(1)},\cdots,\bar{c}^{(k)})}\]

Substituting in the expressions capturing the conditional independence assumptions of the model, we obtain our desired posterior distribution over category sequences.

\[\Pr(c^{(1)},\cdots,c^{(k)} \mid w^{(1)}, \cdots, w^{(k)})=\frac{\prod_{i=1}^k \Pr(w^{(i)} \mid c^{(i)}) \Pr(c^{(i)} \mid c^{(i-1)})}{\sum_{\bar{c}^{(1)},\cdots,\bar{c}^{(k)}}\prod_{i=1}^k \Pr(w^{(i)} \mid \bar{c}^{(i)}) \Pr(\bar{c}^{(i)} \mid \bar{c}^{(i-1)})}\]
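The following sketch computes this posterior by brute-force enumeration of every category sequence, exactly as the denominator above is written. The category set, transition table, emission table, and the sentence are all hypothetical, and "START" stands in for the initial condition $c^{(0)}$; this is an illustration of the formula, not a practical algorithm.

```python
import itertools

categories = ["N", "V"]
transition = {  # Pr(c^(i) | c^(i-1))
    ("START", "N"): 0.7, ("START", "V"): 0.3,
    ("N", "N"): 0.3, ("N", "V"): 0.7,
    ("V", "N"): 0.8, ("V", "V"): 0.2,
}
emission = {  # Pr(w^(i) | c^(i))
    ("N", "dogs"): 0.6, ("N", "bark"): 0.4,
    ("V", "dogs"): 0.1, ("V", "bark"): 0.9,
}

def joint_score(words, cats):
    """Numerator: product over i of Pr(w^(i) | c^(i)) Pr(c^(i) | c^(i-1))."""
    score, prev = 1.0, "START"
    for w, c in zip(words, cats):
        score *= emission[(c, w)] * transition[(prev, c)]
        prev = c
    return score

def posterior(words):
    """Normalize the joint score over all |categories|^k category sequences."""
    scores = {cats: joint_score(words, cats)
              for cats in itertools.product(categories, repeat=len(words))}
    z = sum(scores.values())  # the exponentially large denominator
    return {cats: s / z for cats, s in scores.items()}

for cats, p in posterior(["dogs", "bark"]).items():
    print(cats, round(p, 3))  # e.g. ('N', 'V') gets posterior ~0.802
```

Note that the normalizing constant `z` is computed by summing over every one of the $|C|^k$ category sequences, which is exactly the step the next paragraph identifies as intractable.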

As we have seen several times now with this class of models, computing such posterior probabilities in the naive way is intractable: the denominator sums over all $|C|^k$ category sequences, a number that grows exponentially in the length of the sentence. In the next chapters, we will examine techniques for handling this.

