In the last unit, we saw how to use the conditional independence assumptions of HMMs to efficiently compute the marginal score of a string based on the decomposition of forward probabilities. Recall that the forward probability is defined as the probability of generating the first \(i\) words of a string and ending up in some particular state \(c^{(i+1)}\), that is,

\[\mathbf{fw}_i(c^{(i+1)}) = \Pr(w^{(1)},\dots,w^{(i)},c^{(i+1)})\]

This marginal probability can be computed recursively by traversing the trellis from left to right, computing the forward values at each position in turn. If the length of our string is \(|\vec{w}|\), then

\[\mathbf{fw}_{|\vec{w}|}(\ltimes) = \Pr(w^{(1)},\dots,w^{(|\vec{w}|)},C^{(|\vec{w}|+1)}=\ltimes)\]

is the marginal probability of our whole string under the HMM.
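As a concrete sketch, the left-to-right computation can be written in Python under an assumed dictionary parametrization (the names `pi`, `T`, `E`, and the stop symbol are illustrative, not from the text): `pi[c]` is the probability of starting in state `c`, `E[c][w]` the probability that `c` emits `w`, and `T[c][c2]` the probability of transitioning from `c` to `c2` or to the accepting state.

```python
def forward_marginal(words, states, pi, T, E, stop="<stop>"):
    """Marginal probability of `words` under a toy HMM, via the
    forward recursion.

    Invariant: after processing w^(1), ..., w^(i), fw[c] holds
    fw_i(c) = Pr(w^(1), ..., w^(i), C^(i+1) = c).
    """
    fw = {c: pi[c] for c in states}  # fw_0(c) = Pr(C^(1) = c)
    accept = 0.0
    for w in words:
        # State c emits w, then transitions to c2 (or to the stop state).
        prev, fw = fw, {}
        for c2 in states:
            fw[c2] = sum(prev[c] * E[c][w] * T[c][c2] for c in states)
        accept = sum(prev[c] * E[c][w] * T[c][stop] for c in states)
    # After the last word, `accept` is fw_n(stop) = Pr(w^(1..n), C^(n+1) = stop).
    return accept
```

Each trellis column costs \(O(|C|^2)\) to fill in, so the whole marginal is computed in \(O(|\vec{w}| \cdot |C|^2)\) time rather than by summing over all \(|C|^{|\vec{w}|}\) state sequences.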

This, however, is not the only way to efficiently compute the marginal probability of a string under an HMM. Consider the following quantity:

\[\mathbf{bk}_{k}(c^{(k-1)}) = \Pr(w^{(k)}, \dots, w^{(|\vec{w}|)} \mid c^{(k-1)})\]

This quantity can be thought of as the “reverse” of the forward probability. It measures the probability that, given we start in some state, we generate the remainder of the string (and stop in an accepting state). This is known as the backward probability, and it can be decomposed recursively in a manner that is analogous to the forward probability.
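A minimal right-to-left sketch in Python, under an assumed dictionary parametrization (the names `pi`, `T`, `E`, and the stop symbol are illustrative, not from the text): `pi[c]` is the probability of starting in state `c`, `E[c][w]` the probability that `c` emits `w`, and `T[c][c2]` the probability of transitioning from `c` to `c2` or to the accepting state.

```python
def backward_marginal(words, states, pi, T, E, stop="<stop>"):
    """Marginal probability of `words` under a toy HMM, via the
    backward recursion.

    Invariant: bk[c] holds bk_k(c) = Pr(w^(k), ..., w^(n), accept | C^(k-1) = c),
    the probability of generating the rest of the string and stopping,
    given that we are currently in state c.
    """
    # Base case: bk_{n+1}(c) = T[c][stop]; nothing left to emit, just accept.
    bk = {c: T[c][stop] for c in states}
    for w in reversed(words[1:]):  # w = w^(k) for k = n, n-1, ..., 2
        # Transition from c to c2, let c2 emit w^(k), then continue.
        bk = {c: sum(T[c][c2] * E[c2][w] * bk[c2] for c2 in states)
              for c in states}
    # Fold in the start: choose c^(1) from pi and let it emit w^(1).
    return sum(pi[c] * E[c][words[0]] * bk[c] for c in states)
```

This returns the same marginal as the forward computation; the backward recursion simply traverses the trellis from right to left instead of left to right.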
