ML - Week 11 - Theoretical Exercises

Hidden Markov Models

Questions to the slides Hidden Markov Models - Training:

  • Explain why training by counting as explained on slides 3--9 yields a maximum likelihood estimate, i.e. why the parameters $\Theta = (A_{jk}, \pi_k, \phi_{ik})$ as computed from a given ${\bf X}$ and ${\bf Z}$ (cf. slide 9) maximize the joint probability $P({\bf X}, {\bf Z} | \Theta)$. (For a concrete view of the counting step, see the first sketch after this list.)

  • Consider Viterbi training as explained on slides 18--19. If a parameter in the initial model ${\bf \Theta^0}$ is set to zero, i.e. if a particular transition or emission probability is set to zero, then it will remain zero during all iterations of Viterbi training (if we do not use pseudocounts). Why? (See the Viterbi training sketch after this list.)

  • Explain why you can stop Viterbi training if the Viterbi decoding does not change between two consecutive iterations. (The stopping test appears in the same Viterbi sketch below.)

  • Consider EM for HMMs (Baum-Welch training) as outlined on slides 30 and 46. It has the same property: if a parameter in the initial model ${\bf \Theta^0}$ is set to zero, i.e. if a particular transition or emission probability is set to zero, then it will remain zero during all iterations of EM training. Why? (See the Baum-Welch sketch after this list.)
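
The following is a minimal sketch of training by counting for a discrete HMM, assuming a single fully observed sequence in which every state occurs at least once (otherwise pseudocounts are needed to avoid division by zero). The function name `train_by_counting` and the array layout are illustrative, not taken from the slides.

```python
import numpy as np

def train_by_counting(X, Z, n_states, n_symbols):
    """ML estimate of HMM parameters from a fully observed pair (X, Z).

    P(X, Z | Theta) factorizes into independent multinomials (one per
    row of A, one per row of phi, one for pi), so normalizing the raw
    counts maximizes each factor, and hence the joint, separately.
    """
    A   = np.zeros((n_states, n_states))   # transition counts N_{jk}
    phi = np.zeros((n_states, n_symbols))  # emission counts
    pi  = np.zeros(n_states)               # initial-state counts

    pi[Z[0]] += 1
    for t in range(len(X)):
        phi[Z[t], X[t]] += 1
        if t > 0:
            A[Z[t - 1], Z[t]] += 1

    # Turn each count table into a probability distribution.
    A   /= A.sum(axis=1, keepdims=True)
    phi /= phi.sum(axis=1, keepdims=True)
    pi  /= pi.sum()
    return A, pi, phi
```

For instance, X = [0, 1, 2, 2] and Z = [0, 0, 1, 1] with two states and three symbols give A = [[1/2, 1/2], [0, 1]], pi = [1, 0], and phi = [[1/2, 1/2, 0], [0, 0, 1]].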
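Below is a sketch of Viterbi training that reuses `train_by_counting` above; `viterbi_decode` is a standard log-space decoder, not necessarily the slides' exact formulation. Working in log space avoids underflow and turns a zero parameter into $-\infty$, so such a parameter can never lie on a decoded path, its count stays zero, and its re-estimate stays zero. Likewise, if the decoding is identical in two consecutive iterations, the counts, and therefore the parameters, no longer change, so training has reached a fixed point.

```python
import numpy as np

def viterbi_decode(X, A, pi, phi):
    """Most probable state path for X under (A, pi, phi), in log space."""
    with np.errstate(divide="ignore"):         # log(0) = -inf is intended
        logA, logpi, logphi = np.log(A), np.log(pi), np.log(phi)
    T, K = len(X), A.shape[0]
    delta = np.zeros((T, K))                   # best log-score ending in k
    back = np.zeros((T, K), dtype=int)         # backpointers
    delta[0] = logpi + logphi[:, X[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + logA  # scores[j, k]: move j -> k
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + logphi[:, X[t]]
    Z = np.zeros(T, dtype=int)
    Z[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):
        Z[t] = back[t + 1, Z[t + 1]]
    return Z

def viterbi_training(X, A, pi, phi, max_iter=100):
    """Alternate Viterbi decoding and re-estimation by counting.

    A zero transition/emission has log-probability -inf and never
    appears on a decoded path, so its count (and re-estimate) stays
    zero. If the decoding is unchanged between two iterations, the
    counts and hence the parameters are unchanged too: a fixed point.
    Assumes every state appears on each decoded path (else add
    pseudocounts in train_by_counting).
    """
    Z = viterbi_decode(X, A, pi, phi)
    for _ in range(max_iter):
        A, pi, phi = train_by_counting(X, Z, A.shape[0], phi.shape[1])
        Z_new = viterbi_decode(X, A, pi, phi)
        if np.array_equal(Z_new, Z):           # decoding unchanged: stop
            break
        Z = Z_new
    return A, pi, phi
```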
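Finally, a sketch of one Baum-Welch (EM) iteration, using an unscaled forward-backward pass (adequate for short sequences; real implementations rescale or work in log space). Note how each expected count in the E-step carries the current parameter as a factor: xi contains A[j, k] itself, and gamma contains phi[k, x] through alpha, which is exactly why a parameter initialized to zero re-estimates to zero in every iteration.

```python
import numpy as np

def baum_welch_step(X, A, pi, phi):
    """One EM iteration for a discrete HMM (E-step, then M-step)."""
    T, K = len(X), A.shape[0]

    # Forward-backward pass (unscaled).
    alpha = np.zeros((T, K))
    beta = np.zeros((T, K))
    alpha[0] = pi * phi[:, X[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * phi[:, X[t]]
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (phi[:, X[t + 1]] * beta[t + 1])
    likelihood = alpha[-1].sum()               # P(X | Theta)

    # E-step: expected state and transition counts.
    gamma = alpha * beta / likelihood          # gamma[t, k] = P(z_t = k | X)
    xi = np.zeros((T - 1, K, K))               # xi[t, j, k] = P(z_t = j, z_{t+1} = k | X)
    for t in range(T - 1):
        xi[t] = (alpha[t][:, None] * A
                 * (phi[:, X[t + 1]] * beta[t + 1])[None, :]) / likelihood

    # M-step: normalized expected counts. Since xi contains A as a
    # factor and gamma contains phi (via alpha), zero parameters stay zero.
    A_new = xi.sum(axis=0)
    A_new /= A_new.sum(axis=1, keepdims=True)
    phi_new = np.zeros_like(phi)
    for t in range(T):
        phi_new[:, X[t]] += gamma[t]
    phi_new /= phi_new.sum(axis=1, keepdims=True)
    pi_new = gamma[0]
    return A_new, pi_new, phi_new
```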