Questions for the slides on Hidden Markov Models - Training:
Explain why training by counting, as explained on slides 3--9, yields a maximum likelihood estimate, i.e. why the parameters $\Theta = (A_{jk}, \pi_k, \phi_{ik})$ computed from a given ${\bf X}$ and ${\bf Z}$ (cf. slide 9) maximize the joint probability $P({\bf X}, {\bf Z} \mid \Theta)$.
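As a concrete companion to this question, here is a minimal sketch (not from the slides) of the counting estimator in Python, assuming integer-coded sequences ${\bf X}$ and ${\bf Z}$ of equal length, $K$ hidden states and $D$ observation symbols; the name count_mle and all variable names are illustrative:

\begin{verbatim}
import numpy as np

def count_mle(X, Z, K, D):
    """Parameters by counting: observations X, known state path Z
    (both integer-coded, same length). Assumes every state occurs;
    otherwise the normalizations give 0/0 (pseudo-counts fix this)."""
    A = np.zeros((K, K))    # A[j, k]: count of transitions j -> k
    pi = np.zeros(K)        # pi[k]: count of initial states
    phi = np.zeros((D, K))  # phi[i, k]: count of state k emitting symbol i

    pi[Z[0]] += 1
    for n in range(len(X)):
        phi[X[n], Z[n]] += 1
        if n > 0:
            A[Z[n - 1], Z[n]] += 1

    # Normalize each count table into a probability distribution.
    A /= A.sum(axis=1, keepdims=True)
    pi /= pi.sum()
    phi /= phi.sum(axis=0, keepdims=True)
    return A, pi, phi
\end{verbatim}

Each parameter group is a normalized count table; since $P({\bf X}, {\bf Z} \mid \Theta)$ factorizes into independent multinomial terms, the question amounts to showing that relative frequencies maximize each multinomial factor.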
Consider Viterbi training as explained on slides 18--19. If a parameter in the initial model ${\bf \Theta}^0$ is set to zero, i.e. if a particular transition or emission probability is zero, then it remains zero throughout all iterations of Viterbi training (if we do not use pseudo-counts). Why? (See the sketch below.)
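A minimal sketch of the Viterbi training loop may help here (illustrative code, reusing the hypothetical count_mle from the sketch above). A zero transition or emission probability becomes $-\infty$ in log space, so no Viterbi path can ever use it; its count is then zero, and renormalization keeps the parameter at zero. The loop also contains the stopping test referred to in the next question.

\begin{verbatim}
import numpy as np

def viterbi(X, A, pi, phi):
    """Most probable state path for X under (A, pi, phi), in log space."""
    K, N = A.shape[0], len(X)
    with np.errstate(divide="ignore"):   # log(0) = -inf is intended here
        logA, logpi, logphi = np.log(A), np.log(pi), np.log(phi)
    V = np.full((N, K), -np.inf)
    back = np.zeros((N, K), dtype=int)
    V[0] = logpi + logphi[X[0]]
    for n in range(1, N):
        scores = V[n - 1][:, None] + logA    # scores[j, k]
        back[n] = scores.argmax(axis=0)
        V[n] = scores.max(axis=0) + logphi[X[n]]
    Z = np.zeros(N, dtype=int)
    Z[-1] = V[-1].argmax()
    for n in range(N - 2, -1, -1):
        Z[n] = back[n + 1, Z[n + 1]]
    return Z

def viterbi_training(X, A, pi, phi, max_iter=100):
    """Alternate Viterbi decoding and counting; stop when the path repeats."""
    Z_old = None
    for _ in range(max_iter):
        Z = viterbi(X, A, pi, phi)
        if Z_old is not None and np.array_equal(Z, Z_old):
            break    # same path => same counts => same parameters
        A, pi, phi = count_mle(X, Z, A.shape[0], phi.shape[0])
        Z_old = Z
    return A, pi, phi
\end{verbatim}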
Explain why you can stop Viterbi training once the Viterbi decoding does not change between two consecutive iterations (the stopping test in the sketch above).
Consider EM for HMMs (Baum-Welch training), as outlined on slides 30 and 46. It has the same property: if a parameter in the initial model ${\bf \Theta}^0$ is set to zero, i.e. if a particular transition or emission probability is zero, then it remains zero throughout all iterations of EM training. Why?
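Below is a minimal, unscaled single-sequence sketch of one Baum-Welch iteration (illustrative names; a real implementation scales $\alpha$ and $\beta$ to avoid underflow). Observe that the expected transition count $\xi_n(j,k)$ contains $A_{jk}$ as an explicit factor, and that $\gamma_n(k)$ at a position with $x_n = i$ contains $\phi_{ik}$ as a factor via $\alpha_n(k)$, so zero parameters produce zero expected counts.

\begin{verbatim}
import numpy as np

def baum_welch_step(X, A, pi, phi):
    """One EM iteration for a single sequence, without scaling."""
    X = np.asarray(X)
    K, N = A.shape[0], len(X)

    # E-step: forward and backward recursions.
    alpha = np.zeros((N, K))
    beta = np.zeros((N, K))
    alpha[0] = pi * phi[X[0]]
    for n in range(1, N):
        alpha[n] = (alpha[n - 1] @ A) * phi[X[n]]
    beta[-1] = 1.0
    for n in range(N - 2, -1, -1):
        beta[n] = A @ (phi[X[n + 1]] * beta[n + 1])
    PX = alpha[-1].sum()

    gamma = alpha * beta / PX   # gamma[n, k] = P(z_n = k | X)
    # xi[n, j, k] = P(z_n = j, z_{n+1} = k | X); note the factor A[j, k].
    xi = (alpha[:-1, :, None] * A[None, :, :]
          * (phi[X[1:]] * beta[1:])[:, None, :]) / PX

    # M-step: renormalized expected counts (assumes no all-zero rows).
    A_new = xi.sum(axis=0)
    A_new /= A_new.sum(axis=1, keepdims=True)
    pi_new = gamma[0] / gamma[0].sum()
    phi_new = np.zeros_like(phi)
    for i in range(phi.shape[0]):
        phi_new[i] = gamma[X == i].sum(axis=0)
    phi_new /= phi_new.sum(axis=0, keepdims=True)
    return A_new, pi_new, phi_new
\end{verbatim}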