{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# ML E2020 - Week 11 - Theoretical Exercises" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Hidden Markov Models\n", "\n", "***Exercise 1***: Questions to slides *Hidden Markov Models - Training*:\n", "\n", "1. Consider the simple \"weather-HMM\" with a transition diagram as shown on slide 3. Assume that we do not know the model parameters i.e. the start-, transition-, and emission-probabilities, but that we are given two pairs of $({\\bf X}, {\\bf Z})$ as training data. \n", "\n", " These pairs are: (`HHLLLHHHHLLLLHH`, `SSSRRSSSRRRRSSS`) and (`LLHHLLLHHHHLLHH`, `RRRSSRRRSSSRRRS`), where H and L are the two states of the model, and S and R are the two emissions sunshine and rain. \n", " \n", " Use Training-by-Counting to set the model parameters according to this training data.\n", " \n", "2. Consider Viterbi training as explained on slides 18-19. If a parameter in the initial model ${\\bf \\Theta^0}$ is set to zero, i.e. if a particular transition or emission probability is set to zero, then it will remain zero during all the iterations of Viterbi training (if we do not perform pseudo counts). Why? \n", "\n", "3. Explain why you can stop Viterbi training if the Viterbi decoding does not change between two iterations? \n", "\n", "4. Consider EM for HMMs (Baum-Welch training as outlined on slides 32 and 49. It also has the property that if a parameter in the initial model ${\\bf \\Theta^0}$ is set to zero, i.e. if a particular transition or emission probability is set to zero, then it will remain zero during all the iterations of the EM training. Why? \n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "***Exercise 2***: Questions to slides *Hidden Markov Models - Selecting the initial model parameters and using HMMs for (simpel) gene finding*:\n", "\n", "1. Consider the 7-state HMM on slides 26 that you also use in pratical exercises. As stated on slide 27, this HMM is also relevant for gene finding, where we say that state 3 emits non-coding symbols, states 2, 1, 0 emit coding triplets (codons) in the left-to-right direction and states 4, 5, 6 emit coding symbols in the reverse (right-to-left) direction. \n", "\n", " If we are given a DNA string, say \n", " \n", " `ACGTATGCTAATCTAAACCTACGGCATGT`\n", " \n", " and information about its gene structure using the N, C, R annotation also used in the slides and practical exercises, say \n", " \n", " `NNNNCCCCCCCCCCCCNNRRRRRRRRRNN`\n", "\n", " then we can convert this gene structure into an actual sequence of states, as also explained on slide 30 (for a different model), as \n", " \n", " `33332102102102103345645645633`\n", " \n", " Use the above DNA string and information about its gene structure to set the model parameters of the 7-state HMM using Traning-by-Counting. (You can perhaps use this small example as a test case for your implementation of Traning-by-Counting in the practical exercises.)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.12" } }, "nbformat": 4, "nbformat_minor": 2 }