Use the GUM treebank, available here: [login to view URL]
HMMs are well described in Section 8.4 of Jurafsky & Martin's Speech and Language Processing. Link here: [login to view URL]~jurafsky/slp3/[login to view URL]
Components of an HMM tagger (40 points) [For everybody]
Undergrads and graduates: Use the equations in Section 8.4.3 to implement the emission and transition probabilities. If you want to add smoothing, see Equation 3.23 in Chapter 3 of the book; it applies to both the transition and the emission probabilities. Don't forget to add the <s> token when computing the transition probabilities.
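A minimal sketch of how these estimates could be computed from counts, with optional add-alpha smoothing (the function name `train_hmm` and its interface are my own, not part of the assignment):

```python
from collections import defaultdict

def train_hmm(tagged_sents, alpha=1.0):
    """Estimate transition and emission probabilities from tagged sentences.

    tagged_sents: list of sentences, each a list of (word, tag) pairs.
    alpha: add-alpha smoothing constant (alpha=0 gives the unsmoothed MLE).
    """
    trans_counts = defaultdict(lambda: defaultdict(int))
    emit_counts = defaultdict(lambda: defaultdict(int))
    tags, vocab = set(), set()
    for sent in tagged_sents:
        prev = "<s>"  # sentence-start token for the first transition
        for word, tag in sent:
            trans_counts[prev][tag] += 1
            emit_counts[tag][word] += 1
            tags.add(tag)
            vocab.add(word)
            prev = tag

    def trans_prob(prev_tag, tag):
        # P(tag | prev_tag) = (C(prev_tag, tag) + alpha) / (C(prev_tag) + alpha * |tags|)
        total = sum(trans_counts[prev_tag].values())
        return (trans_counts[prev_tag][tag] + alpha) / (total + alpha * len(tags))

    def emit_prob(tag, word):
        # P(word | tag); the +1 in the denominator reserves mass for unknown words
        total = sum(emit_counts[tag].values())
        return (emit_counts[tag][word] + alpha) / (total + alpha * (len(vocab) + 1))

    return trans_prob, emit_prob, tags
```

Returning closures keeps the counting logic in one place; you could equally precompute dense probability tables.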
Greedy Tagger (60 points) [For everybody]
Implement a greedy tagger: at each step, select the tag that maximizes the product of the transition probability and the emission probability. You do not need the Viterbi algorithm to find the best tag sequence. Think greedy!
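A sketch of the greedy step, assuming `trans_prob` and `emit_prob` are the probability functions you built in the previous part (these names are illustrative):

```python
def greedy_tag(words, trans_prob, emit_prob, tags):
    """Greedy decoding: commit to the locally best tag at each position."""
    result = []
    prev = "<s>"  # start symbol for the first transition
    for word in words:
        # pick the tag maximizing P(tag | prev) * P(word | tag)
        best_tag = max(tags, key=lambda t: trans_prob(prev, t) * emit_prob(t, word))
        result.append(best_tag)
        prev = best_tag  # the greedy choice becomes the context for the next word
    return result
```

Note that a greedy choice early in the sentence cannot be revised later, which is exactly what distinguishes this from Viterbi decoding.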
Viterbi Tagger (50 points) [For extra credit]
Implement the Viterbi tagger as given in Section 8.4.5. You need to implement the backpointers in order to output the best tag sequence.
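A sketch of Viterbi decoding with backpointers, again assuming the `trans_prob`/`emit_prob` interface from earlier (a real implementation should work in log space to avoid underflow on long sentences):

```python
def viterbi_tag(words, trans_prob, emit_prob, tags):
    """Viterbi decoding: dynamic programming over tag sequences with backpointers."""
    tags = list(tags)
    # v[i][t] = score of the best tag path for words[:i+1] that ends in tag t
    v = [{t: trans_prob("<s>", t) * emit_prob(t, words[0]) for t in tags}]
    back = [{}]  # back[i][t] = previous tag on that best path
    for i in range(1, len(words)):
        v.append({})
        back.append({})
        for t in tags:
            best_prev = max(tags, key=lambda p: v[i - 1][p] * trans_prob(p, t))
            v[i][t] = v[i - 1][best_prev] * trans_prob(best_prev, t) * emit_prob(t, words[i])
            back[i][t] = best_prev
    # start from the best final tag and follow backpointers to the front
    last = max(tags, key=lambda t: v[-1][t])
    seq = [last]
    for i in range(len(words) - 1, 0, -1):
        last = back[i][last]
        seq.append(last)
    return list(reversed(seq))
```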
Reading: See Section A.4 for worked-out examples of the Viterbi algorithm.
Don't hesitate to contact me with questions about your code. Best of luck.
Test your tagger on the test dataset here: [login to view URL]
What are the accuracy and F-scores of your tagger? You can use sklearn.metrics to compute them.
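One way to compute these, assuming you flatten the gold and predicted tag sequences into one token-level list each (the `evaluate` wrapper is illustrative):

```python
from sklearn.metrics import accuracy_score, f1_score

def evaluate(gold_tags, pred_tags):
    """Token-level accuracy plus macro- and micro-averaged F1.

    gold_tags, pred_tags: flat lists of tags, one entry per token.
    """
    acc = accuracy_score(gold_tags, pred_tags)
    macro_f1 = f1_score(gold_tags, pred_tags, average="macro")
    micro_f1 = f1_score(gold_tags, pred_tags, average="micro")
    return acc, macro_f1, micro_f1
```

Macro-F1 weights every tag equally (so rare tags matter), while micro-F1 equals token accuracy in this single-label setting; reporting both is informative.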