Executing NLP code with Python (document classification)
₹600-1500 INR
Completed
Posted almost 3 years ago
Paid on delivery
For this problem, you will work extensively with classifying text from chapter 6. You will build a Naive Bayes classifier from scratch for the task of classifying whether an email is spam or not, and get hands-on experience building a machine learning classifier using NLTK. You can submit a Python notebook file for this homework; the answers can be submitted separately in a [login to view URL]. Don't hesitate to GOOGLE, but don't copy the code.

1. Read section 1 in NLTK chapter 6 and familiarize yourself with the document classification example for movie reviews.
2. Download the dataset from ACL Wiki: [login to view URL] There are many spam datasets. Untar the dataset. Google "untar" and find out how you will deal with a non-zip archive. Also read the readme file.
3. How many folders are there in the archive?
4. What is the difference between the different folders?
5. You will work with the Part 1 folder in the lemm_stop folder. Show the code snippet to get marks for this question.
6. How many documents are marked as spam and not spam? How did you come up with the number?
7. How many words are there in all the documents?
8. What are the top 5 frequent words in the spam documents?
9. What are the top 5 frequent words in the non-spam documents?
10. What is the maximum number of words in a document?
11. What is the minimum number of words in a document?
12. Write a feature extractor function similar to document_features in the NLTK example. Don't copy the code from the NLTK book. Use the feature extractor function to create a training dataset on Part 1 of the data. Train a Naive Bayes classifier as shown in the book.
13. For testing, we will use Part 10 in the lemm_stop folder. Follow similar steps as above to create a test dataset. Apply the feature extractor function to extract features from the test dataset.
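The counting and training steps above can be sketched as follows. This is a minimal, self-contained sketch: the archive filename and the "spmsg" spam-filename convention are assumptions (they match the Ling-Spam corpus), and a tiny in-memory stand-in replaces the real Part 1 files so the snippet runs on its own.

```python
# Minimal sketch of the dataset questions and Naive Bayes training.
# The archive name and the "spmsg" filename convention below are
# assumptions; the toy documents stand in for Part 1 of lemm_stop.
from nltk import FreqDist, NaiveBayesClassifier

# Untarring a non-zip archive would look like this (filename is a guess):
# import tarfile
# with tarfile.open("lingspam_public.tar.gz", "r:gz") as tar:
#     tar.extractall("data")

# Stand-in for the labelled documents: (tokens, label) pairs. With the
# real data you would read each file in part1/ and label it "spam" when
# its filename starts with "spmsg".
train_docs = [
    ("win free money now click here".split(), "spam"),
    ("free offer win prize click".split(), "spam"),
    ("meeting agenda for linguistics seminar".split(), "ham"),
    ("call for papers computational linguistics".split(), "ham"),
]

# Label counts and total word count.
n_spam = sum(1 for _, lab in train_docs if lab == "spam")
all_words = FreqDist(w for toks, _ in train_docs for w in toks)
print("spam docs:", n_spam, "total words:", all_words.N())

# Top 5 frequent words in the spam documents.
spam_words = FreqDist(w for toks, lab in train_docs if lab == "spam" for w in toks)
print("top spam words:", spam_words.most_common(5))

# Longest and shortest document.
lengths = [len(toks) for toks, _ in train_docs]
print("max words:", max(lengths), "min words:", min(lengths))

# A document_features-style extractor (presence/absence of the most
# frequent words), then a Naive Bayes classifier trained on it.
word_features = [w for w, _ in all_words.most_common(100)]

def document_features(tokens):
    token_set = set(tokens)
    return {f"contains({w})": (w in token_set) for w in word_features}

train_set = [(document_features(toks), lab) for toks, lab in train_docs]
classifier = NaiveBayesClassifier.train(train_set)
print(classifier.classify(document_features("free money win".split())))
```

With the real corpus, the same `document_features` function is applied unchanged to the Part 10 files to build the test set.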
What is its accuracy on the test dataset? Show the code. What happens if you test on the training dataset? If you get accuracies below 50%, there is a bug in your code. What are the Precision, Recall, and F-score of the classifier that you trained? Read section 3 of the chapter to answer these questions. Can you try another classifier, such as Logistic Regression? How do the evaluation metrics look? This is a good starting point for using scikit-learn. You can use scikit-learn's implementation: [login to view URL] An example of working with text data is here: [login to view URL]
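A hedged sketch of the evaluation step: precision, recall, and F-score for a Logistic Regression baseline built with scikit-learn. The toy messages below are made up so the snippet is self-contained; with the real data you would vectorize the Part 1 files for training and the Part 10 files for testing. (If you stay within NLTK, `nltk.metrics` also provides `precision`, `recall`, and `f_measure` over sets of document IDs, as section 3 of the chapter shows.)

```python
# Sketch of a scikit-learn Logistic Regression baseline with evaluation
# metrics. The tiny train/test messages are invented for illustration.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

train_texts = ["win free money now", "free prize click here",
               "seminar agenda linguistics", "call for papers corpus"]
train_labels = ["spam", "spam", "ham", "ham"]
test_texts = ["free money prize", "linguistics seminar schedule"]
test_labels = ["spam", "ham"]

# Bag-of-words counts, analogous to the contains(word) features in NLTK.
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_texts)
X_test = vectorizer.transform(test_texts)

clf = LogisticRegression()
clf.fit(X_train, train_labels)
pred = clf.predict(X_test)

print("accuracy:", accuracy_score(test_labels, pred))
# Per-class metrics for the "spam" class only.
p, r, f, _ = precision_recall_fscore_support(
    test_labels, pred, labels=["spam"], average=None)
print("precision:", p[0], "recall:", r[0], "F-score:", f[0])
```

Evaluating on the training set itself (predicting on `X_train`) gives the optimistic training accuracy the assignment asks you to compare against the test accuracy.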