This is a Python project.
Your task is, in theory, trivial. In the link below, you are provided with a relatively large dataset of newspaper articles called "radikal". The dataset contains articles published in the newspaper Radikal during 2009. For this project we will only use the .corpus file, which is a slightly formatted and tidied version of the article collection. However, I am also sharing the original dataset, which consists of many folders, with each article stored as a separate file. Because the dataset was downloaded from the Internet, these files also contain a number of HTML tags. (The raw collection is shared only so that you can see the work required to convert raw data into the collection format. In this project you are expected to work on the .corpus file only.)
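The raw collection is not needed for the project itself, but the conversion work it illustrates (walking the folders and stripping the HTML tags from each article) can be sketched as follows. This is a minimal sketch, assuming each article is a standalone HTML file somewhere under one root folder and that the files are UTF-8 encoded. The helper names (`strip_html`, `build_corpus`), the encoding, and the one-article-per-line output are hypothetical choices for illustration. They do not describe the actual layout of the .corpus file.

```python
import os
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text content of an HTML document, discarding tags.

    Text inside <script> and <style> elements is skipped.
    """

    def __init__(self):
        super().__init__()  # convert_charrefs=True decodes entities like &amp;
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def strip_html(raw: str) -> str:
    """Return the visible text of `raw` with tags removed and whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(raw)
    parser.close()
    # Join text chunks with spaces so adjacent block elements do not fuse.
    return " ".join(" ".join(parser.parts).split())


def build_corpus(root_dir: str, out_path: str, encoding: str = "utf-8") -> int:
    """Walk root_dir, clean every article file, and write one article per line.

    Returns the number of articles written. The output format and the default
    encoding are assumptions, not the real .corpus specification.
    """
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for dirpath, dirnames, filenames in os.walk(root_dir):
            dirnames.sort()  # visit subfolders in a stable order
            for name in sorted(filenames):
                path = os.path.join(dirpath, name)
                with open(path, encoding=encoding, errors="replace") as f:
                    text = strip_html(f.read())
                if text:
                    out.write(text + "\n")
                    count += 1
    return count
```

For example, `build_corpus("radikal_raw", "radikal_clean.txt")` (both paths hypothetical) would write one cleaned article per line and return how many were written. If the raw files turn out to use a Turkish legacy encoding, passing `encoding="cp1254"` or `encoding="iso8859_9"` is a likely alternative. Check this against the actual files.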