Attached, please find a dataset with 17K speeches in German. Use encoding="iso-8859-1" to open the file.
The column "text" contains the speech (the other columns are metadata).
Speeches are uniquely identified by "speech_id".
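A minimal sketch of the loading step, assuming the dataset is a comma-separated file with a header row (the posting does not state the file format, and `load_speeches` is a hypothetical helper name):

```python
import csv


def load_speeches(path):
    """Read the dataset into a list of dicts, one per speech.

    Assumption: the file is comma-separated with a header row that
    includes the "speech_id" and "text" columns named in the posting.
    """
    # The posting asks for this exact encoding when opening the file.
    with open(path, encoding="iso-8859-1", newline="") as handle:
        return list(csv.DictReader(handle))
```

Each returned row keeps all metadata columns, so "speech_id" is available for every later step.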
First, split "text" into sentences using the spaCy sentence splitter ([login to view URL]). Save the split sentences (keep the "speech_id" so that we can merge the sentences back to the metadata).
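The splitting step could look roughly like the sketch below. It assumes spaCy v3 and uses the rule-based `sentencizer` on a blank German pipeline, which needs no model download; a trained German model may split more accurately. The function names, the "sentence_id" column, and the UTF-8 output encoding are assumptions made for illustration, not part of the task description.

```python
import csv


def build_german_splitter():
    """Return a spaCy pipeline that only does rule-based sentence splitting."""
    # Imported here so the helpers below can be used without spaCy installed.
    import spacy

    nlp = spacy.blank("de")
    nlp.add_pipe("sentencizer")
    return nlp


def split_speeches(rows, nlp):
    """Yield one dict per sentence, carrying the speech_id along."""
    for row in rows:
        doc = nlp(row["text"])
        for idx, sent in enumerate(doc.sents):
            sentence = sent.text.strip()
            if sentence:  # skip spans that are only whitespace
                yield {
                    "speech_id": row["speech_id"],
                    "sentence_id": idx,  # position of the sentence in the speech
                    "sentence": sentence,
                }


def write_sentences(sentences, out_path):
    """Write the split sentences to a .csv file (UTF-8 is an assumption)."""
    with open(out_path, "w", encoding="utf-8", newline="") as handle:
        writer = csv.DictWriter(
            handle, fieldnames=["speech_id", "sentence_id", "sentence"]
        )
        writer.writeheader()
        writer.writerows(sentences)
```

A typical call would be `write_sentences(split_speeches(rows, build_german_splitter()), "sentences.csv")`, where `rows` is a list of dicts with "speech_id" and "text" keys and the output file name is a placeholder.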
Second, get semantic role labeling (SRL) annotations for every sentence using this labeler: [login to view URL]. Please figure out how we can use this labeler from Python. We may have to build a scraper-like application that sends requests to their page. In that case, please be VERY MINDFUL of their page (e.g., add time between the requests).
Then, for every sentence, save the annotation (again keeping the "speech_id").
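Since the labeler's interface is not known here, the sketch below only shows the polite request loop and the saving step. The `annotate` argument is a hypothetical callable that sends one sentence to the labeler and returns its annotation as a string; the two-second default delay and the output columns are likewise assumptions.

```python
import csv
import time


def annotate_politely(sentences, annotate, out_path, delay_seconds=2.0):
    """Label sentences one at a time, pausing between requests.

    `sentences` is an iterable of dicts with "speech_id", "sentence_id"
    and "sentence" keys; `annotate` is a caller-supplied function
    (hypothetical) that returns the SRL annotation for one sentence.
    """
    fieldnames = ["speech_id", "sentence_id", "sentence", "annotation"]
    with open(out_path, "w", encoding="utf-8", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        for i, item in enumerate(sentences):
            if i > 0:
                # Wait between requests so the labeler's page is not hammered.
                time.sleep(delay_seconds)
            writer.writerow(
                {
                    "speech_id": item["speech_id"],
                    "sentence_id": item["sentence_id"],
                    "sentence": item["sentence"],
                    "annotation": annotate(item["sentence"]),
                }
            )
            # Flush after each row so finished work survives an interruption.
            handle.flush()
```

Keeping the request logic in a separate callable means the same loop works whether the labeler ends up being called through an HTTP request or a local library.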
- Python script(s) performing the tasks above.
- .csv with split sentences
- SRL annotations for every sentence (either in .csv or separate text files)
11 freelancers have bid an average of $161 for this job
Hi, I am Samyak. I have a good command of Python and deep learning, with certification from Udacity and four years of experience. I will do your work with perfection. You can see my profile.