I have a basic code implementation of this task using a BERT model. For the research paper, they expect a detailed submission with proper analysis and comparisons.
I can share my tutorial sheets as well.
1. Each instance in the data contains multiple tweets: the first is the source tweet, which is labelled, followed by several reply tweets. I need to show the difference in accuracy when the FINAL MODEL is trained on these two different inputs (you can verify this from dev.data.jsonl). The difference can be shown using the COVID data for analysis. This will be performed only on BERT.
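A minimal sketch of the two pre-processing variants. The schema field names ("tweets", "text") are assumptions about dev.data.jsonl and should be checked against the real file:

```python
import json

def load_instances(path):
    """Each line of dev.data.jsonl is assumed to be one JSON instance."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f]

def source_only(instance):
    # Variant 1: keep only the labelled source tweet (first in the thread)
    return instance["tweets"][0]["text"]

def full_thread(instance):
    # Variant 2: concatenate the source tweet and every reply tweet
    return " ".join(t["text"] for t in instance["tweets"])
```

Keeping the two variants as separate functions makes it easy to run the same training code twice and report both accuracies side by side.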
2. Textual features are converted either to word2vec vectors or to embeddings, using texts_to_sequences from Keras or BERT.
We are supposed to compare accuracy of the word2vec and text-sequence features using LR, FNN, RNN, and LSTM.
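A self-contained sketch of the two feature conversions. The word vectors below are toy stand-ins (real runs would load a trained gensim word2vec model), and `texts_to_sequences` is a minimal re-implementation of what Keras `Tokenizer.texts_to_sequences` does, included here so the sketch runs without TensorFlow:

```python
import numpy as np
from collections import Counter

# Toy stand-in vectors; in practice these come from a trained
# word2vec model (the words and values here are made up)
w2v = {"rumour": np.array([0.1, 0.9]), "denied": np.array([0.8, 0.2])}

def avg_word2vec(tokens, dim=2):
    """Variant A: represent a tweet as the mean of its word vectors,
    giving one fixed-size dense vector suitable for LR / FNN."""
    vecs = [w2v[t] for t in tokens if t in w2v]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

def texts_to_sequences(texts):
    """Variant B: index words by corpus frequency (1-based) and map
    each text to an integer sequence, which then feeds an embedding
    layer in the RNN / LSTM (mirrors Keras Tokenizer behaviour)."""
    counts = Counter(w for t in texts for w in t.split())
    index = {w: i + 1 for i, (w, _) in enumerate(counts.most_common())}
    return [[index[w] for w in t.split()] for t in texts]
```

Variant A collapses word order into one vector; Variant B preserves order, which is why it pairs with the sequence models.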
3. Finally, fine-tuning the BERT model with FNN, LSTM, and RNN heads and showing its accuracy on the COVID data. This is the right side of the flow chart.
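A hedged numpy sketch of how the classifier heads sit on top of BERT. Here `bert_hidden` stands in for the token states a fine-tuned BERT returns (in the transformers library this would be `model(**inputs).last_hidden_state`); the shapes and random values are placeholders, not real model output:

```python
import numpy as np

rng = np.random.default_rng(0)
batch, seq_len, hidden, n_classes = 2, 8, 16, 2
# Placeholder for BERT's per-token hidden states (batch, seq, hidden)
bert_hidden = rng.normal(size=(batch, seq_len, hidden))

def fnn_head(h, W, b):
    # FNN head: a dense layer over the [CLS] position (index 0)
    return h[:, 0, :] @ W + b

def rnn_head(h, Wx, Wh, Wo):
    # Vanilla RNN head: run a recurrence over the token states and
    # classify from the final state (an LSTM head is analogous, with
    # input/forget/output gates added)
    state = np.zeros((h.shape[0], Wh.shape[0]))
    for t in range(h.shape[1]):
        state = np.tanh(h[:, t, :] @ Wx + state @ Wh)
    return state @ Wo

W = rng.normal(size=(hidden, n_classes))
logits_fnn = fnn_head(bert_hidden, W, np.zeros(n_classes))
logits_rnn = rnn_head(bert_hidden,
                      rng.normal(size=(hidden, 4)),
                      rng.normal(size=(4, 4)),
                      rng.normal(size=(4, n_classes)))
```

In the real experiments, the head and BERT are trained jointly; only the head differs between the three compared variants.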
Using the final model, we can compare accuracy with and without user info as a feature.
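For the user-info ablation, one simple approach is concatenating user-level features onto the text representation; which user fields actually exist (follower count, verified flag, etc.) is an assumption about the dataset:

```python
import numpy as np

def with_user_info(text_features, user_features):
    """Append per-instance user-level features to each text vector, so
    the same model can be trained once with and once without them."""
    return np.concatenate([text_features, user_features], axis=1)
```

Training the final model on `text_features` alone and then on `with_user_info(text_features, user_features)` gives the two accuracy numbers for the comparison.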
SECTIONS I need to include in the research paper, for which I need your help with code:
Data Exploration: showing the difference in accuracy when training the model using only the first tweet vs. all tweets of an instance. (Different functions can be made for the two types of pre-processing.) This will be shown on the final model, i.e. the model that gives the best accuracy.
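A runnable sketch of the comparison harness for this section. A bag-of-words LogisticRegression is used here purely as a stand-in; in the paper this slot is filled by the best (fine-tuned BERT) final model, and the toy strings below stand in for texts produced by the two pre-processing functions:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

def run_variant(train_texts, train_y, test_texts, test_y):
    """Train and score one pre-processing variant with a stand-in model."""
    vec = CountVectorizer()
    clf = LogisticRegression(max_iter=1000)
    clf.fit(vec.fit_transform(train_texts), train_y)
    return accuracy_score(test_y, clf.predict(vec.transform(test_texts)))

# Toy illustration data; real runs use instances from dev.data.jsonl
src = ["flood warning fake", "vaccine works", "fake cure claim", "who confirms"]
thread = [s + " reply text" for s in src]   # full-thread variant
y = [1, 0, 1, 0]

acc_source = run_variant(src[:2], y[:2], src[2:], y[2:])
acc_thread = run_variant(thread[:2], y[:2], thread[2:], y[2:])
```

Reporting `acc_source` and `acc_thread` side by side is exactly the table this section needs, once the stand-in model is swapped for the final one.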
# Please create separate sections in the Jupyter notebook so that I can properly comment on them later
Converting text to input features: two types of input features are compared for accuracy, first word2vec, then word embeddings (using texts_to_sequences from Keras, and later from BERT).
Training different models for accuracy comparison: FNN (with word embeddings from word2vec and from texts_to_sequences), LSTM, RNN, and Logistic Regression.
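A sketch of the comparison loop over dense-feature models. The features here are random placeholders for the averaged word2vec vectors, and sklearn's MLPClassifier stands in for the FNN; the RNN and LSTM entries would be Keras models over padded integer sequences instead, which this sketch omits:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

# Placeholder dense features and labels (real runs use the tweet features)
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 10))
y = (X[:, 0] > 0).astype(int)
X_tr, y_tr, X_te, y_te = X[:40], y[:40], X[40:], y[40:]

models = {
    "LR": LogisticRegression(max_iter=1000),
    "FNN": MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000,
                         random_state=0),
}
scores = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    scores[name] = model.score(X_te, y_te)   # accuracy on held-out split
```

Running this loop once per feature type (word2vec vs. text sequences) and collecting `scores` produces the full accuracy comparison table for this section.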
Finally, training an FNN head while fine-tuning BERT.
Checking the accuracy of the last three models (FNN, RNN, and LSTM, all using BERT) on the COVID data.
DO SEE THE FLOWCHART TO GET A ROUGH IDEA