Attached is a sample of the raw data file: one month of conversations from a Telegram group, in two formats, HTML and JSON (use whichever you prefer). The project covers conversations from Jan. 2020 to Jan. 2021, so there will be more than one file; I will send the rest by email because of their size.
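The JSON export is usually the easier of the two formats to parse. Below is a minimal loading sketch. It assumes the files follow the layout of Telegram Desktop's JSON export: a top-level `messages` list whose entries carry `id`, `type`, `date`, `from`, `from_id` and `text`, where `text` is either a string or a list mixing strings and formatting entities. These field names are assumptions and should be checked against the real files. The function names and the output record layout are hypothetical. Because the period spans several files, the function accepts a list of paths and drops duplicate message IDs in case the monthly files overlap.

```python
import json
from datetime import datetime


def flatten_text(text):
    """Join a message's text into one string.

    In the assumed export layout, formatted text is a list mixing plain
    strings and {"type": ..., "text": ...} entity dicts.
    """
    if isinstance(text, str):
        return text
    return "".join(p if isinstance(p, str) else p.get("text", "") for p in text)


def load_messages(paths, start, end):
    """Read JSON export files and return ordinary messages with start <= date < end.

    Output records (hypothetical layout): {"id", "user_id", "name", "date", "text"}.
    """
    records = []
    seen_ids = set()  # monthly files may overlap; keep each message id once
    for path in paths:
        with open(path, encoding="utf-8") as fh:
            data = json.load(fh)
        for msg in data.get("messages", []):
            # Skip service events (joins, pins, ...), typed "service" in this layout.
            if msg.get("type") != "message" or "from_id" not in msg:
                continue
            date = datetime.fromisoformat(msg["date"])
            if not (start <= date < end) or msg["id"] in seen_ids:
                continue
            seen_ids.add(msg["id"])
            records.append({
                "id": msg["id"],
                "user_id": str(msg["from_id"]),  # stable key; display names can change
                "name": msg.get("from") or "",
                "date": date,
                "text": flatten_text(msg.get("text", "")),
            })
    records.sort(key=lambda r: r["date"])
    return records
```

For example, `load_messages(paths, datetime(2020, 1, 1), datetime(2021, 2, 1))` keeps January 2020 through January 2021 inclusive, under the assumption that January 2021 is part of the study period.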
The conversations are in Arabic. The group was established in 2018, but we are only interested in the period from Jan. 2020 to Jan. 2021.
I need someone to compute the following statistics for the period Jan. 2020 to Jan. 2021 and to provide the results along with the source code of the program used:
1. The 10 most active members of the group, ranked by number of posts, along with each member's number of posts during the year.
2. The 100 least active members of the group, ranked by number of posts, along with each member's number of posts during the year.
3. The participation rate of the group as a whole during the year:
In other words, the number of posts per month for the whole group, along with any trends or patterns that appear.
4. The participation of the most active members (see #1) as a percentage of the participation of the group as a whole during the year, broken down per month.
5. A list of members who joined the group after Jan. 2020, with their number of posts per month and any trends or patterns in their participation rate. Each member should be identified by a unique user identifier that allows easy retrieval of their posts (see #7).
6. For the new members (see #5), measure their participation rate during the year as the number of posts per week. I want to observe whether their participation increases or decreases over time (user participation trends/patterns).
7. Identify users by a unique identifier and provide a way to retrieve the posts of a specific user for a specific month or for the whole year, in particular for the most active and the new members.
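Requirements #1 and #2 reduce to counting posts per user identifier and sorting the counts. Here is a minimal sketch. It assumes each message has already been parsed into a dict with at least a `user_id` key; this record layout is hypothetical and is not something the export provides directly. `rank_members` is an illustrative name.

```python
from collections import Counter


def rank_members(messages, top_n=10, bottom_n=100):
    """Count posts per user_id and return (most_active, least_active).

    Both results are lists of (user_id, post_count) pairs; ties are broken
    by user_id so that repeated runs produce the same order.
    """
    counts = Counter(m["user_id"] for m in messages)
    # Most active first: descending post count, then ascending user_id.
    most_active = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
    # Least active first: ascending post count, then ascending user_id.
    least_active = sorted(counts.items(), key=lambda kv: (kv[1], kv[0]))[:bottom_n]
    return most_active, least_active
```

Members who never posted do not appear in the message history at all. The least-active list therefore only covers members with at least one post. Ranking silent members as well would require a separate list of group members.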
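Requirements #3 and #4 both group posts by calendar month. The sketch below assumes message dicts with hypothetical `user_id` and `date` (a `datetime`) keys. For each month, it returns the group's total number of posts, the number of posts by a given set of top members, and their share as a percentage. The function names are illustrative.

```python
from collections import Counter


def month_key(date):
    """Calendar month label such as '2020-03'."""
    return f"{date.year:04d}-{date.month:02d}"


def monthly_participation(messages, top_user_ids):
    """Return rows (month, group_posts, top_posts, top_share_percent).

    group_posts counts every post in the month; top_posts counts only posts
    by users in top_user_ids (e.g. the 10 most active members).
    """
    top = set(top_user_ids)
    group = Counter(month_key(m["date"]) for m in messages)
    top_counts = Counter(month_key(m["date"]) for m in messages if m["user_id"] in top)
    rows = []
    for month in sorted(group):
        share = 100.0 * top_counts[month] / group[month]
        rows.append((month, group[month], top_counts[month], round(share, 1)))
    return rows
```

Passing the user IDs from the top-10 list of requirement #1 as `top_user_ids` gives the monthly percentages asked for in #4. The `group_posts` column alone answers #3. Months in which nobody posted are omitted rather than reported as zero.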
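For requirement #5, the main difficulty is deciding when a member joined. The sketch below makes two assumptions:

- `join_dates` is a hypothetical mapping from user ID to join date. It could, for example, be collected from the join events that Telegram exports record as service messages; the exact fields should be checked in the real files.
- Users missing from `join_dates` fall back to the date of their first post. This is only a proxy, and it works best when `messages` covers the group's whole history since 2018 rather than just the study window.

Message dicts are assumed to have hypothetical `user_id` and `date` keys.

```python
from collections import Counter, defaultdict


def new_member_monthly_posts(messages, join_dates, cutoff):
    """Return {user_id: {"YYYY-MM": post_count}} for members who joined on or
    after `cutoff`.

    join_dates: hypothetical mapping user_id -> join datetime. Users absent
    from it fall back to the date of their first post (a proxy only: a member
    who joined earlier but stayed silent would be counted as new).
    """
    joined = dict(join_dates)
    for m in messages:
        uid = m["user_id"]
        if uid not in join_dates:
            # Fallback: earliest post date seen for this user.
            if uid not in joined or m["date"] < joined[uid]:
                joined[uid] = m["date"]
    new_ids = {uid for uid, when in joined.items() if when >= cutoff}

    per_user = defaultdict(Counter)
    for m in messages:
        if m["user_id"] in new_ids:
            per_user[m["user_id"]][f"{m['date']:%Y-%m}"] += 1
    # Members who joined but never posted get an empty dict.
    return {uid: dict(per_user[uid]) for uid in sorted(new_ids)}
```

The phrase "after Jan. 2020" could mean from 1 February 2020 or from 1 January 2020. That choice is made through the `cutoff` argument, e.g. `datetime(2020, 2, 1)`.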
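For requirement #6, one way to quantify "increase or decrease over time" is to count each new member's posts per week, including weeks with zero posts, and fit a least-squares line to the weekly counts. This is an illustrative choice of trend measure, not one specified above. The function names and the `user_id`/`date` record keys are hypothetical, and weeks are assumed to start on Monday.

```python
from collections import Counter
from datetime import datetime, timedelta


def week_start(date):
    """Monday 00:00 of the week containing `date`."""
    day = datetime(date.year, date.month, date.day)
    return day - timedelta(days=day.weekday())


def weekly_counts(messages, user_id, start, end):
    """Posts per week for one user from `start` up to (not including) `end`,
    including zero-post weeks so that silent periods stay visible."""
    counts = Counter(week_start(m["date"]) for m in messages
                     if m["user_id"] == user_id and start <= m["date"] < end)
    weeks = []
    week = week_start(start)
    while week < end:
        weeks.append((week, counts[week]))
        week += timedelta(days=7)
    return weeks


def trend_slope(values):
    """Least-squares slope of values against their index (change in posts
    per week, per week): positive means rising, negative means falling."""
    n = len(values)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    var = sum((x - mean_x) ** 2 for x in range(n))
    return cov / var
```

Call `weekly_counts` with the member's join date as `start` and the end of the study period as `end`, then pass the counts to `trend_slope([c for _, c in weeks])`. This yields one number per member: a positive slope indicates rising participation and a negative one declining participation. The weekly series can also be plotted directly to show the pattern.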
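For requirement #7, a sender ID is a better key than a display name, because members can change their names. This sketch assumes the export provides such an ID (for example a field like `from_id`) and that messages are dicts with hypothetical `user_id`, `date` and `text` keys. It indexes each user's posts once. It then retrieves posts within any date range; `month_bounds` supplies the range for one calendar month. All names are illustrative.

```python
from collections import defaultdict
from datetime import datetime


def build_index(messages):
    """Group messages by user_id, each user's list sorted by date."""
    index = defaultdict(list)
    for m in messages:
        index[m["user_id"]].append(m)
    for posts in index.values():
        posts.sort(key=lambda m: m["date"])
    return dict(index)


def month_bounds(year, month):
    """[start, end) datetimes covering one calendar month."""
    start = datetime(year, month, 1)
    end = datetime(year + month // 12, month % 12 + 1, 1)
    return start, end


def posts_between(index, user_id, start, end):
    """Posts by user_id with start <= date < end, oldest first."""
    return [m for m in index.get(user_id, []) if start <= m["date"] < end]
```

For example, `posts_between(index, uid, *month_bounds(2020, 5))` returns a user's posts for May 2020. `posts_between(index, uid, datetime(2020, 1, 1), datetime(2021, 2, 1))` returns them for the whole study period, assuming January 2021 is included.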
Deliverables (Milestone 1):
- The results for each requirement above in Excel and CSV format, along with graphs.
- The source code files and the algorithms used, preferably in Python.
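For the CSV deliverables, Arabic member names need some care with encoding. A minimal sketch using only the standard-library `csv` module is shown below. It writes each result table with a UTF-8 byte-order mark so that Excel detects the encoding when the file is opened directly. `write_csv` is a hypothetical helper name.

```python
import csv


def write_csv(path, header, rows):
    """Write one result table to CSV.

    The 'utf-8-sig' encoding prepends a byte-order mark, which lets Excel
    recognise the file as UTF-8 and display Arabic text correctly.
    """
    with open(path, "w", encoding="utf-8-sig", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(header)
        writer.writerows(rows)
```

Native `.xlsx` files and charts need third-party packages. In Python, common choices are pandas' `DataFrame.to_excel` (which uses openpyxl for `.xlsx`) and matplotlib.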