Python Assignment Help | COMP90042 Project 2021: Rumour Detection and Analysis on Twitter

This Australian assignment-help task mainly concerns rumour detection with Python NLP.

The concept of rumour has a long history, and it is typically defined as an unverified statement or news circulating
from person to person. Rumours have the potential to spread quickly through social media, and bring about
significant economic and social impact. The figure below illustrates an example of a rumour propagating on
Twitter. The source message (green box) started a claim about the cause of Michael Brown’s shooting, and it was
published shortly after the shooting happened. It claimed that he was shot ten times by the police for stealing
candy. The message was retweeted by multiple users on Twitter, and within 24 hours there were about 900K
users involved, either by reposting, commenting, or questioning the original source message. From the replies
we can see some users (e.g. User 7; red box) questioned the truthfulness of the original message.

The challenge of the project is to develop a rumour detection system and analyse the nature of rumours that are
being propagated on Twitter. We will frame this using two tasks: rumour detection and rumour analysis.

Task 1: Rumour Detection
In this task, you will be provided with a set of source tweets and their replies (i.e. the comments we saw
in the figure above), and each source tweet is labelled as either a rumour or non-rumour. The task here
is to build a binary classifier using this dataset. For each tweet (source tweet or reply tweet), the dataset
provides a range of information, including the text of the tweet, information about the user who posted the
tweet, the unique ID of the tweet, etc. (more information will be provided in the “Datasets” section below).
You’re free to explore any methods or machine learning models for building the binary classifier. To give
some ideas, we could model the source tweet and replies as a sequence of tweets using recurrent networks.
Alternatively we could also model them based on their propagation structure (like the tree structure of
comments we saw earlier) using recursive networks or graph networks. We might want to consider incorporating
some user information, as it could provide hints about the trustworthiness of a user. While you are
permitted to use pretrained models or embeddings, you should only use the provided dataset for training
the model, i.e. you should not crawl or search for more training data. Whatever methods or features you
use, you must at least incorporate the tweet text in your model (we are doing an NLP project, after all).
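
As a starting point for the two modelling ideas above, the sketch below turns one thread into (a) a time-ordered sequence of tweet texts and (b) a parent-to-children reply tree. It assumes each tweet is a dict with Twitter-API-style fields `id_str`, `text`, `created_at` and `in_reply_to_status_id_str`; the field names in the provided dataset may differ, and the helper names `thread_as_sequence` and `thread_as_tree` are illustrative only.

```python
from collections import defaultdict
from datetime import datetime

# Assumed timestamp layout (Twitter API v1.1 style); adjust to the real data.
TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def thread_as_sequence(source, replies):
    """Return tweet texts with the source first, then replies in time order."""
    ordered = sorted(
        replies, key=lambda t: datetime.strptime(t["created_at"], TIME_FORMAT)
    )
    return [source["text"]] + [t["text"] for t in ordered]


def thread_as_tree(source, replies):
    """Map each tweet ID to the IDs of its direct replies."""
    known_ids = {source["id_str"]} | {t["id_str"] for t in replies}
    children = defaultdict(list)
    for tweet in replies:
        parent = tweet.get("in_reply_to_status_id_str")
        # Replies whose parent is missing from the thread hang off the source.
        if parent not in known_ids:
            parent = source["id_str"]
        children[parent].append(tweet["id_str"])
    return dict(children)


# Toy thread, invented for illustration only.
source = {"id_str": "1", "text": "Breaking: claim X",
          "created_at": "Sat Aug 09 22:00:00 +0000 2014"}
replies = [
    {"id_str": "3", "text": "I agree", "in_reply_to_status_id_str": "2",
     "created_at": "Sat Aug 09 22:10:00 +0000 2014"},
    {"id_str": "2", "text": "Is this confirmed?", "in_reply_to_status_id_str": "1",
     "created_at": "Sat Aug 09 22:05:00 +0000 2014"},
]

sequence = thread_as_sequence(source, replies)  # input for a sequence model
tree = thread_as_tree(source, replies)          # input for a tree/graph model
```

The sequence view could feed a recurrent model, while the adjacency mapping could be handed to a recursive or graph network; either way the tweet text stays in the model input, as the task requires.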

Task 2: Rumour Analysis
In this task, you will use your trained rumour classifier from the first task and apply it to a set of provided
COVID-19 tweets to detect rumours. Given the predicted rumours and non-rumours, the aim here is to
perform some analyses to understand the nature of COVID-19 rumours and how they differ from their
non-rumour counterparts. Below are some questions to get you started:
• What are the topics of COVID-19 rumours? How do they differ from the non-rumours? How do these
topics evolve over time?
• What are the popular hashtags of COVID-19 rumours and non-rumours? How much do they overlap or
differ?
• Do rumour source tweets convey a different sentiment/emotion to the non-rumour source tweets?
What about their replies?
• What are the characteristics of rumour-creating users, and how do they differ from normal
users?
Note that these are just some suggestions, and you are free to explore any questions to understand COVID-19
rumours on Twitter. As before, while you are free to do analyses that are not text-related (e.g.
propagation analysis), most of your analyses should involve the text of the tweets. You should do your
analyses using only the provided data.
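
To make the hashtag question above concrete, here is a minimal sketch that counts hashtags in each predicted class and measures their overlap with the Jaccard index. The regular expression, the helper names `hashtag_counts` and `jaccard`, and the toy tweets are all assumptions made for illustration; the real inputs would be the COVID-19 tweets as labelled by your Task 1 classifier.

```python
import re
from collections import Counter

# Simple hashtag pattern: '#' followed by word characters (an assumption;
# the dataset may also expose hashtags as structured metadata).
HASHTAG = re.compile(r"#(\w+)")


def hashtag_counts(texts):
    """Count case-folded hashtags across an iterable of tweet texts."""
    counts = Counter()
    for text in texts:
        counts.update(tag.lower() for tag in HASHTAG.findall(text))
    return counts


def jaccard(a, b):
    """Jaccard overlap of two collections, treated as sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


# Toy texts, invented for illustration only.
rumour_texts = ["claim A #covid19 #lockdown", "claim B #COVID19 #cure"]
non_rumour_texts = ["official update #covid19 #staysafe"]

rumour_tags = hashtag_counts(rumour_texts)
non_rumour_tags = hashtag_counts(non_rumour_texts)

top_rumour_tags = rumour_tags.most_common(10)    # most frequent rumour hashtags
overlap = jaccard(rumour_tags, non_rumour_tags)  # shared tags / all tags
```

Hashtags are lower-cased so that `#covid19` and `#COVID19` count as one tag; whether that is appropriate is a design choice worth stating in the analysis.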