Coding Assignment Help | ECON6087 2023 Spring Assignment 1

This is a coding assignment from Hong Kong that uses 19 years of data from the Australian news source ABC to complete the tasks below.

 

For assignment 1, we will use a new corpus, “A Million News Headlines”, which covers all the news headlines published by the Australian news source ABC (Australian Broadcasting Corporation, http://www.abc.net.au) over a period of 19 years. The data can be accessed from the following Kaggle page: https://www.kaggle.com/datasets/therohk/million-headlines.

You can also learn more about this dataset, and even find some coding examples, on the same page. Please use this data to complete the following tasks:

  1. Train word embeddings using word2vec on this corpus, and perform a sentiment analysis based on the word embeddings and the “positivity” vector. We construct this vector in the same way as Luca Bellodi (2022):

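$$
\overrightarrow{\text{positivity}} = \overrightarrow{\text{success}} + \overrightarrow{\text{good}} + \overrightarrow{\text{happy}} + \overrightarrow{\text{perfect}} + \overrightarrow{\text{important}} + \overrightarrow{\text{worth}} + \overrightarrow{\text{rich}} - \overrightarrow{\text{failure}} - \overrightarrow{\text{bad}} - \overrightarrow{\text{sad}} - \overrightarrow{\text{terrible}} - \overrightarrow{\text{bad}} - \overrightarrow{\text{regret}} - \overrightarrow{\text{poor}}
$$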

  • Use whatever pre-processing steps you see fit;
  • Decide on the number of embedding dimensions, the number of iterations, and which model you would like to train;

  • Choose a reasonable distance (or similarity) measure;
  • Find a reasonable way to aggregate the word-level sentiment scores to the document level.
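
The scoring part of task 1 can be sketched in a few lines. This is only one possible reading of the task, not the required solution: it assumes cosine similarity as the similarity measure and a plain average over in-vocabulary words as the document-level aggregation. The names `positivity_vector`, `cosine`, and `headline_score` are hypothetical helpers, and the two-dimensional `emb` dictionary is toy data standing in for trained word2vec vectors (any object that supports `word in emb` and `emb[word]` should work in its place).

```python
import math

# Seed words as listed in the assignment's formula. "bad" appears twice
# there, so it is kept twice here to match the sum term by term.
POSITIVE = ["success", "good", "happy", "perfect", "important", "worth", "rich"]
NEGATIVE = ["failure", "bad", "sad", "terrible", "bad", "regret", "poor"]


def positivity_vector(emb, positive, negative):
    """Sum the positive seed vectors and subtract the negative ones.

    Seed words missing from the vocabulary are skipped (an assumption;
    the assignment does not say how to handle them).
    """
    total = None
    for sign, words in ((1.0, positive), (-1.0, negative)):
        for word in words:
            if word not in emb:
                continue
            vec = [sign * float(x) for x in emb[word]]
            total = vec if total is None else [t + x for t, x in zip(total, vec)]
    if total is None:
        raise ValueError("none of the seed words are in the vocabulary")
    return total


def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def headline_score(tokens, emb, pos_vec):
    """Average word-level similarity to the positivity vector.

    Returns None when no token of the headline is in the vocabulary.
    """
    scores = [cosine(emb[t], pos_vec) for t in tokens if t in emb]
    return sum(scores) / len(scores) if scores else None


# Toy 2-dimensional embeddings, for illustration only.
emb = {
    "good": [1.0, 0.0],
    "bad": [-1.0, 0.0],
    "win": [0.8, 0.6],
    "crash": [-0.6, 0.8],
    "council": [0.0, 1.0],
}

pos_vec = positivity_vector(emb, POSITIVE, NEGATIVE)
print(headline_score(["council", "win"], emb, pos_vec))
print(headline_score(["crash"], emb, pos_vec))
```

Averaging is the simplest aggregation; weighting words (for example by inverse frequency) or averaging the word vectors first and then taking one cosine would be equally defensible choices under the wording of the task.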
  2. Plot the article-level sentiment scores by year-month.
  3. Try to construct sentiment scores toward different countries or international organizations, such as “US”, “UK”, “Russia”, “Iran”, “NATO”, and “UN”.
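
For the last two tasks, a minimal sketch of the grouping step is given below. It assumes that each headline already has a sentiment score and that the date is available as a `YYYYMMDD` value (this is how the `publish_date` column appears in the Kaggle file, but check it against your download). Treating "sentiment toward a country" as the mean score of headlines that mention it is only one possible interpretation. The function names, the alias sets, and the records are hypothetical illustrations, not real data.

```python
from collections import defaultdict


def monthly_mean(rows):
    """Average scores by year-month.

    rows: iterable of (date, score), with date like 20030219 or "20030219".
    Rows whose score is None (no in-vocabulary words) are ignored.
    Returns a dict {"YYYY-MM": mean score} in chronological order.
    """
    sums = defaultdict(lambda: [0.0, 0])
    for date, score in rows:
        if score is None:
            continue
        date = str(date)
        key = f"{date[:4]}-{date[4:6]}"
        sums[key][0] += score
        sums[key][1] += 1
    return {key: total / n for key, (total, n) in sorted(sums.items())}


def target_monthly_mean(records, aliases):
    """Monthly mean score over headlines that mention any alias of a target.

    records: iterable of (date, tokens, score).
    aliases: set of lower-cased tokens that count as a mention.
    """
    return monthly_mean(
        (date, score)
        for date, tokens, score in records
        if any(token in aliases for token in tokens)
    )


# Made-up records for illustration: (date, tokens, score).
records = [
    (20030219, ["us", "troops", "win"], 0.5),
    (20030225, ["us", "trade", "talks"], 0.25),
    (20030301, ["iran", "talks", "stall"], -0.25),
    (20030315, ["council", "budget"], None),
]

overall = monthly_mean((date, score) for date, _, score in records)
toward_us = target_monthly_mean(records, {"us", "usa", "america"})
print(overall)
print(toward_us)
```

The resulting dictionaries can be passed to any plotting library to draw the year-month series. Note that after lower-casing, the token "us" is also a pronoun, so matching on it will pick up unrelated headlines; a longer alias list or a case-sensitive match on the raw text may be needed.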