Python Assignment Writing | CA Assignment 2 Data Clustering

This UK assignment mainly concerns the k-means clustering algorithm in Python.

Assignment description
In the assignment, you are required to cluster words belonging to four categories: animals,
countries, fruits and veggies. The words are arranged into four different files that you will find in
the archive CA2data.zip. The first entry in each line is a word followed by 300 features (word
embedding) describing the meaning of that word.
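As an illustration of the file layout described above, the following is a minimal sketch of a line parser. It assumes the fields are separated by whitespace and that each word contains no spaces; neither is stated in the brief. The function name `parse_embedding_lines` and the sample lines are hypothetical, and the sample uses 3 features per word instead of 300.

```python
def parse_embedding_lines(lines, label):
    """Parse lines of the form '<word> <f1> <f2> ...' for one category.

    Assumes whitespace-separated fields (an assumption, not stated in the brief).
    Returns parallel lists: words, feature vectors, and category labels.
    """
    words, vectors, labels = [], [], []
    for line in lines:
        parts = line.split()
        if not parts:  # skip blank lines
            continue
        words.append(parts[0])
        vectors.append([float(x) for x in parts[1:]])
        labels.append(label)
    return words, vectors, labels


# Made-up sample lines; real lines would carry 300 features each.
sample = ["cat 0.1 0.2 0.3", "dog 0.0 0.5 0.4"]
words, X, labels = parse_embedding_lines(sample, "animals")
print(words, len(X), len(X[0]), labels)
```

Calling the parser once per category file and concatenating the results would give one data matrix plus the ground-truth labels needed later for evaluation.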
Questions/Tasks
1. (25 marks) Implement the k-means clustering algorithm to cluster the instances into k
clusters.
2. (25 marks) Implement the k-medians clustering algorithm to cluster the instances into k
clusters.
3. (10 marks) Run the k-means clustering algorithm you implemented in part (1) to cluster
the given instances. Vary the value of k from 1 to 9 and compute the B-CUBED precision,
recall, and F-score for each set of clusters. Plot k on the horizontal axis and the B-CUBED
precision, recall and F-score on the vertical axis in the same plot.
4. (10 marks) Now re-run the k-means clustering algorithm you implemented in part (1)
but normalise each object (vector) to unit ℓ2 length before clustering. Vary the value of
k from 1 to 9 and compute the B-CUBED precision, recall, and F-score for each set of
clusters. Plot k on the horizontal axis and the B-CUBED precision, recall and F-score on
the vertical axis in the same plot.
5. (10 marks) Run the k-medians clustering algorithm you implemented in part (2) over
the unnormalised objects. Vary the value of k from 1 to 9 and compute the B-CUBED
precision, recall, and F-score for each set of clusters. Plot k on the horizontal axis and the
B-CUBED precision, recall and F-score on the vertical axis in the same plot.
6. (10 marks) Now re-run the k-medians clustering algorithm you implemented in part (2)
but normalise each object (vector) to unit ℓ2 length before clustering. Vary the value of
k from 1 to 9 and compute the B-CUBED precision, recall, and F-score for each set of
clusters. Plot k on the horizontal axis and the B-CUBED precision, recall and F-score on
the vertical axis in the same plot.
7. (10 marks) Compare the clusterings you obtained in (3)-(6) and discuss which
setting gave the best clustering for this dataset.
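The tasks above can be sketched in a few functions. This is a rough illustration under stated assumptions, not a model answer: k-means is taken to use squared Euclidean distance with mean centres, k-medians to use Manhattan (L1) distance with coordinate-wise medians, centres are initialised from randomly chosen data points, and B-CUBED is computed from its usual per-item definition. All function names, the `seed` parameter, and the toy data are hypothetical.

```python
import math
import random
from statistics import mean, median


def l2_normalise(v):
    """Scale v to unit L2 (Euclidean) length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def sq_euclidean(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))


def manhattan(a, b):
    return sum(abs(p - q) for p, q in zip(a, b))


def cluster(X, k, use_median=False, max_iter=100, seed=0):
    """Alternate assignment and centre updates until assignments stop changing.

    use_median=False: k-means (squared L2 distance, mean centres).
    use_median=True:  k-medians (L1 distance, coordinate-wise median centres).
    """
    rng = random.Random(seed)
    centres = [list(x) for x in rng.sample(X, k)]  # k data points as initial centres
    dist, centre_of = (manhattan, median) if use_median else (sq_euclidean, mean)
    assign = None
    for _ in range(max_iter):
        new_assign = [min(range(k), key=lambda j: dist(x, centres[j])) for x in X]
        if new_assign == assign:  # converged
            break
        assign = new_assign
        for j in range(k):
            members = [x for x, a in zip(X, assign) if a == j]
            if members:  # an empty cluster keeps its previous centre
                centres[j] = [centre_of(col) for col in zip(*members)]
    return assign


def bcubed(assign, labels):
    """Return B-CUBED (precision, recall, F-score), averaged over all items."""
    n = len(assign)
    precision = recall = 0.0
    for i in range(n):
        in_cluster = sum(1 for j in range(n) if assign[j] == assign[i])
        in_label = sum(1 for j in range(n) if labels[j] == labels[i])
        both = sum(1 for j in range(n)
                   if assign[j] == assign[i] and labels[j] == labels[i])
        precision += both / in_cluster
        recall += both / in_label
    precision, recall = precision / n, recall / n
    return precision, recall, 2 * precision * recall / (precision + recall)


# Toy data standing in for the 300-dimensional word embeddings.
X = [[0.0, 0.0], [1.0, 0.0], [10.0, 0.0], [11.0, 0.0]]
labels = ["a", "a", "b", "b"]
for k in range(1, 4):
    p, r, f = bcubed(cluster(X, k), labels)
    print(k, round(p, 3), round(r, 3), round(f, 3))
```

For the normalised runs, each vector would be passed through `l2_normalise` before calling `cluster`. The resulting (k, precision, recall, F-score) tuples for k from 1 to 9 could then be handed to any plotting library. Because the result depends on the random initialisation, repeating each run with several seeds may give a fairer comparison between settings.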