Question 1 (1 pt)
Select all statements that are correct in the context of data linkage.
Group of answer choices
The Pair Completeness score is likely to decrease if the sizes of all blocks are large.
For any blocking function, blocking reduces the original O(n^2) complexity of pairwise comparison to linear complexity.
Assuming each record is allocated to exactly one block and that all blocks are equally sized, a blocking method that produces more blocks will have a higher reduction ratio.
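The reduction ratio referred to in the last statement can be sketched as follows. This is a minimal illustration, assuming deduplication within a single dataset (so the unblocked comparison count is n(n-1)/2) and a record count that divides evenly into the blocks; the function name `reduction_ratio` is hypothetical.

```python
def reduction_ratio(n_records, n_blocks):
    """Reduction ratio for equally sized, non-overlapping blocks.

    Assumes a single dataset compared against itself, so the number of
    candidate pairs without blocking is n * (n - 1) / 2.
    """
    if n_records % n_blocks != 0:
        raise ValueError("n_records must be divisible by n_blocks")
    block_size = n_records // n_blocks
    total_pairs = n_records * (n_records - 1) // 2
    # Only records that share a block are compared with each other.
    blocked_pairs = n_blocks * (block_size * (block_size - 1) // 2)
    return 1 - blocked_pairs / total_pairs


for blocks in (1, 10, 100):
    print(blocks, reduction_ratio(1000, blocks))
```

Running the loop with different block counts shows how the ratio changes as the same records are split into more, smaller blocks.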
Question 2 (4 pts)
Consider the following XML file:
<URL> https://handbook.unimelb.edu.au/subjects/comp20008 </url>
<name> Elements of Data Processing </name>
(a) Modify the XML so that it is well formed.
(b) Explain why the data format is said to be semi-structured.
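Well-formedness, as asked about in part (a), can be checked mechanically. The sketch below uses Python's standard `xml.etree.ElementTree` parser; the wrapping root element `subject` and the helper name `is_well_formed` are hypothetical choices for illustration, and other repairs are equally valid.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses as a single well-formed XML document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# The fragment as given in the question.
original = (
    "<URL> https://handbook.unimelb.edu.au/subjects/comp20008 </url>\n"
    "<name> Elements of Data Processing </name>"
)

# One possible repair: matching tag case and a single root element
# (the root name "subject" is an assumption, not from the question).
candidate = (
    "<subject>\n"
    "  <url> https://handbook.unimelb.edu.au/subjects/comp20008 </url>\n"
    "  <name> Elements of Data Processing </name>\n"
    "</subject>"
)

print(is_well_formed(original), is_well_formed(candidate))
```

XML tag names are case sensitive, and a document must have exactly one root element, which is why the parser rejects the fragment as given.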
Question 3 (4 pts)
Consider the following temperature data from various weather stations in Victoria:
16, 12, 15, 18, 13, 43, 10
The values are comma separated.
(a) Will the value 43 be classified as an outlier on a Tukey box plot? Show how you arrive at your conclusion.
(b) Suggest an imputation method for the data and justify your choice.
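The fence calculation behind part (a) can be sketched as follows. This assumes the usual 1.5 × IQR rule and quartiles computed by linear interpolation (the `inclusive` method of Python's `statistics.quantiles`); other quartile conventions give slightly different fences. The function name `tukey_fences` is hypothetical.

```python
import statistics


def tukey_fences(values, k=1.5):
    """Return (lower, upper) Tukey fences using inclusive quartiles."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


temperatures = [16, 12, 15, 18, 13, 43, 10]
lower, upper = tukey_fences(temperatures)
# Values outside the fences are drawn as individual points on the box plot.
outliers = [t for t in temperatures if t < lower or t > upper]
print(lower, upper, outliers)
```

Under these assumptions the quartiles are 12.5 and 17, the fences are 5.75 and 23.75, and 43 is the only value outside them.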
Question 4 (2 pts)
Consider the following two plots:
Plot (1) is a VAT plot
Plot (2) is a scatter plot of the first 2 Principal Components of the data.
The data scientist states that the two plots are created from the same dataset. Do you believe the statement? Justify your answer.
Question 5 (3 pts)
Consider a dataset with 10000 rows and 500 features. Give three reasons why we might want to apply PCA while analysing the dataset.
Question 6 (8 pts)
- a) Explain, with examples, what supervised and unsupervised learning are and what the key differences between them are. (4 points)
- b) Assume you need to build a model from medical data that predicts whether or not a patient suffers from a particular illness. How would you decide whether to use supervised or unsupervised learning? (4 points)
Question 7 (4 pts)
Assume you use k-NN clustering on a data set. Describe a method for choosing the best value for k.