CMPT 459 Assignment 4
Question 1 (20 points)
Explore the data. The data set has 23 attributes plus one class label attribute.
1. (10 points) List the range of each attribute and compute its mean and variance,
separately on the training data set and on the test data set. Report the results.
Comparing the results on the two subsets should give you a basic idea of whether
the training data set and the test data set are similar to each other. Are the
classes in this data set balanced?
2. (10 points) For each attribute in the training data set (columns A to W in the Excel
sheet), compute the Fisher score. Sort the attributes in descending order of Fisher score
and list their Fisher scores.
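A minimal sketch of how both parts of Question 1 could be approached with NumPy. The handout does not define the Fisher score, so the code assumes one common definition: for each attribute, the between-class scatter (class size times the squared gap between the class mean and the overall mean, summed over classes) divided by the within-class scatter (class size times the within-class variance, summed over classes). Check this against the definition used in lectures. The function names and the tiny demo array are hypothetical; in practice X and y would be loaded from the training and test sheets.

```python
import numpy as np

def summarize(X):
    """Per-attribute range (min, max), mean, and variance of a 2-D array."""
    X = np.asarray(X, dtype=float)
    return {
        "min": X.min(axis=0),
        "max": X.max(axis=0),
        "mean": X.mean(axis=0),
        # ddof=1 is the sample variance; use ddof=0 for the population variance.
        "var": X.var(axis=0, ddof=1),
    }

def class_counts(y):
    """Number of rows per class label, to judge whether classes are balanced."""
    labels, counts = np.unique(np.asarray(y), return_counts=True)
    return dict(zip(labels.tolist(), counts.tolist()))

def fisher_scores(X, y):
    """Fisher score per attribute under the definition assumed in the lead-in."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for label in np.unique(y):
        X_k = X[y == label]
        n_k = X_k.shape[0]
        between += n_k * (X_k.mean(axis=0) - overall_mean) ** 2
        within += n_k * X_k.var(axis=0)  # population variance inside the class
    # Attributes with zero within-class scatter get NaN instead of dividing by 0.
    return np.divide(between, within,
                     out=np.full_like(between, np.nan), where=within > 0)

# Made-up data: 4 rows, 2 attributes, binary labels.
X_demo = np.array([[1.0, 5.0], [3.0, 6.0], [7.0, 5.0], [9.0, 6.0]])
y_demo = np.array([0, 0, 1, 1])

stats = summarize(X_demo)
counts = class_counts(y_demo)
scores = fisher_scores(X_demo, y_demo)
ranking = np.argsort(-scores)  # attribute indices, highest score first

for j in ranking:
    print(f"attribute {j}: Fisher score = {scores[j]:.4f}")
```

Running summarize once on the training set and once on the test set gives the two tables to compare side by side.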
Question 2 (30 points)
Build a decision tree using the training data set. In general, you can use any decision tree
implementation. However, we recommend that you use scikit-learn
<https://scikit-learn.org/stable/#>. Conduct a 10-fold cross-validation and report the
precision, recall and accuracy of each fold, as well as the overall average precision and recall.
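One possible way to run the 10-fold evaluation with scikit-learn is sketched below. It assumes stratified, shuffled folds and default tree hyperparameters, neither of which the handout prescribes. The synthetic data from make_classification only stands in for the real training set, and cross_validate_tree is a hypothetical helper name.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import StratifiedKFold
from sklearn.tree import DecisionTreeClassifier

def cross_validate_tree(X, y, n_splits=10, seed=0):
    """Return an (n_splits, 3) array of precision, recall, accuracy per fold."""
    folds = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    rows = []
    for train_idx, val_idx in folds.split(X, y):
        # A fresh tree is trained on nine folds and scored on the held-out one.
        model = DecisionTreeClassifier(random_state=seed)
        model.fit(X[train_idx], y[train_idx])
        pred = model.predict(X[val_idx])
        rows.append((
            precision_score(y[val_idx], pred, zero_division=0),
            recall_score(y[val_idx], pred, zero_division=0),
            accuracy_score(y[val_idx], pred),
        ))
    return np.array(rows)

# Placeholder data with 23 attributes; replace with the real training set.
X_demo, y_demo = make_classification(n_samples=300, n_features=23, random_state=0)

fold_metrics = cross_validate_tree(X_demo, y_demo)
for i, (p, r, a) in enumerate(fold_metrics, start=1):
    print(f"fold {i:2d}: precision={p:.3f} recall={r:.3f} accuracy={a:.3f}")

avg_p, avg_r, avg_a = fold_metrics.mean(axis=0)
print(f"average: precision={avg_p:.3f} recall={avg_r:.3f} accuracy={avg_a:.3f}")
```

Precision and recall here are computed for the class labelled 1, which is the scikit-learn default for binary problems.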
Question 3 (20 points)
Use AdaBoost to boost the decision tree model. Keep n_estimators at its default (50) if you are
using scikit-learn. You can start with the default learning rate and then tune it to obtain better
performance. Report the changes in precision, recall and accuracy after using AdaBoost.
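A sketch of how the learning rate could be tuned while n_estimators stays at 50. The candidate learning rates, the depth-1 base tree, and the helper name evaluate_adaboost are assumptions made for illustration; the data is again synthetic. The base tree is passed positionally so the call does not depend on the keyword name, which differs between scikit-learn versions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_validate
from sklearn.tree import DecisionTreeClassifier

METRICS = ("precision", "recall", "accuracy")

def evaluate_adaboost(X, y, learning_rates=(0.1, 0.5, 1.0), seed=0):
    """Mean 10-fold precision, recall, accuracy for each learning rate."""
    results = {}
    for lr in learning_rates:
        model = AdaBoostClassifier(
            DecisionTreeClassifier(max_depth=1, random_state=seed),
            n_estimators=50,      # kept at the default, as the question asks
            learning_rate=lr,     # 1.0 is the scikit-learn default
            random_state=seed,
        )
        scores = cross_validate(model, X, y, cv=10, scoring=METRICS)
        results[lr] = {m: scores[f"test_{m}"].mean() for m in METRICS}
    return results

# Placeholder data with 23 attributes; replace with the real training set.
X_demo, y_demo = make_classification(n_samples=300, n_features=23, random_state=0)

results = evaluate_adaboost(X_demo, y_demo)
for lr, metrics in results.items():
    print(lr, {name: round(value, 3) for name, value in metrics.items()})
```

Comparing these averages with those of the single decision tree from Question 2 gives the change in performance that the question asks for.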
Question 4 (30 points)
To answer this question, you can use any classification method you like. You are
asked to build a classifier that is as accurate as you can make it. Your program should take the
test data set and produce a text file of 5000 lines. Each line of the text file should contain only
one number, either 0 or 1, which is your program's classification of the corresponding row in the
test data set. Keep the output strictly in the same order as the rows in the test data set, since
your submission for this question will be marked automatically by a program.
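Since the output is marked automatically, it may help to validate the file format before submitting. The sketch below uses a hypothetical helper, write_predictions, and a hypothetical file name; the alternating demo predictions only stand in for a real classifier's output.

```python
import os
import tempfile

def write_predictions(predictions, path, expected_rows=5000):
    """Write one 0/1 label per line, in the order the predictions are given."""
    labels = [int(p) for p in predictions]
    if len(labels) != expected_rows:
        raise ValueError(f"expected {expected_rows} rows, got {len(labels)}")
    if any(label not in (0, 1) for label in labels):
        raise ValueError("every prediction must be 0 or 1")
    with open(path, "w", newline="\n") as handle:
        for label in labels:
            handle.write(f"{label}\n")

# Stand-in predictions; a real run would pass model.predict(X_test) instead.
demo_predictions = [i % 2 for i in range(5000)]
demo_path = os.path.join(tempfile.mkdtemp(), "predictions.txt")
write_predictions(demo_predictions, demo_path)
```

The helper never reorders its input, so the line order matches the test rows as long as the test set itself is not shuffled before prediction.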