COMP90049: Introduction to Machine Learning
Project 1: Naive Bayes and K-Nearest Neighbour for Predicting Stroke
In this project, you will implement Naive Bayes and K-Nearest Neighbour (K-NN) classifiers. You
will explore their inner workings, evaluate their behaviour on a stroke-prediction data set, and on
that basis respond to some conceptual questions.
You should respond to questions 1-3. In question 2 (b) you can choose between two options for
smoothing and two options for the Naive Bayes formulation. A response to a question should be about
100–200 words and should make reference to the data wherever possible.
Question 1
a Explore the data and summarise its different aspects. Can you see any interesting
characteristics in the features, classes or categories? What is the main issue with the data?
Given this issue, how would the Naive Bayes classifier behave on this data? Discuss
your answer based on the Naive Bayes formulation. (3 marks)
b Is accuracy an appropriate metric to evaluate the models created for this data? Justify
your answer. Explain which metric(s) would be more appropriate, and contrast their utility
against accuracy. [no programming required] (2 marks)
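To see why accuracy can be misleading on imbalanced data such as stroke prediction, the pure-Python sketch below (illustrative only; the 95/5 class split is hypothetical, not taken from the provided data) contrasts accuracy with minority-class recall for a classifier that always predicts the majority class:

```python
# Hypothetical imbalanced labels: 95 "no stroke" (0) vs 5 "stroke" (1).
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100   # Zero-R style: always predict the majority class

# Accuracy looks strong despite the classifier learning nothing useful.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Recall on the minority ("stroke") class exposes the failure.
tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
recall = tp / (tp + fn)

print(accuracy)  # 0.95
print(recall)    # 0.0
```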
Question 2
a Explain the independence assumption underlying Naive Bayes. What are the advantages
and disadvantages of this assumption? Elaborate your answers using the features of the
provided data. [no programming required] (1 mark)
b Implement the Naive Bayes classifier. You need to decide how to apply Naive Bayes
to nominal and numeric attributes: you can combine Gaussian and Categorical Naive
Bayes (option 1) or use Categorical Naive Bayes alone (option 2). Explain your decision.
For Categorical Naive Bayes, you can choose either epsilon or Laplace smoothing
for this calculation. Evaluate the classifier using accuracy and appropriate metric(s) on test
data. Explain your observations on how the classifier has performed based on the metric(s),
and discuss its performance in comparison with the Zero-R baseline.
c Explain the difference between epsilon and Laplace smoothing. [no programming
required] (1 mark)
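As a rough sketch of option 2 (Categorical Naive Bayes with Laplace smoothing), assuming nominal string-valued features; the two-feature data set below is invented for illustration. Numeric attributes would first need discretisation, and epsilon smoothing would instead replace zero probabilities with a small constant rather than adding a pseudo-count everywhere:

```python
import math
from collections import Counter, defaultdict

def train_categorical_nb(X, y, alpha=1.0):
    """Fit Categorical Naive Bayes with Laplace (add-alpha) smoothing.
    X: list of tuples of nominal values; y: list of class labels."""
    classes = Counter(y)                       # class frequencies for the prior
    n_features = len(X[0])
    counts = defaultdict(Counter)              # value counts per (class, feature)
    values = [set() for _ in range(n_features)]
    for xs, c in zip(X, y):
        for j, v in enumerate(xs):
            counts[(c, j)][v] += 1
            values[j].add(v)
    return classes, counts, values, len(y), alpha

def predict(model, xs):
    classes, counts, values, n, alpha = model
    best, best_lp = None, -math.inf
    for c, nc in classes.items():
        lp = math.log(nc / n)                  # log prior P(c)
        for j, v in enumerate(xs):
            num = counts[(c, j)][v] + alpha    # Laplace: add alpha to each count
            den = nc + alpha * len(values[j])  # normalise over all seen values
            lp += math.log(num / den)          # log likelihood P(x_j | c)
        if lp > best_lp:
            best, best_lp = c, lp
    return best

# Toy illustration (invented values, not the project data):
X = [("high", "yes"), ("high", "no"), ("low", "no"), ("low", "no")]
y = ["stroke", "stroke", "no_stroke", "no_stroke"]
model = train_categorical_nb(X, y)
print(predict(model, ("high", "yes")))   # stroke
```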
Question 3
a Implement the K-NN classifier, and find the optimal value of K. (1 mark)
b Based on the value of K obtained in part (a), evaluate the classifier using accuracy
and your chosen metric(s) on test data. Explain your observations on how the classifier has
performed based on the metric(s), and discuss its performance in comparison with the
Zero-R baseline. (2 marks)
c Compare the classifiers (Naive Bayes and K-NN) based on the metrics' results, and provide
a comparative discussion of the results. [no programming required] (1 mark)
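A minimal K-NN and Zero-R sketch, assuming numeric, comparably scaled features and Euclidean distance; the training points are invented for illustration. In practice, the optimal K would be chosen by evaluating a range of K values on held-out data:

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k):
    """Classify x by majority vote among its k nearest training points
    (Euclidean distance; assumes numeric, comparably scaled features)."""
    dists = sorted((math.dist(x, xi), yi) for xi, yi in zip(train_X, train_y))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

def zero_r(train_y):
    """Zero-R baseline: always predict the most frequent training class."""
    return Counter(train_y).most_common(1)[0][0]

# Toy illustration (invented points, not the project data):
train_X = [(0.1, 0.2), (0.2, 0.1), (0.9, 0.8), (0.8, 0.9)]
train_y = ["no_stroke", "no_stroke", "stroke", "stroke"]

print(knn_predict(train_X, train_y, (0.85, 0.85), k=3))  # stroke
print(zero_r(train_y))  # the majority training class
```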
Submission will be made via the LMS, as a single Jupyter Notebook file. Submissions will open one
week before the submission deadline.