Python Assignment Help | MATH2319 Machine Learning

This Australian assignment-help task is mainly a Python machine learning assessment.

Assessment Instructions
This assessment contains 2 questions.
Our prescribed textbook information can be found at the link below:
https://rmit.instructure.com/courses/79988/pages/course-resources?module_item_id=3006548
You can access a digital version of our prescribed textbook (FIRST Edition) via RMIT Libraries at
the link below, using your student ID and password:
https://www.rmit.edu.au/library
Please follow the additional instructions below:
Solution Mode: For the two questions in this assessment, you can provide your solutions using
either Python or Microsoft Excel. In addition, you can use Excel for some question parts and use
Python for some other question parts if that is what you would like to do – provided that you explain
all your solution steps in your Notebook files. The only exception here is that you will need to use
Python only for Question 1 Part (E) as explained below.
If you use Excel, please read the following carefully:
1. Upload one Excel file per question that is solved by Excel (named as required) showing all your
calculations.
2. You will still need to explain all your solution steps in a clear fashion and present all your
solutions in the required tables in your Jupyter Notebook submission file.
3. Marking of your work will be done based on your Jupyter Notebooks, not your Excel files (your
Excel files will be used only for verification purposes).
4. IMPORTANT NOTE: Due to the reason stated above, you will not get any points for
solutions not presented in your Jupyter Notebook (even if you submit your Excel files!).
Table Format Clarification: Whenever we ask for a table, your table can be a Pandas data frame or
a Jupyter notebook table; both formats will be accepted.

Question 1
(50 points)
This question is inspired by Exercise 3 in Chapter 5 of the textbook.

Some Background Information: In our version of the dataset, the price feature has been
discretized as low, medium, high, and premium. If you are interested, these levels
correspond to the following price ranges in the actual diamonds dataset:
low price: price between $1000 and $2000
medium price: price between $2000 and $3000
high price: price between $3000 and $3500
premium price: price between $3500 and $4000
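As a rough illustration of how such a discretization could be produced, here is a minimal sketch using `pandas.cut`. The sample prices are hypothetical, and treating each range as closed on the left and open on the right is an assumption, since the ranges above do not say which level a boundary value belongs to.

```python
import pandas as pd

# Hypothetical raw prices in dollars (not taken from the assessment dataset).
prices = pd.Series([1200, 2500, 3200, 3900])

# Bin edges follow the ranges listed above. Treating each bin as
# [lower, upper) is an assumption about how boundary values are handled.
bins = [1000, 2000, 3000, 3500, 4000]
labels = ["low", "medium", "high", "premium"]

price_level = pd.cut(prices, bins=bins, labels=labels, right=False)
print(price_level.tolist())
```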
Question Overview: For this question, you will use the (unweighted) KNN algorithm for predicting
the carat (numerical) target feature for the following single observation using the Euclidean
distance metric with different numbers of neighbors:
cut = good
color = D
depth = 60
price = premium
(carat = 0.71, but you will pretend that you do not have this information)
In practice, you would use cross-validation or train-test split for determining optimal values of KNN
hyperparameters. However, as far as this assessment is concerned, you are to use the entire
dataset for training.
Part A (15 points)
Prepare your dataset for KNN modeling. Specifically,
1. Perform one-hot encoding of the categorical descriptive features in the input dataset.
2. Scale your descriptive features to be between 0 and 1.
3. Display the last 10 rows after one-hot encoding and scaling.
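
The three preparation steps above can be sketched in pandas roughly as follows. The toy rows are hypothetical and only reuse the feature names mentioned in the question. Treating price as a nominal feature to be one-hot encoded is an assumption of this sketch, as is keeping the carat target out of the encoding and scaling.

```python
import pandas as pd

# Hypothetical toy rows; column names mirror the features named in the question.
df = pd.DataFrame({
    "cut": ["good", "ideal", "fair"],
    "color": ["D", "E", "D"],
    "depth": [60.0, 62.0, 64.0],
    "price": ["premium", "low", "high"],
    "carat": [0.71, 0.50, 0.90],
})

# Keep the target aside so it is neither encoded nor scaled.
target = df["carat"]
features = df.drop(columns="carat")

# Step 1: one-hot encode the categorical descriptive features.
encoded = pd.get_dummies(features, columns=["cut", "color", "price"], dtype=float)

# Step 2: min-max scaling per column, (x - min) / (max - min).
# A constant column would divide by zero; none occurs in this toy data.
scaled = (encoded - encoded.min()) / (encoded.max() - encoded.min())

# Step 3: display the last rows (at most 10).
print(scaled.tail(10))
```

One-hot columns already lie in [0, 1], so the scaling only changes numerical columns such as depth.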

NOTE: For Parts (B), (C), and (D) below, you are not allowed to use KNeighborsRegressor() from the
Scikit-Learn module; use manual calculations instead (via either Python or Excel). That is, you will
need to show and explain all your solution steps without using Scikit-Learn. The reason for this
restriction is so that you get to learn how some things work behind the scenes.
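
To give a sense of what a manual calculation might look like, here is a minimal NumPy sketch of unweighted KNN regression with the Euclidean distance. The function name `knn_predict` and the toy arrays are hypothetical and are not part of the assessment. The sketch assumes the features have already been encoded and scaled.

```python
import numpy as np

def knn_predict(X_train, y_train, x_query, k):
    """Unweighted KNN regression: mean target of the k nearest training rows."""
    # Euclidean distance from the query to every training row.
    distances = np.sqrt(((X_train - x_query) ** 2).sum(axis=1))
    # Indices of the k smallest distances (stable sort keeps ties in row order).
    nearest = np.argsort(distances, kind="stable")[:k]
    return y_train[nearest].mean()

# Hypothetical, already-scaled training data (two features for brevity).
X_train = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y_train = np.array([0.5, 0.7, 0.9, 1.1])
x_query = np.array([0.1, 0.1])

print(knn_predict(X_train, y_train, x_query, k=1))
print(knn_predict(X_train, y_train, x_query, k=3))
```

Because the prediction is unweighted, every one of the k neighbors contributes equally to the mean, however far it is from the query.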