Python Assignment Help | MATH2319 Machine Learning



Assessment Instructions
This assessment contains 2 questions.
Our prescribed textbook information can be found at the link below:
You can access a digital version of our prescribed textbook (FIRST Edition) via RMIT Libraries with your student ID and password below:
Please follow the additional instructions below:
Solution Mode: For the two questions in this assessment, you can provide your solutions using either Python or Microsoft Excel. You may also use Excel for some question parts and Python for others, provided that you explain all your solution steps in your Notebook files. The only exception is Question 1 Part (E), which must be solved in Python, as explained below.
If you use Excel, please read the following carefully:
1. Upload one Excel file per question that is solved by Excel (named as required) showing all your calculations.
2. You will still need to explain all your solution steps clearly and present all your solutions in the required tables in your Jupyter Notebook submission file.
3. Marking of your work will be done based on your Jupyter Notebooks, not your Excel files (your Excel files will be used only for verification purposes).
4. IMPORTANT NOTE: Due to the reason stated above, you will not get any points for solutions not presented in your Jupyter Notebook (even if you submit your Excel files!).
Table Format Clarification: Whenever we ask for a table, your table can be a Pandas data frame or a Jupyter notebook table; both formats will be accepted.

Question 1
(50 points)
This question is inspired by Exercise 3 in Chapter 5 of the textbook.

Some Background Information: In our version of the dataset, the price feature has been discretized as low, medium, high, and premium. If you are interested, these levels correspond to the following price ranges in the actual diamonds dataset:
low price: price between $1000 and $2000
medium price: price between $2000 and $3000
high price: price between $3000 and $3500
premium price: price between $3500 and $4000
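As a rough illustration of how such a discretization could be reproduced, the sketch below bins a few made-up prices with pandas. The sample prices and the names `prices`, `bins`, `labels`, and `price_level` are hypothetical, and the treatment of boundary values (for example, exactly $2000) is an assumption, since the ranges above do not say which level a boundary belongs to.

```python
import pandas as pd

# Hypothetical raw prices in dollars (not taken from the assessment dataset).
prices = pd.Series([1500, 2500, 3200, 3800])

# Bin edges follow the ranges listed above. Which side a boundary value falls on
# is an assumption: pd.cut uses right-closed intervals by default.
bins = [1000, 2000, 3000, 3500, 4000]
labels = ["low", "medium", "high", "premium"]

price_level = pd.cut(prices, bins=bins, labels=labels)
print(price_level.tolist())
```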
Question Overview: For this question, you will use the (unweighted) KNN algorithm to predict the carat (numerical) target feature for the following single observation, using the Euclidean distance metric with different numbers of neighbors:
cut = good
color = D
depth = 60
price = premium
(carat = 0.71, but you will pretend that you do not have this information)
In practice, you would use cross-validation or a train-test split to determine the optimal values of the KNN hyperparameters. However, for this assessment, you are to use the entire dataset for training.
Part A (15 points)
Prepare your dataset for KNN modeling. Specifically,
1. Perform one-hot encoding of the categorical descriptive features in the input dataset.
2. Scale your descriptive features to be between 0 and 1.
3. Display the last 10 rows after one-hot encoding and scaling.
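
The three steps above can be sketched with pandas as follows. This is a minimal illustration, not a model solution: the toy rows, the variable names, and the choice to one-hot encode cut, color, and price alike are all assumptions, and the real dataset would be loaded from file instead.

```python
import pandas as pd

# Hypothetical toy rows standing in for the assessment dataset (not the real data).
df = pd.DataFrame({
    "cut": ["good", "ideal", "premium", "good"],
    "color": ["D", "E", "D", "F"],
    "depth": [60.0, 62.5, 58.0, 64.0],
    "price": ["premium", "low", "medium", "high"],
    "carat": [0.71, 0.40, 0.55, 0.62],
})

# Keep the target aside so it is neither encoded nor scaled.
target = df["carat"]
features = df.drop(columns="carat")

# Step 1: one-hot encode the categorical descriptive features.
encoded = pd.get_dummies(features, columns=["cut", "color", "price"], dtype=float)

# Step 2: min-max scale every descriptive feature to the range [0, 1].
col_min = encoded.min()
col_range = (encoded.max() - col_min).replace(0, 1)  # avoid dividing by zero for constant columns
scaled = (encoded - col_min) / col_range

# Step 3: display the last 10 rows (only 4 exist in this toy example).
print(scaled.tail(10))
```

Note that the observation to be predicted would need the same encoding and the same minimum and range values applied to it before any distances are computed.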

NOTE: For Parts (B), (C), and (D) below, you are not allowed to use KNeighborsRegressor() from the Scikit-Learn module; use manual calculations instead (via either Python or Excel). That is, you will need to show and explain all your solution steps without using Scikit-Learn. The reason for this restriction is so that you learn how things work behind the scenes.
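
For orientation, a manual unweighted KNN prediction with Euclidean distance might look like the sketch below, which uses only NumPy. The arrays, the function name `knn_predict`, and the values of k are hypothetical placeholders rather than the assessment data or the required answers.

```python
import numpy as np

# Hypothetical, already-scaled training features (one row per observation)
# and their carat targets. These are not the assessment data.
X_train = np.array([
    [0.0, 1.0, 0.2],
    [1.0, 0.0, 0.9],
    [0.0, 1.0, 0.5],
    [1.0, 1.0, 0.3],
])
y_train = np.array([0.70, 0.30, 0.74, 0.50])

# Hypothetical query observation, encoded and scaled like the training rows.
x_query = np.array([0.0, 1.0, 0.3])

def knn_predict(X, y, x, k):
    """Unweighted KNN regression: the mean target of the k nearest rows."""
    distances = np.sqrt(((X - x) ** 2).sum(axis=1))  # Euclidean distance to each row
    nearest = np.argsort(distances)[:k]              # indices of the k smallest distances
    return y[nearest].mean()

for k in (1, 3):
    print(k, knn_predict(X_train, y_train, x_query, k))
```

Sorting the full distance vector keeps every intermediate value visible, which suits a solution that has to show and explain each step.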