
Assessment Instructions

This assessment contains 2 questions.

Our prescribed textbook information can be found at the link below:

https://rmit.instructure.com/courses/79988/pages/course-resources?module_item_id=3006548


You can access a digital version of our prescribed textbook (FIRST Edition) via RMIT Libraries, using your student ID and password, at the link below:

https://www.rmit.edu.au/library

Please follow the additional instructions below:

Solution Mode: For the two questions in this assessment, you can provide your solutions using either Python or Microsoft Excel. In addition, you can use Excel for some question parts and use Python for some other question parts if that is what you would like to do – provided that you explain all your solution steps in your Notebook files. The only exception here is that you will need to use Python only for Question 1 Part (E) as explained below.

If you use Excel, please read the following carefully:

1. Upload one Excel file per question that is solved by Excel (named as required) showing all your calculations.

2. You will still need to explain all your solution steps clearly and present all your solutions in the required tables in your Jupyter Notebook submission file.

3. Marking of your work will be done based on your Jupyter Notebooks, not your Excel files (your Excel files will be used only for verification purposes).

4. IMPORTANT NOTE: For the reason stated above, you will not get any points for solutions not presented in your Jupyter Notebook (even if you submit your Excel files!).

Table Format Clarification: Whenever we ask for a table, your table can be a Pandas data frame or

a Jupyter notebook table; both formats will be accepted.

Question 1

(50 points)

This question is inspired by Exercise 3 in Chapter 5 in the textbook.

Some Background Information: In our version of the dataset, the price feature has been discretized as low, medium, high, and premium. If you are interested, these levels correspond to the following price ranges in the actual diamonds dataset:

low price: price between $1000 and $2000

medium price: price between $2000 and $3000

high price: price between $3000 and $3500

premium price: price between $3500 and $4000
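The mapping from numeric prices to the four levels above can be sketched with `pandas.cut`. This is only an illustration: the sample prices are hypothetical, and the handling of values that fall exactly on a boundary (for example $2000) is an assumption, since the ranges above do not say which level such a value belongs to.

```python
import pandas as pd

# Hypothetical prices, one inside each of the four ranges listed above.
prices = pd.Series([1500, 2500, 3200, 3800])

# Bin edges taken from the ranges above. Assumption: each bin includes its
# upper edge, and the first bin also includes its lower edge ($1000).
bins = [1000, 2000, 3000, 3500, 4000]
labels = ["low", "medium", "high", "premium"]

levels = pd.cut(prices, bins=bins, labels=labels, include_lowest=True)
print(levels.tolist())
```

Prices outside the $1000 to $4000 span would come out as missing values under this sketch.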

Question Overview: For this question, you will use the (unweighted) KNN algorithm to predict the carat (numerical) target feature for the following single observation, using the Euclidean distance metric with different numbers of neighbors:

cut = good

color = D

depth = 60

price = premium

(carat = 0.71, but you will pretend that you do not have this information)

In practice, you would use cross-validation or a train-test split to determine optimal values of the KNN hyperparameters. However, for this assessment, you are to use the entire dataset for training.

Part A (15 points)

Prepare your dataset for KNN modeling. Specifically,

1. Perform one-hot encoding of the categorical descriptive features in the input dataset.

2. Scale your descriptive features to be between 0 and 1.

3. Display the last 10 rows after one-hot encoding and scaling.
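The three steps above could look roughly like the sketch below in Pandas. The rows in `df` are made up for illustration and are not the assessment dataset. Treating `price` as a nominal feature to be one-hot encoded, and leaving the `carat` target unscaled, are assumptions of this sketch.

```python
import pandas as pd

# Hypothetical toy rows; the real assessment dataset is not reproduced here.
df = pd.DataFrame({
    "cut": ["good", "fair", "good", "ideal"],
    "color": ["D", "E", "D", "F"],
    "depth": [60.0, 62.5, 58.0, 64.0],
    "price": ["premium", "low", "high", "medium"],
    "carat": [0.71, 0.50, 0.65, 0.58],
})

# Separate the target from the descriptive features.
target = df["carat"]
features = df.drop(columns="carat")

# Step 1: one-hot encode the categorical descriptive features.
encoded = pd.get_dummies(features, columns=["cut", "color", "price"], dtype=float)

# Step 2: min-max scale every descriptive feature to the range [0, 1].
# A constant column would have range 0, so that range is replaced by 1
# to avoid division by zero.
col_min = encoded.min()
col_range = (encoded.max() - col_min).replace(0, 1)
scaled = (encoded - col_min) / col_range

# Step 3: display the last 10 rows (the toy frame only has 4).
print(scaled.tail(10))
```

One-hot columns already hold only 0 and 1, so min-max scaling leaves them unchanged whenever both values occur; only `depth` actually moves.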

NOTE: For Parts (B), (C), and (D) below, you are not allowed to use KNeighborsRegressor() from the Scikit-Learn module; use manual calculations instead (via either Python or Excel). That is, you will need to show and explain all your solution steps without using Scikit-Learn. The reason for this restriction is so that you learn how things work behind the scenes.
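As a rough idea of what a manual calculation can look like, here is a minimal sketch of unweighted KNN regression with Euclidean distance, written with the standard library only. The function name `knn_predict`, the toy training rows, and the query point are all hypothetical; in the assessment, the rows would be the encoded and scaled descriptive features.

```python
import math

def knn_predict(train_X, train_y, query, k):
    """Unweighted KNN regression: mean target of the k nearest training rows."""
    # Pair each training row's Euclidean distance to the query with its target.
    distances = [
        (math.dist(row, query), y) for row, y in zip(train_X, train_y)
    ]
    # Sort by distance and keep the k closest rows.
    distances.sort(key=lambda pair: pair[0])
    nearest = distances[:k]
    # Unweighted prediction: plain average of the neighbors' target values.
    return sum(y for _, y in nearest) / k

# Hypothetical, already-scaled training rows and their target values.
train_X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
train_y = [0.5, 0.7, 0.6, 0.9]
query = [0.2, 0.1]

for k in (1, 2, 3):
    print(k, knn_predict(train_X, train_y, query, k))
```

Ties in distance are broken here by the original row order, which is a choice of this sketch and not something the assessment specifies.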
