MiniProject 2: IMDB Sentiment Analysis
# because we are generating random data, set a random seed
set.seed(1)
# generate values in x spread evenly from 0 to 20
x <- seq(from=0, to=20, by=0.05)
# generate y according to the following known function of x
y <- 500 + 0.4 * (x-10)^3
# add random noise to y
noise <- rnorm(length(x), mean=10, sd=80)
noisy.y <- y + noise
# plot data
# red line for true underlying function generating y
{
plot(x,noisy.y)
lines(x, y, col="red")
}
a. With predictor x and outcome noisy.y, split the data into a training set and a test set. (A sketch of one possible approach to parts a, b, c, and e appears after part e.)
b. Perform 10-fold CV for polynomials from degree 1 to 5 (use MSE as your error measure). This should
be done from scratch using a for loop. (Hint: It may be helpful to randomly permute the training set
from the previous part and then split it into 10 roughly equal parts. You may need an if statement to
handle a potential problem in the last iteration of your loop.)
c. Plot the best model’s fitted line in blue and compare to the true function (the red line from the previous
plot).
d. Comment on the results of (c). Why was performance better or worse at different order polynomials?
e. Report the CV error and test error at each order of polynomial. Which achieves the lowest CV error?
How does the CV error compare to the test error? Comment on the results.
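One way to approach parts a, b, c, and e is sketched below. The 80/20 split, the seed values, and names such as train.idx, cv.mse, and best.d are illustrative choices rather than requirements; using min() to cap the last fold plays the role of the if statement mentioned in the hint for part b.
# (a) hold out a test set (an 80/20 split is one reasonable choice)
set.seed(2)
n <- length(x)
train.idx <- sample(n, size = round(0.8 * n))
train <- data.frame(x = x[train.idx], y = noisy.y[train.idx])
test <- data.frame(x = x[-train.idx], y = noisy.y[-train.idx])
# (b) 10-fold CV from scratch over polynomial degrees 1 to 5
k <- 10
perm <- sample(nrow(train)) # random permutation of the training rows
fold.size <- ceiling(nrow(train) / k)
cv.mse <- numeric(5)
for (d in 1:5) {
  fold.mse <- numeric(k)
  for (i in 1:k) {
    lo <- (i - 1) * fold.size + 1
    hi <- min(i * fold.size, nrow(train)) # the last fold may be smaller
    val.idx <- perm[lo:hi]
    fit <- lm(y ~ poly(x, d), data = train[-val.idx, ])
    pred <- predict(fit, newdata = train[val.idx, ])
    fold.mse[i] <- mean((train$y[val.idx] - pred)^2)
  }
  cv.mse[d] <- mean(fold.mse)
}
# (c) refit the best degree on the full training set and add its fitted line in blue
best.d <- which.min(cv.mse)
best.fit <- lm(y ~ poly(x, best.d), data = train)
{
plot(x, noisy.y)
lines(x, y, col="red") # true underlying function
lines(x, predict(best.fit, newdata = data.frame(x = x)), col="blue") # best CV fit
}
# (e) test MSE at each degree, reported alongside the CV error
test.mse <- sapply(1:5, function(d) {
  fit <- lm(y ~ poly(x, d), data = train)
  mean((test$y - predict(fit, newdata = test))^2)
})
data.frame(degree = 1:5, cv.mse = cv.mse, test.mse = test.mse)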
3. Classifying a toy dataset
a. Pick a new dataset from the mlbench package (one we haven't used in class that is 2-dimensional with
two classes; Hint: run ls("package:mlbench")). Experiment with classifying the data using KNN at
different values of k. Use cross-validation to choose your best model. (A sketch of one possible
approach to parts a and b appears after part c.)
b. Plot misclassification error rate at different values of k.
c. Plot the decision boundary for your classifier using the plot_decision_boundary() function from the
code block at the top of the project. Make sure you load this function into memory before trying to
use it.
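Below is a sketch of parts a and b, assuming mlbench.circle() as the dataset (any 2-dimensional, two-class mlbench generator you haven't used in class works the same way) and class::knn() as the classifier; the grid of k values, the number of folds, and names such as cv.err and best.k are illustrative. Part c is not shown because it depends on the plot_decision_boundary() function supplied in the top code block.
library(mlbench)
library(class)
# (a) generate a toy dataset; the choice of generator here is only an example
set.seed(3)
toy <- mlbench.circle(500, 2)
X <- toy$x
y.class <- toy$classes
# 10-fold CV over a grid of k values for KNN
n.folds <- 10
ks <- 1:25
perm <- sample(nrow(X))
folds <- cut(seq_along(perm), breaks = n.folds, labels = FALSE)
cv.err <- sapply(ks, function(k) {
  mean(sapply(1:n.folds, function(f) {
    val <- perm[folds == f]
    pred <- knn(train = X[-val, ], test = X[val, ], cl = y.class[-val], k = k)
    mean(pred != y.class[val]) # misclassification rate on the held-out fold
  }))
})
# (b) misclassification error rate at each value of k
plot(ks, cv.err, type = "b", xlab = "k", ylab = "CV misclassification rate")
best.k <- ks[which.min(cv.err)]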
4. Performance measures for classification
Recall the Caravan data from the week 2 lab (part of the ISLR package). Train a KNN model with k=2 using
all the predictors in the dataset and the outcome Purchase. Create a confusion matrix with the test set
predictions and the actual values of Purchase. Using the values of the confusion matrix, calculate precision,
recall, and F1. (Note that Yes is the positive class and the confusion matrix may be differently oriented
than the one presented in class.)
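A sketch of the confusion-matrix arithmetic, assuming the same setup as the ISLR Caravan lab: predictors standardized with scale() and the first 1,000 rows held out as the test set. Names such as std.X and test.idx are illustrative, and the seed only controls knn()'s random tie-breaking.
library(ISLR)
library(class)
# standardize the predictors; column 86 of Caravan is the outcome Purchase
std.X <- scale(Caravan[, -86])
test.idx <- 1:1000
train.X <- std.X[-test.idx, ]
test.X <- std.X[test.idx, ]
train.y <- Caravan$Purchase[-test.idx]
test.y <- Caravan$Purchase[test.idx]
set.seed(4)
pred <- knn(train.X, test.X, train.y, k = 2)
# confusion matrix: rows are predictions, columns are actual values of Purchase
cm <- table(predicted = pred, actual = test.y)
cm
# precision, recall, and F1 with "Yes" as the positive class
TP <- cm["Yes", "Yes"]
FP <- cm["Yes", "No"]
FN <- cm["No", "Yes"]
precision <- TP / (TP + FP)
recall <- TP / (TP + FN)
F1 <- 2 * precision * recall / (precision + recall)
c(precision = precision, recall = recall, F1 = F1)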
5. ISLR Chapter 5 Exercise 3