# Python Assignment Writing Service | CS 412: Introduction to Data Mining (Fall 2020) Assignment 5

CS 412: Introduction to Data Mining (Fall 2020)

Assignment 5

Problem 1. Logistic regression (25 points)
[You MUST implement LR from scratch to get points]
In this sub-problem, you are required to implement a logistic regression classifier to identify
the digit 0. That is, only the digit 0 is positive (Y = 1), while all other digits are
negative (Y = 0). The posterior probabilities are as follows:
$$P(Y = 1 \mid X = x) = \frac{\exp(w^T x)}{1 + \exp(w^T x)}$$
$$P(Y = 0 \mid X = x) = \frac{1}{1 + \exp(w^T x)}$$
where $w^T$ denotes the transpose of the weight vector $w$ and $w^T x$ is the inner product of $w$
and $x$. Please check the detailed instructions and requirements below.
Requirement:
• For simplicity, you can flatten each image of size 28 × 28 pixels into a feature vector
$x \in \mathbb{R}^{784 \times 1}$. Divide each value in x by 255 so that it lies in the range
[0, 1]. Note that this step may not be required in the following sub-problems.
• Please use the gradient ascent algorithm to maximize the log-likelihood. To simplify
the model, we IGNORE the bias term. Please DO NOT add regularization.
• Please initialize w as w = 0, i.e., a vector filled with 0.
• For hyperparameter settings, use 100 iterations and a learning rate of
0.1/N, where N is the number of images in the training set.
• For the results, please plot (1) the testing accuracy vs. the number of iterations and (2)
the log-likelihood vs. the number of iterations. Write your observations and explanations
of the results from both figures and submit them along with both the code and the figures.
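
The requirements above can be sketched in NumPy as follows. This is a minimal illustration, not a reference solution: the function names (`sigmoid`, `log_likelihood`, `train_logistic_regression`, `accuracy`) are hypothetical, and the demo uses a small synthetic dataset with 20 features rather than the real 784-dimensional images. It relies on the standard gradient of the log-likelihood, $\sum_i (y_i - P(Y = 1 \mid x_i))\, x_i$.

```python
import numpy as np

def sigmoid(z):
    # Stable form of exp(z) / (1 + exp(z)) via the tanh identity.
    return 0.5 * (1.0 + np.tanh(0.5 * z))

def log_likelihood(w, X, y):
    # sum_i [ y_i * w^T x_i - log(1 + exp(w^T x_i)) ]
    z = X @ w
    return float(np.sum(y * z - np.logaddexp(0.0, z)))

def train_logistic_regression(X, y, num_iters=100, lr=None):
    # X: (N, d) features in [0, 1]; y: (N,) labels in {0, 1}.
    n, d = X.shape
    if lr is None:
        lr = 0.1 / n              # learning rate 0.1/N from the requirements
    w = np.zeros(d)               # w initialized to the zero vector, no bias term
    history = []
    for _ in range(num_iters):
        grad = X.T @ (y - sigmoid(X @ w))   # gradient of the log-likelihood
        w = w + lr * grad                   # ascent step, no regularization
        history.append(log_likelihood(w, X, y))
    return w, history

def accuracy(w, X, y):
    # Predict Y = 1 when P(Y = 1 | x) >= 0.5.
    pred = (sigmoid(X @ w) >= 0.5).astype(int)
    return float(np.mean(pred == y))

# Synthetic stand-in data (assumption: NOT the assignment's dataset).
rng = np.random.default_rng(0)
X_demo = rng.random((200, 20))
y_demo = (X_demo[:, 0] > X_demo[:, 1]).astype(float)

w_demo, ll_history = train_logistic_regression(X_demo, y_demo)
acc_demo = accuracy(w_demo, X_demo, y_demo)
```

To produce the two required plots, one would evaluate `accuracy` on the test set and record `log_likelihood` after every iteration, then plot both lists against the iteration index.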
Problem 2. K-Nearest Neighbor Classifier (KNN) (25 points)
[You MUST implement KNN from scratch to get points]
In this sub-problem, you are required to implement K-nearest neighbor classifiers with different distance measures to identify the digits (0 to 9) from images. Here are some detailed
instructions and requirements that you should follow.
Requirement:
• Feature Vector Extraction: In order to perform KNN, the first step is to convert images
into feature vectors. For simplicity, you can first flatten each image $x_i$ of size 28 × 28
pixels and then normalize it by dividing by 255, giving a feature vector $v_i \in \mathbb{R}^{784 \times 1}$.
• Distance Measure: In your KNN implementation, you should consider both the Euclidean distance and the Manhattan distance. In particular, given any two images ($x_i$
and $x_j$) and their corresponding length-L feature vectors ($v_i$ and $v_j$), the distance
between $x_i$ and $x_j$ can be computed in the following ways.
Euclidean distance: $d(x_i, x_j) = \sqrt{\sum_{l=1}^{L} (v_i(l) - v_j(l))^2}$.
Manhattan distance: $d(x_i, x_j) = \sum_{l=1}^{L} |v_i(l) - v_j(l)|$.
• Speedup Strategy: To speed up your KNN, you are allowed to use specialized data
structures such as kd-trees.
• For the results, please plot (1) the prediction accuracy w.r.t. K = 1, 3, 5, 7, 9 using
Euclidean distance; (2) the prediction accuracy w.r.t. K = 1, 3, 5, 7, 9 using Manhattan
distance. Do you have any interesting findings by comparing the curves in (1) and (2)?
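
A brute-force version of the classifier described above might look like the sketch below. The function names and the tiny 2-D toy dataset are hypothetical. Breaking ties by choosing the smallest label is an assumption, since the assignment does not specify a tie-breaking rule. For the full dataset the test set would need to be processed in batches, because the broadcasted difference array has shape (tests, trains, L).

```python
import numpy as np

def pairwise_distances(A, B, metric="euclidean"):
    # A: (m, L), B: (n, L) -> (m, n) matrix of distances between rows.
    diff = A[:, None, :] - B[None, :, :]
    if metric == "euclidean":
        return np.sqrt(np.sum(diff ** 2, axis=2))
    if metric == "manhattan":
        return np.sum(np.abs(diff), axis=2)
    raise ValueError(f"unknown metric: {metric}")

def knn_predict(train_X, train_y, test_X, k, metric="euclidean", num_classes=10):
    # train_y must hold non-negative integer labels.
    dists = pairwise_distances(test_X, train_X, metric)
    # Indices of the k closest training points for each test point.
    nearest = np.argsort(dists, axis=1, kind="stable")[:, :k]
    votes = train_y[nearest]                      # shape (m, k)
    # Majority vote; argmax breaks ties toward the smallest label (assumption).
    return np.array([np.bincount(row, minlength=num_classes).argmax()
                     for row in votes])

# Toy data (assumption: for illustration only, not the digit images).
train_X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
train_y = np.array([0, 0, 1, 1])
test_X = np.array([[0.0, 0.2], [5.0, 5.5]])

pred_k1 = knn_predict(train_X, train_y, test_X, k=1, metric="euclidean")
pred_k3 = knn_predict(train_X, train_y, test_X, k=3, metric="manhattan")
```

Looping this call over K = 1, 3, 5, 7, 9 for each metric and comparing the predictions with the test labels gives the two accuracy curves requested.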
Problem 3. Multi-layer perceptron (25 points)
In this sub-problem, you are required to implement a basic neural network model that
consists of a hidden layer sandwiched between an input layer and an output layer. Here are some basic
requirements.
Requirement:
• Given weight matrices $W_1$, $W_2$ with $W_1 \in \mathbb{R}^{h \times d}$ and $W_2 \in \mathbb{R}^{10 \times h}$ and bias vectors
$b_1 \in \mathbb{R}^{h}$ and $b_2 \in \mathbb{R}^{10}$, you will learn a function F defined as
$$F(x) = W_2 \sigma(W_1 x + b_1) + b_2,$$
where σ is the activation function (either ReLU or sigmoid). You will use d = 784 =
28 × 28 input units, one for each of the image’s pixels. The recommended hidden
dimension is h = 32.
• To train the neural network you need to use cross entropy as your loss function. To be
specific, it is given by
$$\frac{1}{n} \sum_{i=1}^{n} \left[ y_i \log \hat{y}_i + (1 - y_i) \log(1 - \hat{y}_i) \right],$$
where $y_i$ is the label of the i-th training sample $x_i$, and $\hat{y}_i = \sigma(F(x_i))$. You do not
have to implement these functions yourself. Instead, you can use built-in functions
such as torch.nn.CrossEntropyLoss(). Note that torch.nn.CrossEntropyLoss()
incorporates a softmax function, so you do not need to explicitly include an activation
function in the last layer of your network.
• The recommended batch size would be 256 for training, and 1000 for testing. The
optimizer could be SGD or Adam. The suggested learning rate is 0.01 for SGD and
0.001 for Adam.
• For the results, please (1) plot the training and testing loss curves w.r.t. the training
epochs (at least 40 epochs); (2) plot the training and testing accuracy w.r.t. the
training epochs (at least 40 epochs). You are required to perform multiple runs (at least
3 runs) with different random network initializations and report the mean and std
for both curves; (3) report the hyperparameters you use, including the hidden
dimension h, the batch size, the optimizer (SGD/Adam), and the learning rate.
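
To make the shapes in F(x) concrete, the forward pass can be sketched in NumPy as below. This only illustrates the computation and is not a training implementation. The function names, the ReLU choice, and the random initialization scale are assumptions. The loss function mirrors the behavior noted above for torch.nn.CrossEntropyLoss(), which takes raw outputs and applies the softmax internally.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def forward(X, W1, b1, W2, b2):
    # X holds one sample per row, shape (n, d).
    # Row-wise form of F(x) = W2 * relu(W1 x + b1) + b2.
    hidden = relu(X @ W1.T + b1)      # (n, h)
    return hidden @ W2.T + b2         # (n, 10) raw scores, no final activation

def softmax_cross_entropy(logits, labels):
    # Softmax folded into the loss; labels are integer class indices.
    shifted = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())

d, h, num_classes = 784, 32, 10       # d and h as recommended above
rng = np.random.default_rng(0)
W1 = rng.normal(0.0, 0.01, (h, d))    # init scale is an assumption
b1 = np.zeros(h)
W2 = rng.normal(0.0, 0.01, (num_classes, h))
b2 = np.zeros(num_classes)

X_batch = rng.random((4, d))          # random stand-in for 4 flattened images
y_batch = np.array([3, 1, 4, 1])
logits = forward(X_batch, W1, b1, W2, b2)
loss = softmax_cross_entropy(logits, y_batch)
```

With all weights set to zero, every class receives probability 1/10, so the loss equals log 10 ≈ 2.303. This is a useful sanity check for the first training step.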
Problem 4. Convolutional neural network (25 points)
In this sub-problem, you are required to implement a CNN model for classification. The
implementation is flexible, but below are some basic requirements that you should
follow.
Requirement:
• The CNN structure should have at least 2 convolutional layers (each followed by ReLU
and MaxPool) plus 2 fully connected layers. You are free to
explore more sophisticated network structures (adding Dropout, batch normalization,
etc.) to further boost the results. Note, however, that a more complex model design will
result in longer training time, especially when you only have a CPU for training
instead of a GPU.
• The recommended loss function is cross entropy loss. The optimizer could be SGD
or Adam, etc. The recommended batch size for training would be 256, and 1000 for
testing. Your model should be able to converge stably within 40 training epochs. The
suggested learning rate is 0.01 for SGD and 0.001 for Adam.
• For the results, please (1) plot the training and testing loss curves w.r.t. the training
epochs (at least 40 epochs); (2) plot the training and testing accuracy w.r.t. the
training epochs (at least 40 epochs). You are required to perform multiple runs (at least 3
runs) with different random network initializations and report the mean and std for
both curves; (3) report all the hyperparameters used and draw your network
architecture clearly in the report. A sample of a CNN architecture is attached below
in Table 1 where [5 × 5, 10] denotes one convolution layer with 10 filters of size 5 × 5.
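
When stacking convolution and pooling layers, the input size of the first fully connected layer has to be worked out from the layer shapes. The sketch below does that arithmetic with the standard output-size formula floor((n + 2p − k) / s) + 1. The concrete configuration is an assumption for illustration only: the 5 × 5 kernels and 10 filters follow the notation example above, while the second layer's 20 filters, the 2 × 2 pooling, stride 1, and zero padding are hypothetical choices, not values taken from Table 1.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    # Spatial output size of a convolution or pooling layer.
    return (size + 2 * padding - kernel) // stride + 1

def flattened_features(input_size, layers):
    # layers: list of (out_channels, conv_kernel, pool_kernel) tuples,
    # each meaning conv (stride 1, no padding) -> ReLU -> max pool.
    size, channels = input_size, 1
    for out_channels, conv_kernel, pool_kernel in layers:
        size = conv_output_size(size, conv_kernel)
        size = conv_output_size(size, pool_kernel, stride=pool_kernel)
        channels = out_channels
    return channels * size * size

# Hypothetical two-layer configuration on 28 x 28 images:
# 28 -> conv 5x5 -> 24 -> pool 2x2 -> 12 -> conv 5x5 -> 8 -> pool 2x2 -> 4
fc_inputs = flattened_features(28, [(10, 5, 2), (20, 5, 2)])
print(fc_inputs)  # 320 = 20 channels * 4 * 4
```

Under these assumptions the first fully connected layer would take 320 inputs. Changing the kernel sizes, padding, or filter counts changes this number, so it is worth recomputing whenever the architecture is modified.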
