
CS 412: Introduction to Data Mining (Fall 2020)

Assignment 5

Problem 1. Logistic regression (25 points)

[You MUST implement LR from scratch to get points]

In this sub-problem, you are required to implement a logistic regression classifier to identify the digit 0. This means that only digit 0 is positive (Y = 1), while all other digits are negative (Y = 0). The posterior probabilities are as follows:

P(Y = 1|X = x) = exp(w^T x) / (1 + exp(w^T x))

P(Y = 0|X = x) = 1 / (1 + exp(w^T x))

where w^T denotes the transpose of the weight vector w, and w^T x is the inner product of w and x. Please check the detailed instructions and requirements below.

Requirement:

• For simplicity, you can flatten each image of size 28 × 28 pixels into a feature vector x ∈ R^(784×1). Please divide each value in x by 255 so that it lies in the range [0, 1]. Note that this step may not be required in the following sub-problems.

• Please use the gradient ascent algorithm to maximize the log-likelihood. To simplify the model, we IGNORE the bias term, and please DO NOT add regularization.

• Please initialize w as w = 0, i.e., a vector filled with 0.

• For hyperparameter settings, the number of iterations is 100 and the learning rate is 0.1/N, where N is the number of images in the training set.

• For the results, please plot (1) the testing accuracy vs. the number of iterations and (2) the log-likelihood vs. the number of iterations. Write down your observations and explanations of the results from both figures, and submit them along with the code and the figures.
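The gradient-ascent update described above can be sketched in NumPy as follows. This is a minimal illustration, not the required solution: the function name `train_logistic_regression`, the synthetic two-cluster data, and the small `1e-12` guard inside the logarithms are assumptions for demonstration; in the assignment, X would hold the flattened, 255-normalized MNIST images and y the 0/1 labels.

```python
import numpy as np

def train_logistic_regression(X, y, n_iters=100):
    """Gradient ascent on the log-likelihood; no bias term, no regularization."""
    N, d = X.shape
    w = np.zeros(d)                 # initialize w = 0 as required
    lr = 0.1 / N                    # learning rate 0.1/N as specified
    log_liks = []
    for _ in range(n_iters):
        p = 1.0 / (1.0 + np.exp(-X @ w))   # P(Y = 1 | x) = sigmoid(w^T x)
        # log-likelihood before this iteration's update (1e-12 avoids log(0))
        log_liks.append(np.sum(y * np.log(p + 1e-12)
                               + (1 - y) * np.log(1 - p + 1e-12)))
        w += lr * (X.T @ (y - p))          # gradient of the log-likelihood
    return w, log_liks

# Tiny synthetic check: two separable 2-D clusters standing in for MNIST.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.concatenate([np.zeros(50), np.ones(50)])
w, ll = train_logistic_regression(X, y)
acc = ((1 / (1 + np.exp(-X @ w)) >= 0.5).astype(float) == y).mean()
```

On this toy data the log-likelihood in `ll` should increase monotonically across iterations, which is the behavior the required plot is meant to show.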


Problem 2. K-Nearest Neighbor Classifier (KNN) (25 points)

[You MUST implement KNN from scratch to get points]

In this sub-problem, you are required to implement K-nearest neighbor classifiers with different distance measures to identify the digits (0 to 9) from images. Here are some detailed instructions and requirements that you should follow.

Requirement:

• Feature Vector Extraction: In order to perform KNN, the first step is to convert images into feature vectors. For simplicity, you can flatten each image xi of size 28 × 28 pixels and then normalize it by dividing by 255, obtaining a feature vector vi ∈ R^(784×1).

• Distance Measure: In your KNN implementation, you should consider both the Euclidean distance and the Manhattan distance. In particular, given any two images (xi

and xj ) and their corresponding length-L feature vectors (vi and vj ), the distance

between xi and xj can be computed in the following ways.

Euclidean distance: d(xi, xj) = sqrt( Σ_{l=1}^{L} (vi(l) − vj(l))^2 ).

Manhattan distance: d(xi, xj) = Σ_{l=1}^{L} |vi(l) − vj(l)|.

• Speedup Strategy: To speed up your KNN, you are allowed to use specialized data structures such as k-d trees.

• For the results, please plot (1) the prediction accuracy w.r.t. K = 1, 3, 5, 7, 9 using

Euclidean distance; (2) the prediction accuracy w.r.t. K = 1, 3, 5, 7, 9 using Manhattan

distance. Do you have any interesting findings by comparing the curves in (1) and (2)?
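A brute-force version of KNN with the two distance measures can be sketched as below. The function name `knn_predict` and the tiny 2-D toy data are illustrative assumptions; real inputs would be the 784-dimensional vectors vi described above, and ties are broken by `Counter`'s insertion order rather than any principled rule.

```python
import numpy as np
from collections import Counter

def knn_predict(train_X, train_y, test_X, k=3, metric="euclidean"):
    """Brute-force KNN: majority vote among the k nearest training points."""
    diff = test_X[:, None, :] - train_X[None, :, :]   # pairwise differences
    if metric == "euclidean":
        dist = np.sqrt((diff ** 2).sum(axis=-1))      # sqrt of summed squares
    else:                                             # manhattan
        dist = np.abs(diff).sum(axis=-1)              # summed absolute values
    nearest = np.argsort(dist, axis=1)[:, :k]         # indices of k nearest
    return np.array([Counter(train_y[idx]).most_common(1)[0][0]
                     for idx in nearest])

# Toy check: two well-separated groups.
train_X = np.array([[0., 0.], [0., 1.], [5., 5.], [5., 6.]])
train_y = np.array([0, 0, 1, 1])
test_X = np.array([[0., 0.5], [5., 5.5]])
pred_e = knn_predict(train_X, train_y, test_X, k=3, metric="euclidean")
pred_m = knn_predict(train_X, train_y, test_X, k=3, metric="manhattan")
```

The full pairwise-distance matrix here is O(N·M·L) in time and memory; for the actual dataset you may prefer to compute distances in batches or use the k-d tree speedup mentioned above.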

Problem 3. Multi-layer perceptron (25 points)

In this sub-problem, you are required to implement a basic neural network model which consists of a hidden layer sandwiched between an input layer and an output layer. Here are some basic requirements.

Requirement:

• Given weight matrices W1 ∈ R^(h×d) and W2 ∈ R^(10×h), and bias vectors b1 ∈ R^h and b2 ∈ R^10, you will learn a function F defined as

F(x) = W2 σ(W1 x + b1) + b2,

where σ is the activation function (either ReLU or sigmoid). You will use d = 784 = 28 × 28 input units, one for each of the image's pixels. The recommended hidden dimension is h = 32.

• To train the neural network, you need to use cross entropy as your loss function. To be specific, it is given by

−(1/n) Σ_{i=1}^{n} [ yi log ŷi + (1 − yi) log(1 − ŷi) ],

where yi is the label of the i-th training sample xi, and ŷi = σ(F(xi)). You do not

have to implement these functions yourself. Instead, you can use built-in functions

such as torch.nn.CrossEntropyLoss(). Note that torch.nn.CrossEntropyLoss()

incorporates a softmax function. So you do not need to explicitly include an activation

function in the last layer of your network.

• The recommended batch size would be 256 for training, and 1000 for testing. The

optimizer could be SGD or Adam. The suggested learning rate is 0.01 for SGD and

0.001 for Adam.

• For the results, please (1) plot the training and testing loss curves w.r.t. the training epochs (at least 40 epochs); (2) plot the training and testing accuracy w.r.t. the training epochs (at least 40 epochs). You are required to do multiple runs (at least 3) with different random network initializations, and report their mean and std for both curves; (3) please report the hyperparameters you use, including the hidden dimension h, the batch size, the optimizer (SGD/Adam), and the learning rate.
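Under the setup above, a minimal PyTorch sketch of the model and one training step might look like this. The class name `MLP` and the random stand-in batch are illustrative assumptions (a real run would iterate over MNIST batches for 40+ epochs). Note that the last layer emits raw logits, since torch.nn.CrossEntropyLoss() applies the softmax internally.

```python
import torch
import torch.nn as nn

class MLP(nn.Module):
    """F(x) = W2 * sigma(W1 x + b1) + b2 with h hidden units."""
    def __init__(self, d=784, h=32, num_classes=10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d, h),            # W1 x + b1
            nn.ReLU(),                  # sigma (ReLU variant)
            nn.Linear(h, num_classes),  # W2 (.) + b2; no softmax here
        )

    def forward(self, x):
        # flatten 28x28 images into length-784 vectors
        return self.net(x.view(x.size(0), -1))

model = MLP()
criterion = nn.CrossEntropyLoss()                          # includes softmax
optimizer = torch.optim.Adam(model.parameters(), lr=0.001) # suggested Adam lr

# One training step on a random batch (stand-in for a real MNIST batch of 256).
x = torch.rand(256, 1, 28, 28)
y = torch.randint(0, 10, (256,))
optimizer.zero_grad()
logits = model(x)
loss = criterion(logits, y)
loss.backward()
optimizer.step()
```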

Problem 4. Convolutional neural network (25 points)

In this sub-problem, you are required to implement a CNN model for classification. The implementation can vary, but below are some basic requirements that you should follow.

Requirement:

• The CNN structure should have at least 2 convolutional layers (each followed by ReLU and MaxPool) plus 2 fully connected layers; you are free to explore more sophisticated network structures (adding Dropout, batch normalization, etc.) to further boost the results. But please note that a more complex model design will result in longer training time, especially when you only have a CPU for training instead of a GPU.

• The recommended loss function is cross entropy loss. The optimizer could be SGD

or Adam, etc. The recommended batch size for training would be 256, and 1000 for

testing. Your model should be able to converge stably within 40 training epochs. The

suggested learning rate is 0.01 for SGD and 0.001 for Adam.

• For the results, please (1) plot the training and testing loss curves w.r.t. the training epochs (at least 40 epochs); (2) plot the training and testing accuracy w.r.t. the training epochs (at least 40 epochs). You are required to do multiple runs (at least 3) with different random network initializations, and report their mean and std for both curves; (3) please report all the hyperparameters used and draw your network architecture clearly in the report. A sample CNN architecture is given in Table 1, where [5 × 5, 10] denotes one convolutional layer with 10 filters of size 5 × 5.
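A PyTorch network meeting the minimum structure (two conv + ReLU + MaxPool blocks, then two fully connected layers) could be sketched as below. The class name `SimpleCNN`, the filter counts 10 and 20 (echoing the [5 × 5, 10] notation), and the 50-unit hidden FC layer are illustrative assumptions, not the required architecture. The shape comments trace a 28 × 28 MNIST input through the layers.

```python
import torch
import torch.nn as nn

class SimpleCNN(nn.Module):
    """Two conv layers (each + ReLU + MaxPool) followed by two FC layers."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 10, kernel_size=5),   # [5x5, 10]: 28x28 -> 24x24
            nn.ReLU(),
            nn.MaxPool2d(2),                   # 24x24 -> 12x12
            nn.Conv2d(10, 20, kernel_size=5),  # [5x5, 20]: 12x12 -> 8x8
            nn.ReLU(),
            nn.MaxPool2d(2),                   # 8x8 -> 4x4
        )
        self.classifier = nn.Sequential(
            nn.Linear(20 * 4 * 4, 50),         # first FC layer
            nn.ReLU(),
            nn.Linear(50, num_classes),        # second FC layer -> logits
        )

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(1))   # flatten all but batch dim

model = SimpleCNN()
out = model(torch.rand(8, 1, 28, 28))          # a random stand-in batch
```

Training would reuse the same CrossEntropyLoss/optimizer loop as in Problem 3, with the recommended batch sizes and learning rates.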
