# ESE 402/542 Homework 5


Problem 1.
(a) In this problem we will analyze logistic regression as learned in class.
The sigmoid function can be written as $S(x) = \frac{1}{1 + e^{-x}}$.
For a given variable X, assume P(Y = +1|X) is modeled as P(Y = +1|X) = S(β0 + β1X).
Plot a 3d figure showing the relation between output and variable β0 and β1 when X
= 1. Take values between [-2,2] for both β0 and β1 with a step size of 0.1 to plot the
3d plot.
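A minimal sketch of how the grid for part (a) could be evaluated, assuming NumPy is available. The function names `sigmoid`, `sigmoid_surface` and `plot_surface` are illustrative and not part of the assignment; matplotlib is imported only inside the plotting helper, so the numerical part runs without it.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid S(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_surface(x=1.0, lo=-2.0, hi=2.0, step=0.1):
    """Evaluate S(beta0 + beta1 * x) on a grid of (beta0, beta1) values."""
    # Derive the point count from the step so both endpoints are included.
    n = int(round((hi - lo) / step)) + 1
    axis = np.linspace(lo, hi, n)
    b0, b1 = np.meshgrid(axis, axis)
    return b0, b1, sigmoid(b0 + b1 * x)

def plot_surface(b0, b1, values, zlabel):
    """Draw a 3D surface; matplotlib is imported lazily (assumed installed)."""
    import matplotlib.pyplot as plt
    fig = plt.figure()
    ax = fig.add_subplot(projection="3d")
    ax.plot_surface(b0, b1, values, cmap="viridis")
    ax.set_xlabel("beta0")
    ax.set_ylabel("beta1")
    ax.set_zlabel(zlabel)
    plt.show()

b0, b1, p = sigmoid_surface()
```

Calling `plot_surface(b0, b1, p, "P(Y = 1 | X = 1)")` would then render the figure. Because X = 1, the output depends on β0 and β1 only through their sum β0 + β1.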
(b) In class, we did binary classification with labels Y ∈ {0, 1}. In this problem, we
will use the labels Y ∈ {−1, 1}, as this makes it easier to derive the likelihood
P(Y | X).
• Show that if Y takes values in {−1, 1}, the probability of Y given X can be written
as (not programming)
$$P(Y = y \mid X = x) = \frac{1}{1 + e^{-y(\beta_0 + \beta_1 x)}}$$
• We have learned that the coefficients β0 and β1 can be found using maximum likelihood estimation (MLE). Show that the log-likelihood function for m data points can be written
as (not programming)
$$\ln L(\beta_0, \beta_1) = -\sum_{i=1}^{m} \ln\left(1 + e^{-y_i(\beta_0 + \beta_1 x_i)}\right)$$
• Plot a 3D figure showing the relation between the log-likelihood function and the variables
β0 and β1 when X = 1, Y = −1 and when X = 1, Y = 1. Take values in [−2, 2] for
both β0 and β1 with a step size of 0.1.
• Based on the graph, is it possible to maximize this function?
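The log-likelihood surface from part (b) can be evaluated on the same grid. The sketch below assumes NumPy and uses the illustrative name `log_likelihood`. Since the wording leaves open whether the two (X, Y) pairs are meant separately or jointly, it computes all three variants.

```python
import numpy as np

def log_likelihood(b0, b1, xs, ys):
    """ln L(beta0, beta1) = -sum_i ln(1 + exp(-y_i (beta0 + beta1 x_i)))."""
    total = np.zeros_like(np.asarray(b0, dtype=float))
    for x, y in zip(xs, ys):
        # logaddexp(0, t) evaluates ln(1 + exp(t)) without overflow.
        total = total - np.logaddexp(0.0, -y * (b0 + b1 * x))
    return total

# Grid over [-2, 2] with step 0.1 (41 points per axis).
axis = np.linspace(-2.0, 2.0, 41)
b0, b1 = np.meshgrid(axis, axis)

ll_neg = log_likelihood(b0, b1, xs=[1.0], ys=[-1.0])               # X = 1, Y = -1
ll_pos = log_likelihood(b0, b1, xs=[1.0], ys=[1.0])                # X = 1, Y = +1
ll_both = log_likelihood(b0, b1, xs=[1.0, 1.0], ys=[-1.0, 1.0])    # both points together
```

Any 3D surface routine, for example matplotlib's `Axes3D.plot_surface`, can render these arrays. On this grid each single-point surface is monotone in β0 + β1, so its largest value sits at a corner of the plotted square.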
Problem 2.
1. While we can write down the likelihood, there is no closed-form expression for the
coefficients β0, β1 that maximize the above log-likelihood. Hence, we will use an iterative
algorithm to solve for the coefficients.
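We can see that maximizing the log-likelihood is equivalent to minimizing its negative:
$$\arg\max_{\beta_0, \beta_1} \left(-\sum_{i=1}^{m} \ln\left(1 + e^{-y_i(\beta_0 + \beta_1 x_i)}\right)\right) = \arg\min_{\beta_0, \beta_1} \sum_{i=1}^{m} \ln\left(1 + e^{-y_i(\beta_0 + \beta_1 x_i)}\right)$$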
We define our loss function as
$$L = \frac{1}{m} \sum_{i=1}^{m} \ln\left(1 + e^{-y_i(\beta_0 + \beta_1 x_i)}\right)$$
Our objective is to iteratively decrease this loss as we update the coefficients. Here
$x_i \in \mathbb{R}$.
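As a small illustration of the one-dimensional loss, the sketch below evaluates L for given coefficients. It assumes NumPy; the function name `logistic_loss` and the four demo points are made up for illustration and are not part of the assignment data.

```python
import numpy as np

def logistic_loss(beta0, beta1, x, y):
    """Average logistic loss (1/m) * sum_i ln(1 + exp(-y_i (beta0 + beta1 x_i)))."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    margins = y * (beta0 + beta1 * x)
    # logaddexp(0, -t) equals ln(1 + exp(-t)) and stays finite for large |t|.
    return float(np.mean(np.logaddexp(0.0, -margins)))

# Hypothetical toy data with labels in {-1, +1}.
x_demo = np.array([-2.0, -1.0, 1.0, 2.0])
y_demo = np.array([-1.0, -1.0, 1.0, 1.0])

loss_at_zero = logistic_loss(0.0, 0.0, x_demo, y_demo)  # all margins are 0, so L = ln 2
loss_at_one = logistic_loss(0.0, 1.0, x_demo, y_demo)   # every point has a positive margin
```

With all coefficients at zero the loss is ln 2 regardless of the data, which makes a convenient sanity check before starting any iterative procedure.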
In this problem we will work with real image data, where the goal is to classify whether an image shows a 0 or a 1 using logistic regression.
The input is $X \in \mathbb{R}^{m \times d}$, where a single data point $x_i \in \mathbb{R}^d$ and d = 784. The label vector is
$Y \in \mathbb{R}^m$, where each label $y_i \in \{0, 1\}$.
• Load the data into the memory and visualize one input as an image for each of
label 0 and label 1. (The data should be reshaped back to [28 x 28] to be able
to visualize it.)
• The data values lie between 0 and 255. Normalize the data to [0, 1].
• Set yi = 1 for images labeled 0 and yi = -1 for images labeled 1. Split the data
randomly into train and test with a ratio of 80:20.
Why is random splitting better than sequential splitting in our case?
• Initialize the coefficients using a univariate normal (Gaussian) distribution with
mean 0 and variance 1. (Remember that the coefficients form a vector [β0, β1, …, βd],
where d is the dimension of the input.)
• Compute the loss using the loss L defined above.
(The loss can be written as
$$L = \frac{1}{m} \sum_{i=1}^{m} \ln\left(1 + e^{-y_i\left(\beta_0 + \sum_{j=0}^{d-1} \beta_{j+1}\, x_{i,j}\right)}\right)$$
where (i, j) denotes the j-th dimension of data point $x_i$, for i ∈ {1, …, m} and
j ∈ {0, …, d−1}.)
• To minimize the loss function, a widely used algorithm is to move in the direction
opposite to the gradient of the loss function.
(It's helpful to write the coefficients [β1, …, βd] as a vector β and β0 as a scalar.
Now β is of size [d], i.e. $\beta \in \mathbb{R}^d$, and β0 is a scalar, $\beta_0 \in \mathbb{R}$.)
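The steps listed above can be sketched end to end as follows. Several things here are assumptions rather than part of the assignment: the source does not name the data file, so random placeholder arrays stand in for the real images; the seed, learning rate (0.01) and iteration count (300) are arbitrary choices; and the gradient uses the identity S(−t) = ½(1 − tanh(t/2)) purely for numerical stability.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility

# Placeholder data (NOT the real images): m rows of d = 784 values in [0, 255].
m, d = 200, 784
labels = rng.integers(0, 2, size=m)                       # image labels in {0, 1}
X_raw = rng.integers(0, 256, size=(m, d)).astype(float)
X_raw[labels == 1, : d // 2] *= 0.25                      # make the two fake classes differ

X = X_raw / 255.0                                         # normalise to [0, 1]
y = np.where(labels == 0, 1.0, -1.0)                      # label 0 -> +1, label 1 -> -1

# Random 80:20 split via a shuffled index.
perm = rng.permutation(m)
n_train = int(0.8 * m)
X_train, y_train = X[perm[:n_train]], y[perm[:n_train]]
X_test, y_test = X[perm[n_train:]], y[perm[n_train:]]

def loss(beta0, beta, X, y):
    """L = (1/m) sum_i ln(1 + exp(-y_i (beta0 + x_i . beta)))."""
    return float(np.mean(np.logaddexp(0.0, -y * (beta0 + X @ beta))))

def gradients(beta0, beta, X, y):
    """Partial derivatives of L with respect to beta0 (scalar) and beta (length d)."""
    margins = y * (beta0 + X @ beta)
    # dL/dz_i = -y_i * S(-margin_i) / m, with S(-t) = 0.5 * (1 - tanh(t / 2)).
    weights = -y * 0.5 * (1.0 - np.tanh(0.5 * margins)) / len(y)
    return float(weights.sum()), X.T @ weights

# Coefficients drawn from a standard normal distribution.
beta0 = float(rng.normal(0.0, 1.0))
beta = rng.normal(0.0, 1.0, size=d)

learning_rate = 0.01  # assumed value, not specified in the text
history = [loss(beta0, beta, X_train, y_train)]
for _ in range(300):
    g0, g = gradients(beta0, beta, X_train, y_train)
    beta0 -= learning_rate * g0   # step against the gradient
    beta -= learning_rate * g
    history.append(loss(beta0, beta, X_train, y_train))

test_accuracy = float(np.mean(np.sign(beta0 + X_test @ beta) == y_test))
```

To run this on the real data, replace `X_raw` and `labels` with the loaded arrays; a row can be viewed as an image by reshaping it with `X_raw[i].reshape(28, 28)`. The list `history` records the training loss after each step, which is one way to check that the loss actually decreases.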
Problem 3.
Recall that in classification we assume that each data point is an i.i.d. sample from a(n
unknown) distribution P(X = x, Y = y). In this question, we are going to design the data
distribution P and evaluate the performance of logistic regression on data generated using
P. Keep in mind that we would like to make P as simple as possible. In the following, we
assume x ∈ R and y ∈ {0, 1}, i.e. the data is one-dimensional and the label is binary. Write
P(X = x, Y = y) = P(X = x)P(Y = y|X = x). We will generate X = x according to
the uniform distribution on the interval [0,1] (thus P(X = x) is just the pdf of the uniform
distribution).
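The factorization P(X = x, Y = y) = P(X = x)P(Y = y|X = x) suggests a two-stage sampler: draw x first, then draw y given x. The sketch below assumes NumPy. The conditional P(Y = 1|X = x) is not specified in the text above, so `hypothetical_conditional` is a placeholder invented for illustration only, and `sample_dataset` is likewise an illustrative name.

```python
import numpy as np

def sample_dataset(n, prob_y1_given_x, rng):
    """Draw n i.i.d. pairs (x, y): x ~ Uniform[0, 1], y ~ Bernoulli(prob_y1_given_x(x))."""
    x = rng.uniform(0.0, 1.0, size=n)
    p = prob_y1_given_x(x)
    # y = 1 with probability p, else 0.
    y = (rng.uniform(0.0, 1.0, size=n) < p).astype(int)
    return x, y

def hypothetical_conditional(x):
    """Placeholder for P(Y = 1 | X = x); NOT taken from the assignment."""
    return x

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
x, y = sample_dataset(10_000, hypothetical_conditional, rng)
```

Swapping in a different function for `hypothetical_conditional` changes the data distribution P without touching the sampling logic.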