# CSC311 27 Midterm 2 A

1. (6 marks) To get full marks, provide a brief explanation in each case:

(a) We apply PCA to reduce the number of features of a linearly separable, binary classification
dataset. True or False: The resulting dataset is necessarily linearly separable.
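To build intuition for this question, the following sketch (a hypothetical construction, not part of the exam) builds a 2-D dataset where the separating feature has small variance and a nuisance feature has large variance, then projects onto the first principal component:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
# Feature 1 separates the classes (small gap); feature 2 is high-variance noise.
y = np.repeat([0, 1], n)
x1 = np.where(y == 0, -1.0, 1.0) + 0.1 * rng.standard_normal(2 * n)
x2 = 10.0 * rng.standard_normal(2 * n)
X = np.column_stack([x1, x2])

# PCA to 1 component via SVD: the top principal direction aligns
# with the high-variance noisy feature, not the separating one.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt[0]          # 1-D projection onto the first principal component

# Do the projected class ranges overlap (i.e., is separability lost)?
overlap = (Z[y == 0].max() > Z[y == 1].min()) and (Z[y == 1].max() > Z[y == 0].min())
print(overlap)
```

With this construction the projected classes almost surely overlap, since the retained direction is dominated by class-independent noise.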

(b) True or False: It is possible for a nonlinear autoencoder to achieve zero loss on a finite training
set (i.e., perfectly reconstruct all x in the training set) even if latent dimension K < D.
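As a toy illustration of what "nonlinear" buys you (a hypothetical lookup-style autoencoder, not a construction from the course), the encoder below maps each of a finite set of training points in D = 3 dimensions to a distinct scalar code (K = 1), and the decoder inverts it by nearest-code lookup:

```python
import numpy as np

# Four training points in D = 3 dimensions.
X = np.array([[0., 1., 2.], [3., 4., 5.], [6., 7., 8.], [1., 0., 1.]])
w = np.array([1.0, 10.0, 100.0])
codes = X @ w                 # distinct scalar code per point (injective here)

def encode(x):
    # Linear map to a K = 1 latent code.
    return x @ w

def decode(z):
    # Nonlinear decoder: nearest-code table lookup of the stored points.
    return X[np.argmin(np.abs(codes - z))]

recon = np.stack([decode(encode(x)) for x in X])
print(np.abs(recon - X).max())  # 0.0 — zero training loss despite K < D
```

The lookup decoder is of course degenerate, but it shows why finiteness of the training set matters: a sufficiently expressive nonlinear decoder only needs to invert the encoder on N points, not on all of R^D.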

(c) Suppose that we are given a training set {x^(1), x^(2)} of exactly 2 points, and we apply Expectation-Maximization (EM) to fit a Gaussian mixture model with K = 2 components. True or
False: the two Gaussian centers of our model will always converge to the two training points, i.e.,
at convergence we will have μ̂_1 = x^(1) and μ̂_2 = x^(2) or (equivalently) μ̂_2 = x^(1) and μ̂_1 = x^(2).
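One way to probe this question empirically (a minimal 1-D EM sketch, with a deliberately symmetric initialization; the initialization choice is mine, not the exam's):

```python
import numpy as np

# 1-D EM for a 2-component GMM on exactly two points.
# Both means start at the same value, so by symmetry the responsibilities
# stay at 0.5/0.5 and the means remain stuck at the sample mean.
x = np.array([0.0, 4.0])
mu = np.array([2.0, 2.0])      # identical initialization
var = np.array([1.0, 1.0])
pi = np.array([0.5, 0.5])

for _ in range(50):
    # E-step: responsibilities r[n, k] under the current parameters.
    dens = pi * np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
    r = dens / dens.sum(axis=1, keepdims=True)
    # M-step: update mixing weights, means, and variances.
    Nk = r.sum(axis=0)
    mu = (r * x[:, None]).sum(axis=0) / Nk
    var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / Nk
    pi = Nk / len(x)

print(mu)  # [2. 2.] — both centers sit at the sample mean, not the data points
```

EM only guarantees convergence to a stationary point of the likelihood; symmetric initializations like this one converge to a fixed point where the two centers coincide.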

2. (3 marks) Consider two algorithms for binary classification: (A) logistic regression and (B) a multi-layer neural
network with non-linear activations. If you had to boost one algorithm and bag the other, which would
you choose? Explain why briefly.

3. (6 marks) Consider a data-generating distribution with the pdf:

p(x | θ) = θ exp(−θx),

parameterized by θ > 0. Suppose we observe a dataset D = {x^(1), x^(2), …, x^(N)} with N observations

x^(i) > 0 generated by this distribution.

(a) (4 marks) Using the prior p(θ) = exp(−θ), derive the Maximum A-Posteriori (MAP) estimator of θ given
D.
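One standard route (a worked sketch, not an official solution): write the log-posterior up to a constant, differentiate, and set the derivative to zero.

```latex
\begin{aligned}
\log p(\theta \mid \mathcal{D})
  &= \log p(\theta) + \sum_{i=1}^{N} \log p\bigl(x^{(i)} \mid \theta\bigr) + \text{const} \\
  &= -\theta + N\log\theta - \theta\sum_{i=1}^{N} x^{(i)} + \text{const}, \\
\frac{\partial}{\partial\theta}\log p(\theta \mid \mathcal{D})
  &= \frac{N}{\theta} - \Bigl(1 + \sum_{i=1}^{N} x^{(i)}\Bigr) = 0
  \;\Longrightarrow\;
  \hat{\theta}_{\mathrm{MAP}} = \frac{N}{1 + \sum_{i=1}^{N} x^{(i)}}.
\end{aligned}
```

The second derivative, −N/θ², is negative for all θ > 0, confirming this stationary point is a maximum.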

(b) (2 marks) The conditional distribution p(x | θ) in this question is known as an exponential distribution.
True or False: the posterior p(θ | x) is an exponential distribution. Explain why or why not.