1. [6 points] To get full marks, provide a brief explanation in each case:
(a) We apply PCA to reduce the number of features of a linearly separable, binary classification
dataset. True or False: The resulting dataset is necessarily linearly separable.
(b) True or False: It is possible for a nonlinear autoencoder to achieve zero loss on a finite training
set (i.e., perfectly reconstruct all x in the training set) even if latent dimension K < D.
(c) Suppose that we are given a training set $\{x^{(1)}, x^{(2)}\}$ of exactly 2 points, and we apply Expectation
Maximization (EM) to fit a learned Gaussian mixture model with K = 2 components. True or
False: the two Gaussian centers of our model will always converge to the two training points, i.e.,
at convergence we will have $\hat{\mu}_1 = x^{(1)}$ and $\hat{\mu}_2 = x^{(2)}$ or (equivalently) $\hat{\mu}_2 = x^{(1)}$ and $\hat{\mu}_1 = x^{(2)}$.
2. [3 points] Consider two algorithms for binary classification: (A) logistic regression and (B) a multi-layer neural
network with non-linear activations. If you had to boost one algorithm and bag the other, which would
you choose for each? Briefly explain why.
3. [6 points] Consider a data-generating distribution with the pdf:
$p(x \mid \theta) = \theta \exp(-\theta x),$
parameterized by $\theta > 0$. Suppose we observe a dataset $\mathcal{D} = \{x^{(1)}, x^{(2)}, \ldots, x^{(N)}\}$ with N observations
$x^{(i)} > 0$ generated by this distribution.
(a) [4 points] Using a prior $p(\theta) = \exp(-\theta)$, derive the maximum a posteriori (MAP) estimator of $\theta$ given
the observed dataset.
(b) [2 points] The conditional distribution $p(x \mid \theta)$ in this question is known as an exponential distribution.
True or False: the posterior $p(\theta \mid x)$ is an exponential distribution. Explain why or why not.
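
As a minimal sketch of the setup in question 3 (not part of the original question), the density, the log-likelihood, and the log-prior could be written in Python as below. The function names are hypothetical, and the log-likelihood assumes the observations are independent, which the question does not state explicitly. The sketch only evaluates these quantities; it does not carry out the derivation asked for in part (a).

```python
import math


def exp_pdf(x, theta):
    """Density p(x | theta) = theta * exp(-theta * x), defined for x > 0."""
    if theta <= 0:
        raise ValueError("theta must be positive")
    return theta * math.exp(-theta * x) if x > 0 else 0.0


def log_likelihood(data, theta):
    """Sum of log p(x_i | theta), ASSUMING independent observations."""
    if theta <= 0:
        raise ValueError("theta must be positive")
    return sum(math.log(theta) - theta * x for x in data)


def log_prior(theta):
    """Log of the prior p(theta) = exp(-theta) from part (a)."""
    return -theta


def log_posterior_unnormalized(data, theta):
    """Log-posterior up to an additive constant that does not depend on theta."""
    return log_likelihood(data, theta) + log_prior(theta)
```

For example, `log_posterior_unnormalized([1.0, 2.0], 1.0)` evaluates the unnormalized log-posterior at $\theta = 1$ for a toy dataset of two points; comparing its value across different $\theta$ is one way to sanity-check a hand-derived estimator.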