# Python代写MATH | CSC311 Fall 2020 Homework 4 – v2

CSC311 Fall 2020 Homework 4 – v2

1. [2pts] Marginalization and Log-Sum-Exp. In this question you will learn how to compute log marginal probabilities in a numerically stable way. Suppose that you have a generative model p(x, i) for labeled data (x, i) where i is a label that can be one of 0, 1, 2, . . . , k.
Recall that the marginal probability p(x) can be computed using the following formula:
p(x) = X
k
i=0
p(x, i) (0.1)
When x is a high-dimensional data point, it is typical for the marginal p(x) and the joint
p(x, i) to be extremely small numbers that cannot be represented in floating point. For this
reason, we usually report and compute with log probabilities.
If we want to compute log(p(x)) only given access to log(p(x, i)), then we can use what is
called a log-sum-exp:
log X
k
i=0
exp(ai)
!
(0.2)
where ai ∈ R are real numbers. In our example, if ai = log(p(x, i)), then expression (0.2)
would correspond to the log marginal probability log(p(x)) of x.
Unfortunately, computing log-sum-exp naively can lead to numerical instabilities. The numerical instabilities in log-sum-exp are caused by problems that arise when trying to compute
exponentials using floating point numbers. Two things can go wrong:
(a) Underflow. If a[i] is very small, then np.exp(a[i]) will evaluate to 0.
(b) Overflow. If a[i] is very large, then np.exp(a[i]) will evaluate to inf.
The cause of underflow and overflow is that floating point numbers cannot represent numbers
arbitrarily close to 0 nor arbitrarily large numbers.
(a) [1pt] We have provided code in q1.py with two implementations of log-sum-exp: a
naive, numerically unstable implementation and a numerically stable one. Modify the
elements of a so that logsumexp unstable returns -inf, and modify the elements of
b so that logsumexp unstable returns inf. Report the two vectors, a and b, in your
write-up.
(b) [1pt] Prove that our numerically stable implementation is correct by proving that
log X
k
i=0
exp(ai)
!
= log X
k
i=0
exp
ai −
k
max
j=0
{aj}
!
+
k
max
j=0
{aj} (0.3)
Briefly explain why the numerically stable version is robust to underflow and overflow.
2. [3pts] Gaussian Discriminant Analysis. For this question you will build classifiers to
label images of handwritten digits. Each image is 8 by 8 pixels and is represented as a vector
of dimension 64 by listing all the pixel values in raster scan order. The images are grayscale
and the pixel values are between 0 and 1. The labels y are 0, 1, 2, . . . , 9 corresponding to
which character was written in the image. There are 700 training cases and 400 test cases for
each digit; they can be found in a4digits.zip.
A skeleton (q2.py) is is provided for each question that you should use to structure your code.
Starter code to help you load the data is provided (data.py). Note: the get digits by label
function in data.py returns the subset of digits that belong to a given class.
Using maximum likelihood, fit a set of 10 class-conditional Gaussians with a separate, full
covariance matrix for each class. Remember that the conditional multivariate Gaussian probability density is given by,
p(x | y = k, µ, Σk) = (2π)
−d/2
|Σk|
−1/2
exp

1
2
(x − µk
)
T Σ
−1
k
(x − µk
)

(0.4)
You should take p(y = k) = 1
10
. You will compute parameters µkj and Σk for k ∈ (0…9), j ∈
(1…64). You should implement the covariance computation yourself (i.e. without the aid of
’np.cov’). Hint: To ensure numerical stability you may have to add a small multiple of the
identity to each covariance matrix. For this assignment you should add 0.01I to each matrix.
(a) [1pt] Using the parameters you fit on the training set and Bayes rule, compute the
average conditional log-likelihood, i.e. 1
N
PN
i=1 log(p(y
(i)
| x
(i)
, θ)) on both the train and
test set and report it. Hint: you will want to use the log-sum-exp we discussed in
(b) [1pt] Select the most likely posterior class for each training and test data point as your
prediction, and report your accuracy on the train and test set.
(c) [1pt] Compute the leading eigenvectors (largest eigenvalue) for each class covariance
matrix (can use np.linalg.eig) and plot them side by side as 8 by 8 images.