
Program #5

CS 5710: Data Mining

Updated: April 23, 2021

1 Bayes Classifiers

Imagine that you have collected a data set, X and y, and you believe that every category in y can be reasonably modeled by a multivariate normal distribution over its corresponding samples in X. If the covariance matrices are spherical, this results in the nearest centroid classifier. If the covariance matrices are the same for each class, this problem is solved by Linear Discriminant Analysis. If the covariance matrices can differ between the classes, this results in Quadratic Discriminant Analysis. Each of these classifiers performs the following training routine:

1. For each class k:

   (a) Compute the mean, μk
   (b) Compute the covariance matrix, Σk

Then use Bayes' rule to classify a new sample:

    P(c = k | x(i)) = P(x(i) | c = k) P(c = k) / P(x(i))

where:

    P(x(i) | c = k) = N(x(i); μk, Σk)             (the PDF of a multivariate normal distribution)
    P(c = k) = (samples in class k) / n           (the proportion of samples in each class)
    P(x(i)) = ∑k P(x(i) | c = k) P(c = k)         (the marginal distribution for x(i))
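For reference, a minimal NumPy/SciPy sketch of this training routine might look like the following. Since each class keeps its own covariance matrix, this corresponds to the QDA case; fit_bayes and predict_bayes are illustrative names, not part of the assignment's required interface.

    import numpy as np
    from scipy.stats import multivariate_normal

    def fit_bayes(X, y):
        """Estimate a mean, covariance matrix, and prior for every class in y."""
        n = len(y)
        params = {}
        for k in np.unique(y):
            X_k = X[y == k]
            params[k] = {
                "mean": X_k.mean(axis=0),          # mu_k
                "cov": np.cov(X_k, rowvar=False),  # Sigma_k
                "prior": len(X_k) / n,             # P(c = k) = (samples in class k) / n
            }
        return params

    def predict_bayes(params, x):
        """Pick the class with the largest posterior; the evidence P(x) is a shared constant."""
        scores = {
            k: multivariate_normal.pdf(x, mean=p["mean"], cov=p["cov"]) * p["prior"]
            for k, p in params.items()
        }
        return max(scores, key=scores.get)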

2 Gaussian Mixture Models


But what if we lost y and didn't know the labels for the samples? This is the problem that Gaussian Mixture Models attempt to solve. It is unsupervised because we don't know y, but we expect that there could be a y that explains the data well. This means a GMM must determine the μk and Σk for each class, as well as the class priors. More formally, we have n samples, m features, and p classes:

1. The data matrix X contains n samples (feature vectors): X ∈ R^(n×m)

           [ x1(1)  x2(1)  …  xm(1) ]   [ — x(1)⊤ — ]
       X = [ x1(2)  x2(2)  …  xm(2) ] = [ — x(2)⊤ — ]
           [   ⋮      ⋮    ⋱    ⋮   ]   [     ⋮     ]
           [ x1(n)  x2(n)  …  xm(n) ]   [ — x(n)⊤ — ]

2. The weights vector (weights_) contains the expected proportion of samples belonging to each of the p classes:

       φ = [φ1, φ2, …, φp],   where φk = P(c = k)

3. The means matrix (means_) contains the mean vector for each of the p classes: μ ∈ R^(p×m)

           [ μ1,1  μ1,2  …  μ1,m ]   [ — μ1⊤ — ]
       μ = [ μ2,1  μ2,2  …  μ2,m ] = [ — μ2⊤ — ]
           [  ⋮     ⋮    ⋱    ⋮  ]   [    ⋮    ]
           [ μp,1  μp,2  …  μp,m ]   [ — μp⊤ — ]
4. The covariances tensor (covariances_) contains the covariance matrix for each of the p classes: Σ ∈ R^(p×m×m)

       Σ = [Σ1, Σ2, …, Σp]

            [ σk,1,1  σk,1,2  …  σk,1,m ]
       Σk = [ σk,2,1  σk,2,2  …  σk,2,m ]
            [   ⋮       ⋮     ⋱    ⋮   ]
            [ σk,m,1  σk,m,2  …  σk,m,m ]
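As a quick sanity check, these three containers have the same names and shapes as scikit-learn's GaussianMixture attributes (with covariance_type="full"). A small sketch of the expected shapes, using example sizes only:

    import numpy as np

    n, m, p = 150, 4, 3                       # example sizes: samples, features, classes

    weights = np.full(p, 1.0 / p)             # weights_:     shape (p,)
    means = np.zeros((p, m))                  # means_:       shape (p, m)
    covariances = np.stack([np.eye(m)] * p)   # covariances_: shape (p, m, m)

    assert weights.shape == (p,)
    assert means.shape == (p, m)
    assert covariances.shape == (p, m, m)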
Using these, we can use Bayes' rule to compute the posterior P(c = k | x(i)) from the following:

    P(x(i) | c = k) = N(x(i); μk, Σk)          Likelihood (the PDF of a multivariate normal distribution)
    P(c = k) = φk                              Prior (the component weight for class k)
    P(x(i)) = ∑k P(x(i) | c = k) P(c = k)      Evidence (the marginal distribution for x(i))
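A short sketch of how these three pieces could be combined into the posterior for a single sample, using scipy.stats.multivariate_normal for the Gaussian PDF; the helper name posterior is illustrative, not a required method.

    import numpy as np
    from scipy.stats import multivariate_normal

    def posterior(x_i, phi, mu, sigma):
        """Return P(c = k | x_i) for every class k via Bayes' rule."""
        p = len(phi)
        # Likelihood: N(x_i; mu_k, Sigma_k) for each class k
        likelihood = np.array(
            [multivariate_normal.pdf(x_i, mean=mu[k], cov=sigma[k]) for k in range(p)]
        )
        joint = likelihood * phi   # P(x_i, c = k) = likelihood * prior
        evidence = joint.sum()     # P(x_i): marginal over the classes
        return joint / evidence    # posterior responsibilities, sums to 1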

To do this, we start in a way similar to K-Means by initializing μ, Σ, and φ, for example:

    μ = [x(i1), …, x(ip)],   where the indices i1, …, ip are randomly selected samples from X
    Σ = [Im, …, Im]
    φ = [1/p, …, 1/p]
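A sketch of this initialization in NumPy, assuming p components and a data matrix X of shape (n, m); init_params is an illustrative helper, not a required method.

    import numpy as np

    def init_params(X, p, rng=None):
        """mu from p randomly chosen samples, identity covariances, uniform weights."""
        rng = np.random.default_rng() if rng is None else rng
        n, m = X.shape
        idx = rng.choice(n, size=p, replace=False)  # randomly selected sample indices i_1..i_p
        mu = X[idx].copy()                          # (p, m)
        sigma = np.stack([np.eye(m)] * p)           # (p, m, m), I_m for every class
        phi = np.full(p, 1.0 / p)                   # (p,), 1/p for every class
        return phi, mu, sigma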

The algorithm proceeds as follows:

1. Repeat until a stopping condition is met:
(a) Using X and the current parameters φ, μ, and Σ, compute and store the following (a sketch of one full iteration appears after this outline):

    P(x(i) | c = k) = N(x(i); μk, Σk)           (p_xi_given_ck() ∈ R^(n×p))
    P(x(i), c = k) = P(x(i) | c = k) P(c = k)   (p_xi_and_ck() ∈ R^(n×p))
    P(x(i)) = ∑k P(x(i), c = k)                 (p_xi() ∈ R^n)
    P(c = k | x(i)) = P(x(i), c = k) / P(x(i))  (p_ck_given_xi() ∈ R^(n×p))

(b) For convergence purposes, compute and store the log probability of the data:

    log P(X) = ∑i log P(x(i))                   (log_p_x() ∈ R)

(c) Using the sample responsibilities, P(c = k | x(i)), compute the new weights_:

    φk = P(c = k) = (1/n) ∑i P(c = k | x(i))
    φ = [φ1, φ2, …, φp]                                       (phi() ∈ R^p)

(d) Compute the responsibility-weighted mean vectors (means_):

    μk = ∑i P(c = k | x(i)) x(i) / ∑i P(c = k | x(i))         (mu_k() ∈ R^m)
    μ = [μ1, μ2, …, μp]                                       (mu() ∈ R^(p×m))

(e) Compute the responsibility-weighted covariance matrices (covariances_):

    Σk = ∑i P(c = k | x(i)) (x(i) − μk)(x(i) − μk)⊤ / ∑i P(c = k | x(i))    (sigma_k() ∈ R^(m×m))
    Σ = [Σ1, Σ2, …, Σp]                                                     (sigma() ∈ R^(p×m×m))

(f) Stop if:

    i. We've updated the parameters max_iter times, or
    ii. The absolute difference in log probability from the last update to this one is less than n × tol.
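Putting steps (a)–(f) together, a minimal NumPy/SciPy sketch of the loop might look like the following. The helpers em_step and fit_gmm are illustrative, not the required methods (p_xi_given_ck(), mu_k(), sigma_k(), and so on), and the covariance update here reuses the freshly updated means.

    import numpy as np
    from scipy.stats import multivariate_normal

    def em_step(X, phi, mu, sigma):
        """One EM iteration: E-step responsibilities, then M-step parameter updates."""
        n, m = X.shape
        p = len(phi)

        # (a) E-step: likelihoods, joints, evidence, and responsibilities
        p_xi_given_ck = np.column_stack(
            [multivariate_normal.pdf(X, mean=mu[k], cov=sigma[k]) for k in range(p)]
        )                                           # (n, p)
        p_xi_and_ck = p_xi_given_ck * phi           # (n, p)
        p_xi = p_xi_and_ck.sum(axis=1)              # (n,)
        resp = p_xi_and_ck / p_xi[:, None]          # (n, p) = P(c = k | x_i)

        # (b) Log probability of the data, used for the convergence check
        log_p_x = np.log(p_xi).sum()

        # (c) New weights: average responsibility for each class
        phi = resp.sum(axis=0) / n

        # (d) New means: responsibility-weighted averages of the samples
        mu = (resp.T @ X) / resp.sum(axis=0)[:, None]

        # (e) New covariances: responsibility-weighted outer products of centered samples
        sigma = np.empty((p, m, m))
        for k in range(p):
            diff = X - mu[k]
            sigma[k] = (resp[:, k, None] * diff).T @ diff / resp[:, k].sum()

        return phi, mu, sigma, log_p_x

    def fit_gmm(X, p, max_iter=100, tol=1e-3, rng=None):
        """(f) Repeat until max_iter updates or |change in log P(X)| < n * tol."""
        rng = np.random.default_rng() if rng is None else rng
        n, m = X.shape
        phi = np.full(p, 1.0 / p)                        # uniform weights
        mu = X[rng.choice(n, size=p, replace=False)]     # p randomly chosen samples
        sigma = np.stack([np.eye(m)] * p)                # identity covariances
        prev = -np.inf
        for _ in range(max_iter):
            phi, mu, sigma, log_p_x = em_step(X, phi, mu, sigma)
            if abs(log_p_x - prev) < n * tol:
                break
            prev = log_p_x
        return phi, mu, sigma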