2023年11月25日

Python数据挖掘代写 | CS 5710: Data Mining Program #5

本次美国CS代写主要是使用Python进行贝叶斯分类器等数据挖掘

Program #5

CS 5710: Data Mining

Updated: April 23, 2021

1 Bayes Classifiers

Imagine that you have collected a data set, X and y, and you believe that every category in y can be reasonably modeled by a multivariate distribution for its corresponding samples in X. If the covariance matrices are spherical, this results in the nearest centroid classifier. If the covariance matrices are the same for each class, this problem is solved by Linear Discriminant Analysis. If the covariance matrices can be different between the classes, this results in the Quadratic Discriminant Analysis. Each of these classifiers performs the following training routine:

1. For each class k:
(a) Compute the mean, μk

(b) Compute the covariance matrix, Σk Then use Bayes rule to classify a new sample:

P(c = k|x(i)) = P(x(i)|c = k)P(c = k) P(x(i))

where:

P(x(i)|c = k) = N(x;μk,Σk)
P (c = k) = samples in class k

n
P(x(i)) = ?P(x(i)|c = k)P(c = k)

2 Gaussian Mixture Models

(The PDF of a multivariate normal distribution) (proportion of samples in each class) (the marginal distribution for x(i))

BUT, what if we lost y and didn’t know the labels for the samples. This is the problem that Gaussian Mixture Models attempt to solve. It’s unsupervised because we don’t know y but we

expect that there could be a y that explains the data well. This means, a GMM must determine the μk and Σk for each class, as well as the class priors. More formally we have n samples, m features, and p classes:

1. The data matrix X contains n samples (feature vectors): X ∈ Rn×m

 (1) (1) (1) x1 x2 …xm

x(2) x(2) . . . x(2)  X=12m

. . .. .  . . . . 

(n) (n) (n) x1 x2 …xm

—x(1)⊤— —x(2)⊤—

= .

μ1,1

μ2,1 μ= .

μ1,2 … μ1,m μ2,2 … μ2,m

 .  —x(n)⊤—

The weights vector (weights_) contains the expected proportion of samples belonging to each of the p classes:
φ = [φ1,φ2,…,φp] φk =P(c=k)
The means matrix (means_) contains the mean vector for each of the p classes: μ ∈ Rp×m

. .. . ….

μp,1 μp,2 —μ⊤1 — —μ⊤2 —

=  .  .

. . . μp,m

— μ ⊤p —
4. Covariances tensor (covariances_) contains the covariance matrix for each of the p classes:

Σ ∈ Rp×m×m
Σ = [Σ1,Σ2,…,Σp]

 σk,1,1 σk,1,2 . . . σk,1,m 

 σk,2,1 σk,2,2 . . . σk,2,m  Σk=. . .. . ….

σk,m,1 σk,m,2 . . . σk,m,m
Using these we can use Bayes rule to compute the posterior P(c = k|x(i)) from the following:

P(x(i)|c = k) = N(x;μk,Σk) P (c = k) = φk

P(x(i)) = ?P(x(i)|c = k)P(c = k) k

Likelihood (The PDF of a multivariate normal distribution) Prior (The component weight for k)

Evidence(the marginal distribution for x(i))

To do this, we start in a way similar to K-Means by initializing μ, Σ, and φ, for example:

μ = [xi1,…,xiK ], where ik are randomly selected from X Σ = [Im,…,Im]

?1 1? φ= K,…,K

The algorithm proceeds as follows:

1. Repeat until stopping condition:
(a) Using X and the weights φ, compute and store the following:

(p_xi_given_ck() ∈ Rn×p) (p_xi_and_ck() ∈ Rn×p) (p_xi() ∈ Rn)

(p_ck_given_xi() ∈ Rn×p) (b) For convergence purposes, compute and store the log probability of the data:

logP(X) = ?logP(x(i)) (log_p_x() ∈ R) i

P(x(i)|c = k) = N(x(i);μk,Σk) P(x(i),c = k) = P(x(i)|c = k)P(c = k)

P(x(i)) = ?P(x(i),c = k) k

P(c = k|x(i)) = P(x(i),c = k) P(x(i))

φ = [φ1,φ2,…,φp]
(d) Compute the responsibility weighted mean vectors (means_):

(phi() ∈ Rp)

? P(c = k|x(i))x(i) im

μk = ?iP(c=k|x(i)) (mu_k()∈R ) μ = ?μ1,μ2,…,μp? (mu() ∈ Rp×m)

(e) Compute the responsibility weighted covariance matrices (covariances_):

? P(c = k|x(i))(x(i) − μ )(x(i) − μ )⊤
ikk m×m

Σk = ?i P(c = k|x(i)) (sigma_k() ∈ R ) Σ = [Σ1,Σ2,…,Σp] (sigma() ∈ Rp×m×m)

(f) Stop if:
i. We’ve updated the parameters max_iter times

ii. The absolute difference in log probability from the last update to this one is less than than n×tol

程序代写代做C/C++/JAVA/安卓/PYTHON/留学生/PHP/APP开发/MATLAB

CS代写,留学生编程代写,CS作业代写,Java代写,程序代写，代码代写 | ITCS代写

本网站支持淘宝支付宝微信支付 paypal等等交易。如果不放心可以用淘宝交易！

E-mail:itcsdx@outlook.com 微信:itcsdx

如果您使用手机请先保存二维码，微信识别。如果用电脑，直接掏出手机果断扫描。

人工智能代写 | COMP3308/3608 Artificial Intelligence Tutorial exercises Java代写 | CMPT 456 Course Project 1

CONTACT

Assignment Example

Service Scope

Recent Case

2024年4月25日

ITCS代写

Python数据挖掘代写 | CS 5710: Data Mining Program #5

CONTACT

Assignment Example

Service Scope

Recent Case

IT代写｜ FIT4005 / FIT5125 Research Methods for IT Assignment 2

IT代写｜ FIT5145: Foundations of Data Science Assignments 1 & 3: Business and Data Case Study

数据库代写 | CSE2/4DBF-Assignment

WEB网站代写： 100% MOSS包过原创，CS大神7/24小时服务

编程代写 | PLT-4115 Programming Language And Translator

Tags

Python数据挖掘代写 | CS 5710: Data Mining Program #5

CONTACT

Assignment Example

Service Scope

Recent Case

IT代写 ｜ FIT4005 / FIT5125 Research Methods for IT Assignment 2

IT代写 ｜ FIT5145: Foundations of Data Science Assignments 1 & 3: Business and Data Case Study

数据库代写 | CSE2/4DBF-Assignment

WEB网站代写： 100% MOSS包过原创，CS大神7/24小时服务

编程代写 | PLT-4115 Programming Language And Translator

Tags

IT代写｜ FIT4005 / FIT5125 Research Methods for IT Assignment 2

IT代写｜ FIT5145: Foundations of Data Science Assignments 1 & 3: Business and Data Case Study