November 25, 2023

Heap Allocate Assignment

This is a Python machine learning programming assignment from a US course.

Instructions

This homework includes both theory and implementation problems. Please note:

Your code must work with Python 3.5+ (you may install the Anaconda distribution of Python).
You need to submit a report in PDF that includes all written deliverables, along with all implementation code so that one can regenerate your results.

For the programming part of the PCA problem, you can import any module you like from scikit-learn or other external libraries to minimize the amount of implementation required. Submit your code in a Jupyter notebook named PCA.ipynb.

The submission for this homework should consist of a single Jupyter notebook file, submitted via Canvas, containing all of the code needed to reproduce your results, and a single PDF file, submitted via Gradescope, containing the solutions, figures, and any text explaining your results (please include all the figures and results of Problem 2 in the PDF file for grading purposes).

PCA

Problem 1. (10 points) As we discussed in the lecture, the PCA algorithm is sensitive to scaling and pre-processing of data. In this problem, we explore a few data pre-processing operations on data matrices.

a) Explain what each of the following data processing operations means, why it is important, and how you would apply it to a data matrix X ∈ ℝ^{n×d} with n samples, each with d features (if it helps, use mathematical equations to show the calculations).

(a) Centering or mean removal

(b) Scaling of a feature to a range [a, b]

(c) Standardization

(d) Normalization (sample level)

b) Apply the above operations separately to the following data matrix with n = 5 samples and d = 2 features and show the processed data matrix. For scaling pick a = 0 and b = 1.
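The data matrix for part b) is not reproduced in this copy, so the NumPy sketch below uses a made-up 5 × 2 matrix purely for illustration. The function names are hypothetical. The standardization helper assumes the usual "zero mean, unit variance per feature" definition with the population standard deviation.

```python
import numpy as np

# Hypothetical 5 x 2 matrix, made up for illustration (NOT the assignment's matrix).
X = np.array([
    [1.0, 10.0],
    [2.0, 20.0],
    [3.0, 30.0],
    [4.0, 40.0],
    [5.0, 50.0],
])


def center(X):
    """Subtract each feature's (column's) mean."""
    return X - X.mean(axis=0)


def scale_to_range(X, a=0.0, b=1.0):
    """Min-max scale each feature to [a, b]; assumes no feature is constant."""
    x_min = X.min(axis=0)
    x_max = X.max(axis=0)
    return a + (X - x_min) * (b - a) / (x_max - x_min)


def standardize(X):
    """Center each feature and divide by its population standard deviation."""
    return (X - X.mean(axis=0)) / X.std(axis=0)


def normalize_samples(X):
    """Rescale each sample (row) to unit Euclidean norm; assumes no all-zero row."""
    return X / np.linalg.norm(X, axis=1, keepdims=True)


X_centered = center(X)
X_scaled = scale_to_range(X, a=0.0, b=1.0)
X_standardized = standardize(X)
X_normalized = normalize_samples(X)
```

The first three operations act on columns (features), while sample-level normalization acts on rows, which is why it uses `axis=1`.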

Problem 2. (25 points) In this assignment, you will explore PCA as a technique for discerning whether low-dimensional structure exists in a set of data and for finding good representations of the data in that subspace. To this end, you will perform PCA on the Iris dataset, which can be loaded in scikit-learn using the following commands:

from sklearn.datasets import load_iris
iris = load_iris()
X = iris.data
y = iris.target
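Building on the loading commands above, one way to inspect how much variance the first k components capture is sketched below, for both the raw features and a standardized copy. The helper name `cumulative_explained_variance` is hypothetical. `StandardScaler` is assumed here as the standardization step.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = load_iris().data  # 150 samples, 4 features


def cumulative_explained_variance(X):
    """Fraction of total variance captured by the first k components, k = 1..d."""
    pca = PCA(n_components=X.shape[1]).fit(X)
    return np.cumsum(pca.explained_variance_ratio_)


raw_cumulative = cumulative_explained_variance(X)
std_cumulative = cumulative_explained_variance(StandardScaler().fit_transform(X))

for k in range(1, X.shape[1] + 1):
    print(f"k={k}: raw={raw_cumulative[k - 1]:.4f}, "
          f"standardized={std_cumulative[k - 1]:.4f}")
```

Fitting once with all four components is enough here, since the cumulative sum of `explained_variance_ratio_` gives the explained fraction for every k.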

a) Carry out a principal component analysis on the entire raw dataset (4-dimensional instances) for k = 1, 2, 3, 4 components. How much of the variance in the data can be explained by the first principal component? How does the fraction of variance explained change as k varies?

b) Apply the standardization operation from Problem 1 to the raw features and repeat part (a) on the processed data. Explain any differences you observe compared to part (a) and justify them.

c) Project the raw four-dimensional data down to a two-dimensional subspace spanned by the top two principal components (PCs) from part (b) and produce a scatter plot of the data. Make sure you plot each of the three classes differently (using colors or different markers). Can you see the three Iris flower clusters?

d) Use either your k-means++ implementation from the previous homework or the one from scikit-learn to cluster the data from part (c) into three clusters. Explain your observations.
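Parts (c) and (d) can be sketched with scikit-learn as follows. This is a sketch under stated assumptions: the data are standardized before projection (one reading of "PCs from part (b)"), clustering uses scikit-learn's `KMeans` with k-means++ seeding rather than a previous-homework implementation, and the adjusted Rand index is added only as one possible way to compare clusters with the true species. The helper `plot_side_by_side` and its output file name are hypothetical.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler

iris = load_iris()
X, y = iris.data, iris.target

# Assumption: standardize first, then project onto the top two PCs.
X_std = StandardScaler().fit_transform(X)
X_2d = PCA(n_components=2).fit_transform(X_std)

# scikit-learn's KMeans uses k-means++ seeding when init="k-means++".
kmeans = KMeans(n_clusters=3, init="k-means++", n_init=10, random_state=0)
cluster_labels = kmeans.fit_predict(X_2d)

# Adjusted Rand index: 1.0 means the clusters match the species exactly.
ari = adjusted_rand_score(y, cluster_labels)
print(f"Adjusted Rand index vs. true species: {ari:.3f}")


def plot_side_by_side(X_2d, y, cluster_labels, path="iris_pca_kmeans.png"):
    """Scatter the projection twice: marked by species and by k-means cluster."""
    import matplotlib.pyplot as plt  # imported here so clustering runs without it

    fig, axes = plt.subplots(1, 2, figsize=(10, 4), sharex=True, sharey=True)
    panels = zip(axes, (y, cluster_labels), ("True species", "k-means clusters"))
    for ax, labels, title in panels:
        for value, marker in zip(range(3), ("o", "s", "^")):
            mask = labels == value
            ax.scatter(X_2d[mask, 0], X_2d[mask, 1], marker=marker, label=str(value))
        ax.set_xlabel("PC 1")
        ax.set_title(title)
        ax.legend()
    axes[0].set_ylabel("PC 2")
    fig.savefig(path, dpi=150)
    plt.close(fig)
```

Calling `plot_side_by_side(X_2d, y, cluster_labels)` writes both panels to one image. Cluster indices from k-means are arbitrary, so marker shapes in the two panels need not correspond.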

Matrix Factorization

Problem 3. (15 points) Recall the following objective, discussed in class, for extracting latent features from a partially observed rating matrix via matrix factorization (MF) for making recommendations:

where

n: number of users
m: number of items
R ∈ ℝ^{n×m}: input partially observed rating matrix
Ω ⊆ [n]×[m]: index set of observed entries in the rating matrix, where [n] denotes the sequence of numbers {1, 2, ..., n}
k: number of latent features
U ∈ ℝ^{n×k}: the (unknown) matrix of latent feature vectors for the n users (the i-th row u_i ∈ ℝ^k is the latent feature vector for the i-th user)
V ∈ ℝ^{m×k}: the (unknown) matrix of latent feature vectors for the m items (the j-th row v_j ∈ ℝ^k is the latent feature vector for the j-th movie)
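Equation (1) itself does not appear in this copy. A common form of the regularized MF objective that is consistent with the symbols listed above would be the following. The squared Frobenius-norm regularizers, the pairing of the weights α and β (referenced later in this problem) with U and V, and the absence of a 1/2 factor are all assumptions.

```latex
\min_{U \in \mathbb{R}^{n \times k},\; V \in \mathbb{R}^{m \times k}}
\;\sum_{(i,j) \in \Omega} \left( R_{ij} - u_i^{\top} v_j \right)^2
\;+\; \alpha \lVert U \rVert_F^2 \;+\; \beta \lVert V \rVert_F^2
\tag{1}
```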

Please do the following:

1. In solving Equation (1) with the iterative Alternating Minimization algorithm (fixing V^(t) and taking a gradient step for U^(t), and vice versa), discuss what happens if U^(0) and V^(0) are initialized to zero.

2. Discuss why, when there is no regularization in the basic MF formulated in Equation (1), i.e., α = β = 0, each user must have rated at least k movies and each movie must have been rated by at least k users.

3. Computing the closed-form solution in part (2) could be a computational burden for a large number of users or movies. A remedy is to use iterative optimization algorithms such as Stochastic Gradient Descent (SGD). Assume we run SGD for T iterations, where at each iteration t = 1, 2, ..., T we sample a rating (i_t, j_t) ∈ Ω uniformly at random and update the latent features for user i_t and movie j_t. Derive the update rules for u_{i_t}^{(t)} and v_{j_t}^{(t)} at the t-th iteration of SGD. Show the detailed steps and write the pseudocode clearly.
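A minimal sketch of the SGD loop follows, assuming the per-sample loss (R_ij − u_i·v_j)² + α‖u_i‖² + β‖v_j‖². Its gradient with respect to u_i is −2·e·v_j + 2α·u_i with e = R_ij − u_i·v_j, and symmetrically for v_j. The constant factor 2 is absorbed into the step size. The function name, the default hyperparameters, and the toy rating matrix are hypothetical and are not taken from the assignment.

```python
import numpy as np


def sgd_matrix_factorization(R, observed, k, alpha=0.1, beta=0.1,
                             lr=0.01, T=10000, seed=0):
    """SGD on the assumed per-sample loss; returns the factor matrices U and V."""
    rng = np.random.default_rng(seed)
    n, m = R.shape
    # Small random initialization; with U = V = 0 every update below is exactly zero.
    U = 0.1 * rng.standard_normal((n, k))
    V = 0.1 * rng.standard_normal((m, k))
    for _ in range(T):
        i, j = observed[rng.integers(len(observed))]  # uniform sample from Omega
        u_old = U[i].copy()                 # keep u_i^(t-1) for the v_j update
        err = R[i, j] - u_old @ V[j]        # residual e on the sampled rating
        U[i] += lr * (err * V[j] - alpha * u_old)
        V[j] += lr * (err * u_old - beta * V[j])
    return U, V


# Hypothetical toy data: a rank-1 rating matrix with two entries treated as unobserved.
R = np.outer([1.0, 2.0, 3.0], [1.0, 0.5, 2.0, 1.5])
hidden = {(0, 3), (2, 1)}
observed = [(i, j) for i in range(3) for j in range(4) if (i, j) not in hidden]

U, V = sgd_matrix_factorization(R, observed, k=1, alpha=0.0, beta=0.0,
                                lr=0.02, T=20000)
rmse = np.sqrt(np.mean([(R[i, j] - U[i] @ V[j]) ** 2 for i, j in observed]))
print(f"RMSE on observed entries: {rmse:.4f}")
```

Only the rows for the sampled user and item change at each step, so the cost per iteration is O(k) regardless of n and m.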
