# Java代写 | F19 17601 Data-driven Software Engineering

## Instruction

This HW includes both theory and implementation problems. Please note,

• Your code must work with Python 3.5+ (you may install the Anaconda distribution of Python)
• You need to submit a report in PDF including all written deliverable and plots via Gradescope, and all implementation codes via Canvas so one could regenerate your results.
• For non-linear classifier problem, you need to submit a Jupyter notebook for each algorithm and all complementary python codes so one could reproduce your results. Submit your code in Problem4*.py and replace * witn the name of classifier.
• For clustering problem, you are NOT allowed to use the implementation from scikit-learn or import it and should submit your own implementation. Submit your code in Problem5.py. The submission for this homework should be a single PDF file with solutions of theory problems, figures,and any text explaining your results of programming problems via Gradescope and a single ‘zip‘ file containing all of the relevant code via Canvas.

## SVM&Kernel Methods

Problem 1 (10 points) In this problem we would like to compare the solutions of hard and soft SVMs on a linearly seperable dataset.Let n > 1 be a fixed number. Is it true that there exists C > 0 such that for every sample S of n training examples with binary label, which are linearly separable, the hard-SVM and the soft-SVM (with parameter C) solutions will return exactly the same weight vector.Justify your answer. (Hint: consider n = 2, d = 1 and S = {(x1, y1),(x2, y2)}. Let a > 0 and consider x1 = a, y1 = 1, x2 = a, y2 = 1. Derive the optimal solution for hard and soft SVM and compare the results.)

Problem 2 (10 points) In this problem you are asked to find the explicit mapping function Φ(·) for the Gaussian kernel k (x, z) = expby showing that it can be expressed as the inner product of an infinite-dimensional feature space by a mapping Φ.

a) Assume x, z R1 (only one feature per training example). Use the Taylor expansion of the kernel function κ (to be precise, the exp(·) function) and derive the explicit mapping Φ the kernel correspond to.

b) Answer the above question in general case where x, z Rd . Assume x2 = z2 = 1.

Problem 3 (20 points) In this problem we aim at utilizing the kerenl trick in Ridge regression and propose its kernalized version. Recall the Ridge regression training objective function:

for λ > 0.

a) Show that for w to be a minimizer of f(w), we must have XXw + λIw = Xy,where X Rn×d is the data matrix with n samples each with d features, and I is iden-tity matrix (please check lectures for more details). Show that the minimizer of f(w) is w =（XX + λI1Xy.Justify that the matrix XX + λI is invertible, for λ > 0.

(Hint: use SVD decomposition of data matrix X = UΣV and show all the eigenvalues of XX + λI are larger than zero).

b) Rewrite XXw + λIw = Xy as w =1/λ（Xy XXw）.Based on this, show that we can write w = Xα for some α Rn , and give an expression for α.

c) Based on the fact that w = Xα, explain why we say w is ”in the span of the data.”

d) Show that α =（λI + XX1y. Note that XXis the n × n Gram (kernel) matrix for the standard vector dot product. (Hint: Replace w by Xα in the expression for α, and then solve for α.)

e) Give a kernelized expression for the Xw, the predicted values on the training points.(Hint: Replace w by Xα and α by its expression in terms of the kernel matrix XX.)

f) Give an expression for the prediction wx for a test sample x, not in the training set,where wis the optimal solution. The expression should only involve x via inner products training data samples xi , i = 1, . . . , n.

g) Based on (f), propose a kernalized version of the Ridge regression.