
COMS 4771-2 Fall-B 2020 HW 2

Problem 1 (10 points)

In this problem, you will reason about optimal predictions for mean squared error.

Suppose Y1, . . . , Yn, Y are iid random variables; the distribution of Y is unknown to you. You observe Y1, . . . , Yn as “training data” and must make a (real-valued) prediction of Y.

(a) Assume Y has a probability density function given by

    pθ(y) := (1/θ²) y e^(−y/θ) if y > 0,  and  pθ(y) := 0 if y ≤ 0,

for some θ > 0. Suppose that θ is known to you. What is the “optimal prediction” ŷ* of Y that has the smallest mean squared error E[(ŷ* − Y)²]? And what is this smallest mean squared error? Your answers should be given in terms of θ.
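Before deriving the answer by hand, it can help to locate it numerically. A minimal sanity-check sketch (relying on the fact that pθ is exactly a Gamma density with shape 2 and scale θ, so NumPy can sample from it; the values of θ, the grid, and the sample size below are arbitrary assumptions):

```python
import numpy as np

# pθ(y) = (1/θ²) y e^(−y/θ) for y > 0 is the Gamma(shape=2, scale=θ) density,
# so we can sample Y and estimate E[(c − Y)²] over a grid of constants c.
rng = np.random.default_rng(0)
theta = 1.5
samples = rng.gamma(shape=2.0, scale=theta, size=200_000)

candidates = np.linspace(0.0, 6.0, 121)
mse = [np.mean((c - samples) ** 2) for c in candidates]
best = candidates[int(np.argmin(mse))]
print(f"empirically best constant prediction ~ {best:.2f}")
print(f"estimated minimal MSE ~ {min(mse):.3f}")
```

The grid minimizer and minimal value should match whatever closed forms (in terms of θ) the derivation produces.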

(b) (Continuing from Part (a).) In reality, θ is unknown to you. Suppose you observe (Y1, . . . , Yn) = (y1, . . . , yn) for some positive real numbers y1, . . . , yn > 0. Derive the following:

• the MLE θ̂(y1, . . . , yn) of θ given this data;

• the prediction ŷ(y1, . . . , yn) of Y based on the plug-in principle (using θ̂(y1, . . . , yn)).

Show the steps of your derivation. The MLE and prediction should be given as simple formulas involving y1, . . . , yn.
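Once a candidate formula for the MLE has been derived, a crude numerical maximization of the log-likelihood is a useful cross-check. A sketch, using a small made-up sample (the values in y are assumptions for illustration, not part of the problem):

```python
import numpy as np

# Made-up positive observations y1, ..., yn (assumption for this sketch).
y = np.array([0.8, 2.1, 1.3, 3.4, 0.6])
n = len(y)

def log_likelihood(theta):
    # log pθ(y_i) = log(y_i) − 2 log(θ) − y_i/θ, summed over the sample.
    return np.sum(np.log(y)) - 2 * n * np.log(theta) - np.sum(y) / theta

# Grid search over θ; the argmax should agree with the hand-derived MLE.
thetas = np.linspace(0.05, 5.0, 2000)
theta_hat = thetas[int(np.argmax([log_likelihood(t) for t in thetas]))]
print(f"numerical MLE ~ {theta_hat:.3f}")
```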

(c) Now, instead assume Y ∼ Bernoulli(θ) for some θ ∈ [0, 1]. Suppose that θ is known to you. What is the prediction ŷ* of Y that has the smallest mean squared error E[(ŷ* − Y)²]? And what is this smallest mean squared error? Your answers should be given in terms of θ. (Note: ŷ* is allowed to be any real number!)

(d) (Continuing from Part (c).) Define the following loss function ℓ : R × R → R by

    ℓ(ŷ, y) := 2(ŷ − y)² if ŷ ≥ y,  and  ℓ(ŷ, y) := (ŷ − y)² if ŷ < y.

This loss function is a different way to measure how “bad” a prediction is. With this loss function, a prediction that is too high is more costly than one that is too low. What is the prediction ŷ* of Y that has the smallest expected loss E[ℓ(ŷ*, Y)]? And what is this smallest expected loss? Your answers should be given in terms of θ.
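Since Y takes only the values 0 and 1, the expected loss of any constant prediction can be written out exactly, which makes it easy to locate the minimizer numerically as a check on the hand derivation. A minimal sketch (the particular θ and the search grid are arbitrary assumptions):

```python
import numpy as np

theta = 0.3  # arbitrary choice for the sketch

def loss(y_hat, y):
    # the asymmetric loss from Part (d): over-prediction costs twice as much
    return 2 * (y_hat - y) ** 2 if y_hat >= y else (y_hat - y) ** 2

def expected_loss(c):
    # Y is Bernoulli(θ), so the expectation is a two-term sum.
    return (1 - theta) * loss(c, 0) + theta * loss(c, 1)

cs = np.linspace(-0.5, 1.5, 4001)
best = cs[int(np.argmin([expected_loss(c) for c in cs]))]
print(f"empirically best prediction ~ {best:.3f}")
```

Comparing this grid minimizer against the derived formula (as a function of θ) catches sign and algebra slips.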


Problem 2 (15 points)

In this problem, you’ll practice analyzing a simple data set with linear regression.

Obtain the Jupyter notebook Linear_regression_on_Dartmouth_data.ipynb from Courseworks, and run the code there (e.g., using Google Colaboratory) to fit linear regression models to the Dartmouth College GPA data described in lecture.

You’ll now apply a similar linear regression analysis to a data set concerning prostate cancer:

• https://www.cs.columbia.edu/~djhsu/coms4771-f20/data/prostate-train.csv

Regard this data set as “training data” in which the goal is to predict the variable lpsa (the logarithm of the prostate specific antigen level) using the remaining variables (lcavol, lweight, age, lbph, svi, lcp, gleason, pgg45) as features.

(a) For each of the eight features, find the best fit affine function of that variable to the label lpsa. Report the “slope” and “intercept” in each case.

(b) Now find the best fit affine function of all eight features (together as a vector in R⁸) to the label lpsa. Report the coefficients in the weight vector and the “intercept” term.
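Both parts reduce to ordinary least squares with an appended constant column. A sketch of the procedure on synthetic stand-in data (an assumption of this sketch: in the real analysis you would instead load prostate-train.csv, e.g. with pandas, and put the eight feature columns in X and the lpsa column in y):

```python
import numpy as np

# Synthetic stand-in for the prostate data: 97 rows, 8 features,
# made-up weights and labels (assumptions for illustration only).
rng = np.random.default_rng(0)
n, d = 97, 8
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.5 + 0.1 * rng.normal(size=n)

# Part (a): a separate affine fit (slope + intercept) for each feature.
for j in range(d):
    A = np.column_stack([X[:, j], np.ones(n)])
    slope, b = np.linalg.lstsq(A, y, rcond=None)[0]
    print(f"feature {j}: slope={slope:+.3f}, intercept={b:+.3f}")

# Part (b): one affine fit using all eight features together.
A = np.column_stack([X, np.ones(n)])
coef = np.linalg.lstsq(A, y, rcond=None)[0]
weights, intercept = coef[:-1], coef[-1]
print("weights:", np.round(weights, 3), "intercept:", round(float(intercept), 3))
```

The same `lstsq` calls work unchanged once X and y come from the CSV.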

You should find that some of the variables have a negative coefficient in the weight vector from Part (b) even though the corresponding affine function from Part (a) has a positive slope. This might seem like a paradox: for such a feature, Part (a) might lead you to think that increasing the feature’s value should, on average, increase the value of lpsa; whereas Part (b) might lead you to think that increasing the feature’s value should, on average, decrease the value of lpsa.

Of course there is no paradox. Here is a simple example to show how this can happen. Suppose X1 ∼ N(0, 1) and X2 ∼ N(0, 1), and E[X1X2] = 2/3. Furthermore, suppose Y = (3/2)X1 − (3/4)X2.

(c) What is the linear function of X1 that has smallest mean squared error for predicting Y? What is the linear function of X2 that has smallest mean squared error for predicting Y? And finally, what is the linear function of (X1, X2) that has smallest mean squared error for predicting Y?

You should find that even though each of X1 and X2 is positively correlated with Y (analogous to the situation in Part (a)), the best linear predictor of Y that considers both X1 and X2 has a positive coefficient for one variable and a negative coefficient for the other variable (analogous to the situation in Part (b)).
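This sign flip is easy to confirm by simulation (a sketch; the sample size and seed are arbitrary assumptions): draw (X1, X2) jointly normal with unit variances and E[X1X2] = 2/3, form Y = (3/2)X1 − (3/4)X2, and compare the single-variable least-squares slopes with the two-variable fit.

```python
import numpy as np

# Simulate the example: X1, X2 standard normal with E[X1 X2] = 2/3,
# and Y = (3/2) X1 − (3/4) X2 exactly.
rng = np.random.default_rng(0)
n = 100_000
cov = np.array([[1.0, 2 / 3], [2 / 3, 1.0]])
X = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=n)
Y = 1.5 * X[:, 0] - 0.75 * X[:, 1]

# Least-squares slope using each variable alone (no intercept needed,
# since everything is mean zero): both come out positive.
slope1 = np.linalg.lstsq(X[:, [0]], Y, rcond=None)[0][0]
slope2 = np.linalg.lstsq(X[:, [1]], Y, rcond=None)[0][0]

# Least squares on both variables: the X2 coefficient is negative.
w = np.linalg.lstsq(X, Y, rcond=None)[0]
print(f"slope on X1 alone: {slope1:+.3f}")
print(f"slope on X2 alone: {slope2:+.3f}")
print(f"joint weights: {np.round(w, 3)}")
```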
