# R program

September 24, 2021

STATS 100C HW7 due Thursday in class
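Problem 1. Consider the true model $Y = \sum_{j=1}^{p} \beta_{\text{true},j} X_j + \epsilon$, where $X = (X_1, \ldots, X_p)$ is a fixed $n \times p$ matrix, and $\epsilon \sim N(0, \sigma^2 I_n)$, i.e., $\epsilon_i \sim N(0, \sigma^2)$ independently for $i = 1, \ldots, n$. For simplicity, let us assume that $X_j = \vec{u}_j$ for $j = 1, \ldots, p$, and that $(\vec{u}_1, \ldots, \vec{u}_p, \vec{u}_{p+1}, \ldots, \vec{u}_n)$ form an orthonormal basis. We can expand $\epsilon = \sum_{j=1}^{n} \epsilon_j \vec{u}_j$, so that $\epsilon_j \sim N(0, \sigma^2)$ independently.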
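The setup above can be checked numerically. The sketch below is in Python with only the standard library, so it runs on its own. It picks one hypothetical orthonormal basis, the columns of a normalized $8 \times 8$ Sylvester-Hadamard matrix, and sets $X_j = \vec{u}_j$ for $j \le p$. The values $p = 3$ and $\beta_{\text{true}} = (2, -1, 0.5)$ are illustrative. The helper names `orthonormal_basis` and `inner` are hypothetical and do not come from the assignment. Because $X^\top X = I_p$, least squares reduces to $\hat\beta = X^\top Y$. The script prints $\hat\beta_j = \langle Y, X_j \rangle$ next to $\beta_{\text{true},j} + \epsilon_j$.

```python
import math
import random


def orthonormal_basis(n):
    """Hypothetical helper: columns u_1..u_n of a normalized Sylvester-Hadamard matrix."""
    assert n > 0 and (n & (n - 1)) == 0, "n must be a power of 2"
    h = [[1.0]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return [[h[i][j] / math.sqrt(n) for i in range(n)] for j in range(n)]


def inner(a, b):
    """Euclidean inner product <a, b>."""
    return sum(x * y for x, y in zip(a, b))


random.seed(1)
n, p, sigma = 8, 3, 1.0
u = orthonormal_basis(n)
beta_true = [2.0, -1.0, 0.5]

eps = [random.gauss(0.0, sigma) for _ in range(n)]      # noise in standard coordinates
y = [sum(beta_true[j] * u[j][i] for j in range(p)) + eps[i] for i in range(n)]

eps_coef = [inner(eps, u[j]) for j in range(n)]         # epsilon_j = <epsilon, u_j>
beta_hat = [inner(y, u[j]) for j in range(p)]           # X^T Y, since X^T X = I_p

for j in range(p):
    print(j + 1, round(beta_hat[j], 6), round(beta_true[j] + eps_coef[j], 6))
```

Any orthonormal basis would do. The Hadamard construction is used here only because it is easy to build exactly without extra libraries.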
(1) Show that $\hat\beta_j = \langle Y, X_j \rangle = \beta_{\text{true},j} + \epsilon_j$ for $j = 1, \ldots, p$.
(2) Show that $\vec{e} = Y - X\hat\beta = \sum_{j=p+1}^{n} \epsilon_j \vec{u}_j$.
(3) Let $s^2 = |\vec{e}|^2/(n - p)$. Show that $E(s^2) = \sigma^2$.
(4) Express $(\hat\beta_j - \beta_{\text{true},j})/s$ in terms of the $\epsilon$'s.
(5) Suppose we want to test $H_0: \beta_{\text{true},j} = 0$ versus $H_1: \beta_{\text{true},j} \neq 0$. Let $T = \hat\beta_j/s$. Under $H_0$, what is the distribution of $T$? If $H_1$ is correct, i.e., $\beta_{\text{true},j} \neq 0$, what does $T$ look like?
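Parts (3) and (5) can be illustrated, though not proved, by simulation. The Monte Carlo sketch below uses Python with only the standard library. It assumes the same hypothetical Hadamard basis, redefined so the block stands alone, with the illustrative choices $n = 16$ and $p = 2$. It averages $s^2$ over many noise draws, and this average should land near $\sigma^2 = 1$. Under $H_0$ ($\beta_{\text{true},1} = 0$) it compares the average of $T^2$ with $\nu/(\nu - 2)$, which is the variance of a $t$ distribution with $\nu = n - p$ degrees of freedom. Under an illustrative alternative $\beta_{\text{true},1} = 3$, the draws of $T$ are centered well away from 0. The function name `simulate` is hypothetical.

```python
import math
import random


def orthonormal_basis(n):
    """Hypothetical helper: columns u_1..u_n of a normalized Sylvester-Hadamard matrix."""
    assert n > 0 and (n & (n - 1)) == 0, "n must be a power of 2"
    h = [[1.0]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return [[h[i][j] / math.sqrt(n) for i in range(n)] for j in range(n)]


def inner(a, b):
    """Euclidean inner product <a, b>."""
    return sum(x * y for x, y in zip(a, b))


def simulate(beta_true, n=16, sigma=1.0, reps=20000, seed=0):
    """Draw (s^2, T) pairs with T = beta_hat_1 / s, using X_j = u_j for j = 1..p."""
    rng = random.Random(seed)
    u = orthonormal_basis(n)
    p = len(beta_true)
    s2_draws, t_draws = [], []
    for _ in range(reps):
        y = [sum(beta_true[j] * u[j][i] for j in range(p)) + rng.gauss(0.0, sigma)
             for i in range(n)]
        beta_hat = [inner(y, u[j]) for j in range(p)]              # least squares fit
        resid = [y[i] - sum(beta_hat[j] * u[j][i] for j in range(p)) for i in range(n)]
        s2 = inner(resid, resid) / (n - p)                         # s^2 = |e|^2 / (n - p)
        s2_draws.append(s2)
        t_draws.append(beta_hat[0] / math.sqrt(s2))                # T for j = 1
    return s2_draws, t_draws


s2_h0, t_h0 = simulate(beta_true=(0.0, 1.0))                       # H0: beta_true,1 = 0
nu = 16 - 2
mean_s2 = sum(s2_h0) / len(s2_h0)
mean_t2 = sum(t * t for t in t_h0) / len(t_h0)
print(round(mean_s2, 3), "vs sigma^2 = 1")
print(round(mean_t2, 3), "vs nu / (nu - 2) =", round(nu / (nu - 2), 3))

_, t_h1 = simulate(beta_true=(3.0, 1.0), reps=5000, seed=1)       # H1: beta_true,1 = 3
mean_t_h1 = sum(t_h1) / len(t_h1)
print(round(mean_t_h1, 3), "(centered away from 0 under H1)")
```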
Problem 2. Continue from Problem 1. Suppose $X = (X_1, \ldots, X_d, X_{d+1}, \ldots, X_p)$, where $d < p$. Suppose we want to test $H_0: \beta_{\text{true},j} = 0$ for $j = d+1, \ldots, p$, versus $H_1$: not all $\beta_{\text{true},j} = 0$ for $j = d+1, \ldots, p$. Let $\vec{e}_0$ be the residual vector after fitting the $H_0$ model $Y = \sum_{j=1}^{d} \beta_j X_j + \epsilon$. Let $\vec{e}_1$ be the residual vector after fitting the $H_1$ model $Y = \sum_{j=1}^{p} \beta_j X_j + \epsilon$.
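(1) Show that if $H_0$ is true, then $\vec{e}_0 = \sum_{j=d+1}^{n} \epsilon_j \vec{u}_j$ and $\vec{e}_1 = \sum_{j=p+1}^{n} \epsilon_j \vec{u}_j$. Then express the F-statistic

$$F = \frac{\left(|\vec{e}_0|^2 - |\vec{e}_1|^2\right)/(p - d)}{|\vec{e}_1|^2/(n - p)}$$

in terms of the $\epsilon$'s.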
(2) If $H_1$ is true, what does $F$ look like?
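A Python sketch (standard library only) of part (1) under stated assumptions: the same hypothetical Hadamard basis, illustrative values $n = 16$, $d = 2$, $p = 5$, and $H_0$ true. It fits both nested models by projection, which is valid because the $X_j$ are orthonormal. It then forms $F$ from the two residual vectors and compares it with the same ratio built from the noise coordinates $\epsilon_j = \langle \epsilon, \vec{u}_j \rangle$. The helper names `orthonormal_basis`, `inner`, and `residual` are hypothetical.

```python
import math
import random


def orthonormal_basis(n):
    """Hypothetical helper: columns u_1..u_n of a normalized Sylvester-Hadamard matrix."""
    assert n > 0 and (n & (n - 1)) == 0, "n must be a power of 2"
    h = [[1.0]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return [[h[i][j] / math.sqrt(n) for i in range(n)] for j in range(n)]


def inner(a, b):
    """Euclidean inner product <a, b>."""
    return sum(x * y for x, y in zip(a, b))


def residual(y, basis):
    """Residual after least squares on orthonormal vectors: y - sum_b <y, b> b."""
    coefs = [inner(y, b) for b in basis]
    fitted = [sum(c * b[i] for c, b in zip(coefs, basis)) for i in range(len(y))]
    return [yi - fi for yi, fi in zip(y, fitted)]


rng = random.Random(2)
n, d, p, sigma = 16, 2, 5, 1.0
u = orthonormal_basis(n)
beta_true = [1.5, -2.0] + [0.0] * (p - d)       # H0 holds: beta_true,j = 0 for j = d+1..p

eps = [rng.gauss(0.0, sigma) for _ in range(n)]
y = [sum(beta_true[j] * u[j][i] for j in range(p)) + eps[i] for i in range(n)]

e0 = residual(y, u[:d])                           # H0 model: X_1..X_d
e1 = residual(y, u[:p])                           # H1 model: X_1..X_p
F = ((inner(e0, e0) - inner(e1, e1)) / (p - d)) / (inner(e1, e1) / (n - p))

eps_coef = [inner(eps, u[j]) for j in range(n)]   # epsilon_j = <epsilon, u_j>
num = sum(c * c for c in eps_coef[d:p]) / (p - d)  # j = d+1..p
den = sum(c * c for c in eps_coef[p:]) / (n - p)   # j = p+1..n
F_eps = num / den
print(F, F_eps)
```

Under $H_0$, the numerator and denominator of `F_eps` use disjoint sets of independent $\epsilon_j$'s. That is why the ratio follows an $F_{p-d,\,n-p}$ distribution.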
Problem 3. Continue from Problem 1.
(1) The training data is $Y = X\beta_{\text{true}} + \epsilon$. Define the training error as $|\vec{e}|^2$. What is $E(|\vec{e}|^2)$?
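(2) Suppose we generate testing data $\tilde{Y} = X\beta_{\text{true}} + \tilde{\epsilon}$, where $\tilde{\epsilon}$ has the same distribution as $\epsilon$ but is independent of $\epsilon$. Then $\tilde{\epsilon} = \sum_{j=1}^{n} \tilde{\epsilon}_j \vec{u}_j$, where $\tilde{\epsilon}_j \sim N(0, \sigma^2)$ independently and the $\tilde{\epsilon}_j$'s are independent of the $\epsilon_j$'s. Suppose we predict $\tilde{Y}$ by $X\hat\beta$, where $\hat\beta$ is obtained from the training data $(X, Y)$ by least squares. Define the testing error as $|\tilde{e}|^2$, where $\tilde{e} = \tilde{Y} - X\hat\beta$. What is $E(|\tilde{e}|^2)$?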
(3) Suppose we fit the model $Y = \sum_{j=1}^{d} \beta_j X_j + \epsilon$ to the training data $(X, Y)$. Analyze how the training and testing errors change as $d$ increases from 1 to $p$. Then analyze how they change as $d$ continues to increase beyond $p$ through the addition of spurious predictor vectors $X_{p+1}, X_{p+2}, \ldots$, whose true coefficients are actually zero.
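Part (3) can be explored with a Monte Carlo sketch in Python (standard library only). The sketch assumes the same hypothetical Hadamard basis with $n = 16$. The first $p = 4$ predictors carry an illustrative coefficient of 3, and the spurious predictors $X_{p+1}, X_{p+2}, \ldots$ are taken to be the remaining basis vectors $\vec{u}_{p+1}, \ldots, \vec{u}_{n-1}$. This choice is a simplifying assumption. For each $d$ it fits the first $d$ predictors on the training data, then averages the training error $|\vec{e}|^2$ and the testing error $|\tilde{e}|^2$ over repeated draws. Both squared norms are computed from coordinates in the orthonormal basis. The function name `errors_by_d` is hypothetical.

```python
import math
import random


def orthonormal_basis(n):
    """Hypothetical helper: columns u_1..u_n of a normalized Sylvester-Hadamard matrix."""
    assert n > 0 and (n & (n - 1)) == 0, "n must be a power of 2"
    h = [[1.0]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return [[h[i][j] / math.sqrt(n) for i in range(n)] for j in range(n)]


def inner(a, b):
    """Euclidean inner product <a, b>."""
    return sum(x * y for x, y in zip(a, b))


def errors_by_d(n=16, p=4, sigma=1.0, beta_value=3.0, reps=4000, seed=3):
    """Average training/testing errors when fitting X_j = u_j for j = 1..d, d = 1..n-1.

    The first p predictors have coefficient beta_value; u_{p+1}, ... are spurious.
    """
    rng = random.Random(seed)
    u = orthonormal_basis(n)
    signal = [sum(beta_value * u[j][i] for j in range(p)) for i in range(n)]  # X beta_true
    train = [0.0] * n
    test = [0.0] * n
    for _ in range(reps):
        y = [m + rng.gauss(0.0, sigma) for m in signal]       # training response Y
        y_new = [m + rng.gauss(0.0, sigma) for m in signal]   # independent testing response
        c = [inner(y, uj) for uj in u]                        # coordinates <Y, u_j>
        c_new = [inner(y_new, uj) for uj in u]                # coordinates of the test response
        for d in range(1, n):
            # The fitted values have coordinates c[:d] and zeros elsewhere.
            train[d] += sum(cj * cj for cj in c[d:])
            test[d] += (sum((c_new[j] - c[j]) ** 2 for j in range(d))
                        + sum(cj * cj for cj in c_new[d:]))
    return [t / reps for t in train], [t / reps for t in test]


train, test = errors_by_d()
for d in range(1, 16):
    print(d, round(train[d], 2), round(test[d], 2))
```

Within each draw, the training error can only fall as $d$ grows, because each added predictor removes one more squared coordinate from the residual. The testing error is expected to be smallest around $d = p$. Below $p$ the omitted signal inflates it, and beyond $p$ each spurious predictor adds fitted noise.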
Problem 4. Write R code to reproduce the outputs of the lm function in R. You can use the Boston housing data as an example. The inputs to your R code are the matrix $X$ and the vector $Y$. You can output $\hat\beta$, $\text{RSS} = |\vec{e}|^2$, $s^2$, the T-statistic for each $\beta_j$, and the corresponding p-value.
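The sketch below implements the computations Problem 4 lists. The assignment itself asks for R; this is a Python sketch using only the standard library, so it runs on its own. It assumes $X$ already contains an intercept column if one is wanted, so the residual degrees of freedom are $n - p$, where $p$ is the number of columns. It solves the normal equations $\hat\beta = (X^\top X)^{-1} X^\top Y$ and takes standard errors as $\sqrt{s^2 [(X^\top X)^{-1}]_{jj}}$. Two-sided p-values come from the identity $P(|T| > |t|) = I_{\nu/(\nu + t^2)}(\nu/2, 1/2)$ for $T \sim t_\nu$, $\nu = n - p$, where $I$ is the regularized incomplete beta function, evaluated by its standard continued fraction. R's lm works from a QR decomposition rather than the normal equations, so results can differ slightly when $X^\top X$ is ill-conditioned. All function names are hypothetical.

```python
import math


def invert(a):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    k = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(k)] for i, row in enumerate(a)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("X^T X is (numerically) singular")
        m[col], m[piv] = m[piv], m[col]
        pv = m[col][col]
        m[col] = [v / pv for v in m[col]]
        for r in range(k):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[k:] for row in m]


def _betacf(a, b, x, max_iter=300, tol=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    c, d = 1.0, 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        # Even and odd coefficients of the continued fraction for step m.
        for aa in (m * (b - m) * x / ((a + 2 * m - 1.0) * (a + 2 * m)),
                   -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1.0))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < tol:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def two_sided_p(t, df):
    """P(|T| > |t|) for T ~ t_df."""
    return reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))


def lm_summary(X, y):
    """Least squares estimates, RSS, s^2, standard errors, t-statistics and p-values."""
    n, p = len(X), len(X[0])
    xtx = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(p)] for a in range(p)]
    xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(p)]
    xtx_inv = invert(xtx)
    beta = [sum(xtx_inv[a][b] * xty[b] for b in range(p)) for a in range(p)]
    resid = [y[i] - sum(X[i][a] * beta[a] for a in range(p)) for i in range(n)]
    rss = sum(r * r for r in resid)
    s2 = rss / (n - p)                                   # n - p residual degrees of freedom
    se = [math.sqrt(s2 * xtx_inv[a][a]) for a in range(p)]
    t = [beta[a] / se[a] for a in range(p)]
    pval = [two_sided_p(tj, n - p) for tj in t]
    return {"beta": beta, "rss": rss, "s2": s2, "se": se, "t": t, "p": pval}


# Toy example: intercept column plus one predictor.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 2.0, 5.0]
fit = lm_summary(X, y)
print(fit["beta"], fit["rss"], fit["s2"])   # approximately [1.1, 1.1] 2.7 1.35
print(fit["t"], fit["p"])
```

On the Boston data, the coefficient table can be checked against `summary(lm(medv ~ ., data = Boston))` in R, using the Boston data set from the MASS package. That summary reports the residual standard error $s$ rather than $s^2$.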