NOTE: For your homework download and use the template (https://math.dartmouth.edu/~m50f17/HW3.Rmd)

Read the green comments in the rmd file to see where your answers should go.




Question-1

Is maximum-likelihood estimator \(\tilde{\sigma}^2\) of \(\sigma^2\) an unbiased estimator ? Verify your answer. Comment on the change of the value of \[\mathbb{E}(\tilde{\sigma}^2) - {\sigma}^2\] as \(n\) goes to infinity.

Answer:

We know that

\[ \tilde{\sigma}^2 = \frac{ \sum_{i=1}^{n}(y_i - \tilde\beta_0 - \tilde\beta_1 x )^2}{n}\] and, moreover \(\tilde\beta_0=\hat\beta_0\) and \(\tilde\beta_1=\hat\beta_1\) since LSE and MLE has same coefficients estimations.

Combining these yield \[ \begin{aligned} \mathbb{E} [\tilde{\sigma}^2] & = \frac{ \mathbb{E} [ \sum_{i=1}^{n}(y_i - \tilde\beta_0 - \tilde\beta_1 x )^2 ] }{n} \\ &= \frac{ \mathbb{E} [ \sum_{i=1}^{n}(y_i - \hat\beta_0 - \hat\beta_1 x )^2 ] }{n} \\ &= \frac{ \mathbb{E} [ SS_{Res} ] }{n} \\ &= \frac{ \mathbb{E} [ (n-2) \hat\sigma^2 ] }{n} \\ \end{aligned} \] since \(\hat\sigma^2 = \frac{SS_{Res}}{n-2}\). We can now see

\[ \mathbb{E} [\tilde{\sigma}^2] = \frac{ (n-2) }{n}\mathbb{E} [ \hat\sigma^2 ] = \frac{ (n-2) }{n} \sigma^2 \]

since \(\hat\sigma^2\) is an unbiased estimator of \(\sigma^2\). We conclude that \(\tilde{\sigma}^2\) is a biased estimator with

\[\mathbb{E}(\tilde{\sigma}^2) - {\sigma}^2 = \left(\frac{ n-2 }{n} -1 \right) \sigma^2 = \frac{ -2 }{n} \sigma^2 \]

which goes to \(0\) as \(n \rightarrow\infty\). Therefore bias is negligible for large \(n\).




Question-2

Consider the shear strength vs age relation using the propellant data.

  1. Recalculate the coefficients of the fitted linear regression model using the vector equations we obtained.

  2. Suppose that the expectation of the initial shear strength is known to be 2400. Write the corresponding model (should involve only one parameter \(\beta_1\)). Calculate 95% CI on \(\beta_1\).

Answer:

# Computation part of the answer : 

prop<-read.table("https://math.dartmouth.edu/~m50f17/propellant.csv", header=T, sep=",")
shearS<-prop$ShearS 
age<-prop$Age

X <- cbind(1,age)
betaHat <- solve( t(X) %*% X ) %*% t(X) %*% shearS

# part (a)
cat ( " Part(a)
 Fitted Coefficients =  " , betaHat   , " \n\n") 
##  Part(a)
##  Fitted Coefficients =   2627.822 -37.15359
# part (b)
# We can use a vertical shift and use the no-intercept model 
# Note that vertical shifting will not change the beta1 and its statistics. 
# Therefore after we shift down by 2400, we simply work on a no-intercept model

shearS_Shifted <- shearS - 2400

beta1Hat = sum( age * shearS_Shifted) / sum(age^2)
yHat <- beta1Hat * age 

# We use CI for no-intercept model
dof = 19 

MSres = sum((yHat - shearS_Shifted)^2) / dof
seBeta1 = sqrt(MSres / sum(age^2) )

tQ = qt( .975, dof)
lowerCI = beta1Hat - tQ * seBeta1
upperCI = beta1Hat + tQ * seBeta1

cat ( " Part (b) 
 Lower and upper bound of CI are :  " ,  lowerCI , ",  " , upperCI , " 
 \n Note that these numbers make sense. Since we forced intercept to be 2400 (which is 
 lower than the estimate of the intercept-model) the regression analysis response is 
 to raise the other end, and therefore increase the slope. "
 ) 
##  Part (b) 
##  Lower and upper bound of CI are :   -28.64285 ,   -19.63201  
##  
##  Note that these numbers make sense. Since we forced intercept to be 2400 (which is 
##  lower than the estimate of the intercept-model) the regression analysis response is 
##  to raise the other end, and therefore increase the slope.
fitted = lm(shearS ~ age)
plot(age, shearS, main="Shear Strength vs. Age", pch=16, xlim=c(0,30), ylim=c(1600,2700))
abline(fitted$coef, lwd = 2, col = "blue")
abline( c(2400,beta1Hat), lwd = 2 , col = 'red')




Example : Phytoplankton Population

A scientist is trying to model the relation between phytoplankton population in the city public water supply and concentration of two substances. The sample data is at : https://math.dartmouth.edu/~m50f17/phytoplankton.csv

where headers are

Lets consider a guessed model
\[ y = 200 + 10x_1 -15x_2 \] Below is the corresponding code to plot the scatter diagram and the above plane.

# Note: Run the following in R console if you get errors in plotting or library loading : 
#  install.packages("scatterplot3d") 
#  install.packages("plot3D") 

library("plot3D")
library("scatterplot3d")

# Loading data 
pData <- read.table("https://math.dartmouth.edu/~m50f17/phytoplankton.csv", header=T, sep=",")
pop <- pData$pop
subs1 <- pData$subs1
subs2 <- pData$subs2

# Create a mesh
meshP <- mesh( seq(min(subs1),max(subs1),0.03)  , seq(min(subs2),max(subs2),0.03) )
x1Mesh <- meshP$x 
x2Mesh <- meshP$y 

myModel <- 200 + 10*x1Mesh  - 15 *x2Mesh   

# Below is the code to plot the scatter diagram with red markers and your model
# You need to set two variables before calling : 
# myModel : your model  
# PLOTTING # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #
sc1 <- scatterplot3d(subs2,subs1,pop, pch=17 , type = 'p', angle = 15 , highlight.3d = T ) 
sc1$points3d (x2Mesh,x1Mesh, myModel, cex=.02, col="blue")