---
title: 'Math 50 Fall 2017, Homework #4'
---
__NOTE: For your homework, download and use the template__ (https://math.dartmouth.edu/~m50f17/HW4.Rmd).
__Read the green comments in the Rmd file to see where your answers should go.__
## Question-1 (Sample)
Read Example 3.1 Delivery Time Data.
(a) Graphics can be very useful in analyzing the data. Produce two useful visualizations of the data: first a three-dimensional scatterplot of the delivery time data, then a scatterplot matrix (an array of 2D plots in which each panel is a scatter diagram between two variables).
(b) Fit the reduced regression model relating delivery time to the number of cases. Plot the joint confidence region of the coefficients (slope and intercept), and add a point to the plot showing the estimated slope and intercept.
(c) Calculate the extra sum of squares due to the regressor variable Distance.
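Recall that the extra sum of squares measures the increase in the regression sum of squares when Distance is added to a model that already contains Cases:

$$ SS_R(\beta_2 \mid \beta_1, \beta_0) \;=\; SS_R(\beta_1, \beta_2 \mid \beta_0) \;-\; SS_R(\beta_1 \mid \beta_0) $$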
### Answer:
```{r}
# Computation part of the answer :
# Loading the data
delivery <- read.table("https://math.dartmouth.edu/~m50f17/delivery.csv", header = TRUE)
x1Cases <- delivery$Cases
x2Distance <- delivery$Distance
yTime <- delivery$Time
cat ("Part (a) \n")
# 3D scatter diagram
library("plot3D")
library("scatterplot3d")
sc1 <- scatterplot3d(x1Cases, x2Distance, yTime, pch=17 , type = 'p', angle = 15 , highlight.3d = T )
# Plot scatterplot matrix
plot(delivery[,-1])
cat("Part (b) \n")
library(ellipse)
reducedFit <- lm(Time ~ x1Cases, data = delivery)
plot(ellipse(reducedFit), type = "l", xlab = "Intercept", ylab = "Slope", main = "Joint Confidence Region")
points (reducedFit$coeff[[1]] , reducedFit$coeff[[2]] )
cat("Part (c) \n")
fullFit <- lm(Time ~ Cases + Distance, data = delivery)
reducedSSR <- sum((predict(reducedFit) - mean(yTime))^2)
fullSSR <- sum((predict(fullFit) - mean(yTime))^2)
cat("Extra sum of squares due to Distance is:", fullSSR - reducedSSR, "\n")
```
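As a cross-check (not part of the assignment), the extra sum of squares in part (c) can also be read directly off R's built-in `anova()` applied to the nested fits. A minimal sketch on simulated data (the variables below are illustrative stand-ins, not the delivery data set):

```r
# The "Sum of Sq" entry reported by anova(reduced, full) equals
# the manual difference of regression sums of squares.
set.seed(1)
x1 <- runif(25, 2, 30)       # stand-in for Cases
x2 <- runif(25, 50, 1500)    # stand-in for Distance
y  <- 3 + 1.6 * x1 + 0.01 * x2 + rnorm(25)

reduced <- lm(y ~ x1)
full    <- lm(y ~ x1 + x2)

manualExtraSS <- sum((predict(full) - mean(y))^2) -
                 sum((predict(reduced) - mean(y))^2)
tab <- anova(reduced, full)  # row 2 holds the extra SS and the partial F
cat("extra SS (manual):", manualExtraSS, "\n")
cat("extra SS (anova): ", tab[2, "Sum of Sq"], "\n")
```

Because the total sum of squares is the same for both fits, the drop in residual sum of squares reported by `anova()` equals the gain in regression sum of squares computed by hand.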
## Question-2
Load the kinematic viscosity data (explained in Problem 3.14 and table B-10 in the book) at https://math.dartmouth.edu/~m50f17/kin.csv
Solve parts (a) to (e) of Problem 3.14 using $\alpha=0.05$. In addition, do the following.
(f) Calculate the extra sum of squares due to the regressor variable x1.
(g) Plot a scatterplot matrix and scatter diagrams to visualize the data. Can you make any connection between the visualization of the data and the results you found in the previous parts? Discuss.
### Answer:
```{r}
# Computation part of the answer :
# Loading data
kin <- read.table("https://math.dartmouth.edu/~m50f17/kinematic.csv", header = TRUE, sep=",")
x1 <- kin$x1
x2 <- kin$x2
y <- kin$y
k <- 2
p <- k + 1
n <- length(y)
alpha <- 0.05
fitted <- lm(y ~ x1 + x2 )
beta <- fitted$coefficients
cat("\n\n Part(a)
Fitted multiple linear regression model is
yHat = ", beta[1], "+",beta[2], " * x1",
"+",beta[3], "* x2")
yHat <- beta[1] + beta[2]*x1 + beta[3]*x2
SSr <- sum((yHat-mean(y))^2)
SSres <- sum((yHat-y)^2)
SSt <- sum((y-mean(y))^2)
# F0 for significance of regression
F0 <- (SSr/k)/(SSres/(n-p))
quantile <- qf(1 - alpha, k, n - p)
cat("\n\n Part (b)
 F0 is ", F0, " and it is much greater than the critical quantile value ", quantile, ".
 Therefore the regression is significant: at least one regressor contributes significantly to the model.")
sum1 <- summary(fitted)
quantIndiv <- qt(1 - alpha/2, n-k-1 )
cat("\n\n Part (c)
Coefficients beta1 and beta2 have t-statistics t0 =", sum1$coefficients[2,3] , " and t0 =" , sum1$coefficients[3,3] , "
respectively, which are both greater than the critical quantile value " , quantIndiv, " in absolute value.
Therefore both contribute significantly to the model." )
fitRed <- lm(y ~ x2)
sum2 <- summary(fitRed)
cat("\n\n Part (d)
 For this model R2 =", sum1$r.squared, " and adjusted R2 =", sum1$adj.r.squared, ", both noticeably
 greater than the temperature-only model's R2 =", sum2$r.squared, " and adjusted R2 =", sum2$adj.r.squared, ".
 Therefore we conclude the ratio of the solvents improves the fit.")
q01 <- qt(1 - .01/2, n - k - 1)
std1 <- sum1$coefficients[2, 2]
std2 <- sum1$coefficients[3, 2]
betaRed <- fitRed$coefficients
std2Red <- sum2$coefficients[2, 2]
q01Red <- qt(1 - .01/2, n - 2)  # reduced model has one regressor, so df = n - 2
cat("\n\n Part (e)
 For this model the 99% CI for beta2 is:
 ", beta[3] - std2*q01, " < beta2 < ", beta[3] + std2*q01, "
 For the temperature-only model:
 ", betaRed[2] - std2Red*q01Red, " < beta2 < ", betaRed[2] + std2Red*q01Red, "
 which is slightly wider than the previous one.")
reducedSSR <- sum((predict(fitRed) - mean(y))^2)
fullSSR <- sum((predict(fitted) - mean(y))^2)
cat("\n\n Part (f)
 Extra sum of squares due to x1 is: ", fullSSR - reducedSSR, "\n")
# Plot scatterplot matrix
plot(kin)
cat("\n\n Part (g)
 Looking at the individual plots, x1 (or x2) alone doesn't seem
 to explain y well (the panels look like they consist of several different curves,
 which suggests adding another regressor).
 This is consistent with the low R2 of the reduced model calculated above.")
```
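As a sanity check (not required by the problem), the hand-built t intervals in part (e) can be compared against R's built-in `confint()`. A minimal sketch on simulated data (the variable names below are illustrative, not from the viscosity data set):

```r
# confint() reproduces beta_hat +/- t_{alpha/2, n-p} * se(beta_hat)
set.seed(2)
x1 <- rnorm(40)
x2 <- rnorm(40)
y  <- 1 + 2 * x1 - 0.5 * x2 + rnorm(40)
fit <- lm(y ~ x1 + x2)

s <- summary(fit)
n <- 40
p <- 3   # intercept plus two regressors
manualCI <- coef(fit)["x2"] +
  c(-1, 1) * qt(1 - 0.01/2, n - p) * s$coefficients["x2", 2]
builtinCI <- confint(fit, "x2", level = 0.99)
cat("manual :", manualCI, "\n")
cat("builtin:", as.numeric(builtinCI), "\n")
```

Agreement between the two confirms the degrees of freedom (`n - p`) used in the manual computation.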
## Question-3
Load the Mortality data (explained in Problem 3.15 and table B-15 in the book) at
https://math.dartmouth.edu/~m50f17/mortality.csv
Solve parts (a) to (e) of Problem 3.15 (use $\alpha=0.05$ where needed). In addition, do the following.
(f) You want to quantify the joint contribution of the regressors Educ, NOX, SO2 to the model. Choose $\alpha=0.01$. Using the partial F test (equation 3.35), comment on this contribution. (Note the different $\alpha$ value.)
(g) Consider the individual contribution tests you calculated in part (c). Choose the two regressor variables with the smallest t-statistics (in absolute value). Using the partial F test, comment on their contribution to the model. Use $\alpha=0.01$.
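For reference, the partial F statistic of equation 3.35, testing whether the $r$ regressors collected in $\beta_2$ contribute to a model that already contains the remaining regressors $\beta_1$, is

$$ F_0 = \frac{SS_R(\beta_2 \mid \beta_1)/r}{MS_{Res}} $$

where $MS_{Res}$ is the residual mean square of the full model, and $F_0$ is compared with $F_{\alpha,\, r,\, n-p}$.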
### Answer:
```{r}
# Computation part of the answer :
# Loading data
mor <- read.table("https://math.dartmouth.edu/~m50f17/mortality.csv", header = TRUE)
prec <- mor$PRECIP
educ <- mor$EDUC
nonw <- mor$NONWHITE
nox <- mor$NOX
so2 <- mor$SO2
y <- mor$MORT
k <- 5
p <- k + 1
n <- length(y)
alpha <- 0.05   # significance level for parts (a)-(e)
fitted <- lm( y ~ prec + educ + nonw + nox + so2 )
beta <- fitted$coefficients
cat("\n\n Part(a)
 Fitted multiple linear regression model is
 yHat = ", beta[1], "+", beta[2], "* PRECIP",
 "+", beta[3], "* EDUC",
 "+", beta[4], "* NONWHITE",
 "+", beta[5], "* NOX",
 "+", beta[6], "* SO2")
yHat <- beta[1] + beta[2]*prec + beta[3]*educ + beta[4]*nonw + beta[5]*nox + beta[6]*so2
SSr <- sum((yHat-mean(y))^2)
SSres <- sum((yHat-y)^2)
SSt <- sum((y-mean(y))^2)
# F0 for significance of regression
F0 <- (SSr/k)/(SSres/(n-p))
quantile <- qf(1 - alpha, k, n - p)
cat("\n\n Part (b)
 F0 is ", F0, " and it is much greater than the critical quantile value ", quantile, ".
 Therefore the regression is significant: at least one regressor contributes significantly to the model.")
sum1 <- summary(fitted)
quantIndiv <- qt(1 - alpha/2, n-k-1 )
cat("\n\n Part (c)
 Looking at the table of t-statistics below, we see that all regressors except NOX contribute significantly
 to the model: the NOX t-statistic is smaller in absolute value than the critical quantile value ", quantIndiv)
sum1$coefficients
cat("\n\n Part (d)
R2 =", sum1$r.squared, ", adjusted R2= " , sum1$adj.r.squared )
q05 <- qt(1-.05/2,n-k-1)
std5 <- sum1$coefficients[6,2]
cat("\n\n Part (e)
 For this model the 95% CI for beta5 is:
 ", beta[6] - std5*q05, " < beta5 < ", beta[6] + std5*q05)
fitRed <- lm(y ~ prec + nonw)   # drop Educ, NOX, SO2
sum2 <- summary(fitRed)
r <- 3
qf01 <- qf(1-0.01 , r , n-p)
reducedSSR <- sum((predict(fitRed) - mean(y))^2)
fullSSR <- sum((predict(fitted) - mean(y))^2)
F0 <- ((fullSSR - reducedSSR)/r)/(SSres/(n-p))
cat("\n\n Part (f)
 F0 is ", F0, " and it is much greater than the critical quantile value ", qf01, ".
 Therefore the partial F test is significant: at least one of Educ, NOX, SO2 contributes
 significantly to the model, given the other regressors.")
fitRed <- lm(y ~ educ + nonw + so2)   # drop PRECIP and NOX
sum2 <- summary(fitRed)
r <- 2
qf01 <- qf(1-0.01 , r , n-p)
reducedSSR <- sum((predict(fitRed) - mean(y))^2)
fullSSR <- sum((predict(fitted) - mean(y))^2)
F0 <- ((fullSSR - reducedSSR)/r)/(SSres/(n-p))
cat("\n\n Part (g)
 The lowest two are PRECIP and NOX.
 F0 is ", F0, " and it is smaller than the critical quantile value ", qf01, ".
 The partial F test suggests that PRECIP and NOX together don't contribute significantly to the model.
 (Note that this is at significance level 0.01, whereas the individual tests above were at alpha = 0.05.)")
```