NOTE: For your homework download and use the template (https://math.dartmouth.edu/~m50f17/HW5.Rmd)

Read the green comments in the rmd file to see where your answers should go.




Normal probability plots

You can use qqnorm and qqline functions to plot probability plots of residuals. The functions rstandard and rstudent calculate the standardized residuals and R-student residuals, respectively.

prop = read.table("https://math.dartmouth.edu/~m50f17/propellant.csv", header=T, sep=",")
age <- prop$Age
shearS <- prop$ShearS

fitted = lm(shearS ~ age)

stdRes = rstandard(fitted)
rStuRes = rstudent(fitted)

qqnorm(rStuRes, main="Normal Probability Plot (residuals on vertical axis)") 
qqline(rStuRes)

However, note that the book uses residuals on the x-axis instead of y-axis. In order to obtain that use the parameter datax as shown below. In the below graph x-axis denotes the R-student residuals and the y-axis is the theoretical quantiles ( in the book y-axis is probability instead of quantiles).

qqnorm(rStuRes, datax = TRUE ,
       main="Normal Probability Plot") 
qqline(rStuRes, datax = TRUE )

Residual vs predicted response plot

yHat <- predict(fitted)

plot (yHat, rStuRes)
abline(0,0)

After observing that the observation points 5 and 6 look like potential outliers, next we delete those points and compare the fitted model of the deleted data with the full data.

plot(age, shearS, xlim=c(0,30), ylim=c(1600,2700))
abline(fitted$coef, lwd = 2, col = "blue")

ageRem <- age[-6] 
ageRem <- ageRem[-5] 

shearSRem <- shearS[-6]
shearSRem <- shearSRem[-5]

fitted2 = lm(shearSRem ~ ageRem)
abline(fitted2$coef, lwd = 2, col = "red")

A note

There is a dedicated library :

MPV: Data Sets from Montgomery, Peck and Vining’s Book

in order to provide an easy way to load tables from the book. To install the library type :

install.packages(“MPV”)

Below is an example how to use this library. Check https://cran.r-project.org/web/packages/MPV/MPV.pdf for table names.

library(MPV)
## 
## Attaching package: 'MPV'
## The following object is masked from 'package:datasets':
## 
##     stackloss
data(table.b1)

y <- table.b1$y
x1 <- table.b1$x1
x3 <- table.b1$x3
x8 <- table.b1$x8

y.lm <- lm(y ~ x1 + x3 + x8)
summary(y.lm)
## 
## Call:
## lm(formula = y ~ x1 + x3 + x8)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.6835 -1.5609 -0.1448  1.7188  4.0386 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)   
## (Intercept) 12.136744   9.145612   1.327  0.19698   
## x1           0.001123   0.001938   0.580  0.56762   
## x3           0.159582   0.296318   0.539  0.59516   
## x8          -0.006496   0.002088  -3.112  0.00475 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.42 on 24 degrees of freedom
## Multiple R-squared:  0.5702, Adjusted R-squared:  0.5164 
## F-statistic: 10.61 on 3 and 24 DF,  p-value: 0.0001244



Question-1

Solve the parts (a), (b) and (c) of Problem 4.1. In addition answer the following.

  1. Is it possible to perform lack of fit test using the steps (4.20) to (4.24) ?

Answer:

library("MPV")

fitted = lm(y~x8, data=table.b1)

cat("Part (a). 
    There are some minor deviations but in general it seems OK and we can say no problem with the normality assumptions.")
## Part (a). 
##     There are some minor deviations but in general it seems OK and we can say no problem with the normality assumptions.
rStuRes <- rstudent(fitted)
qqnorm(rStuRes, datax = TRUE, main = "Normal Probability Plot")
qqline(rStuRes, datax = TRUE  )

cat("Part (b). 
    Plot seems in a band and it doesn't suggest any violations of our assumptions.")
## Part (b). 
##     Plot seems in a band and it doesn't suggest any violations of our assumptions.
yHat <- predict(fitted)
plot(yHat, rStuRes)
abline(0, 0) 

cat("Part (c). 
     Plot shows a structure (trend in different parts) so we conclude that adding x2 might contribute to the model.")
## Part (c). 
##      Plot shows a structure (trend in different parts) so we conclude that adding x2 might contribute to the model.
plot(table.b1$x2, rStuRes)
abline(0, 0)

cat("Part (d). 
     The lack of fit test explained in steps (4.20) to (4.24) requires repeated observations, therefore not possible to do in this case.")
## Part (d). 
##      The lack of fit test explained in steps (4.20) to (4.24) requires repeated observations, therefore not possible to do in this case.



Question-2

Chapter 4, Problem 2 all parts.

Answer:

cat("Part (a). 
    There are some deviations from the normality in the first part. First half above and second half seems to be below the normal distribution line (qq-line).")
## Part (a). 
##     There are some deviations from the normality in the first part. First half above and second half seems to be below the normal distribution line (qq-line).
fitted <- lm(y ~ x2 + x7 + x8, data=table.b1)
rStuRes <- rstudent(fitted)
qqnorm(rStuRes, datax = TRUE, main = "Normal Probability Plot")
qqline(rStuRes, datax = TRUE)

cat("Part (b). 
 This also seems in a band with no apparent violations in our assumptions from this plot.")
## Part (b). 
##  This also seems in a band with no apparent violations in our assumptions from this plot.
yhat <- predict(fitted)
plot(yhat, rStuRes)
abline(0, 0)

cat("Part (c).
    The plots of residuals of x7 shows non-constant variance (variance increasing in the first half). Plot against x2 has also some narrowing in the middle and suggests non-constant variance. On the other hand plot against x8 is a little better, although it shows some narrowing towards ends there are too few points (only two points in each end). ")
## Part (c).
##     The plots of residuals of x7 shows non-constant variance (variance increasing in the first half). Plot against x2 has also some narrowing in the middle and suggests non-constant variance. On the other hand plot against x8 is a little better, although it shows some narrowing towards ends there are too few points (only two points in each end).
plot(table.b1$x2, rStuRes)
abline(0, 0)

plot(table.b1$x7, rStuRes)
abline(0, 0)