**NOTE: For your homework, download and use the template** (https://math.dartmouth.edu/~m50f17/HW6.Rmd)

**Read the green comments in the Rmd file to see where your answers should go.**

```
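# Read the windmill data (columns used: velocity, DC)
windmill <- read.table("https://math.dartmouth.edu/~m50f17/windmill.csv", header = TRUE)
# Scatter diagram of DC output versus wind velocity
plot(windmill$velocity, windmill$DC, xlab = "wind velocity", ylab = "DC current")
# Straight-line fit, drawn over the scatter diagram
fit <- lm(DC ~ velocity, data = windmill)
abline(fit, col = "red")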
```

`summary(fit)`

```
##
## Call:
## lm(formula = DC ~ velocity, data = windmill)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -0.59869 -0.14099  0.06059  0.17262  0.32184
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  0.13088    0.12599   1.039     0.31
## velocity     0.24115    0.01905  12.659 7.55e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2361 on 23 degrees of freedom
## Multiple R-squared: 0.8745, Adjusted R-squared: 0.869
## F-statistic: 160.3 on 1 and 23 DF, p-value: 7.546e-12
```

```
# R-student residuals versus fitted values for the straight-line model
plot(fitted.values(fit), rstudent(fit), xlab = "fitted values", ylab = "R-Student residuals", main = "Windmill - Residual Plot")
abline(h = 0, col = "red")
```
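The R-student residuals plotted above are externally studentized: each residual \(e_i\) is divided by \(S_{(i)}\sqrt{1-h_{ii}}\), where \(S_{(i)}\) is the residual standard error with observation \(i\) deleted and \(h_{ii}\) is its leverage. A minimal sketch checking this formula against `rstudent()`, assuming simulated data (the data frame `sim` and its values are hypothetical, not the windmill data):

```r
# Simulated data standing in for the windmill measurements (hypothetical values)
set.seed(1)
sim <- data.frame(velocity = runif(25, 2, 10))
sim$DC <- 3 - 7 / sim$velocity + rnorm(25, sd = 0.1)

fit_sim <- lm(DC ~ velocity, data = sim)

# Leave-one-out residual standard errors S_(i) and leverages h_ii
s_del <- influence(fit_sim)$sigma
h <- hatvalues(fit_sim)

# R-student residuals computed by hand
t_manual <- residuals(fit_sim) / (s_del * sqrt(1 - h))

# Should agree with the built-in rstudent()
all.equal(unname(t_manual), unname(rstudent(fit_sim)))
```

Because \(S_{(i)}\) excludes point \(i\), a single outlying observation cannot inflate the scale used for its own residual, which is why these residuals are convenient for spotting outliers.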

```
# Quadratic model; poly() uses orthogonal polynomial terms by default
fit2 <- lm(DC ~ poly(velocity, degree = 2), data = windmill)
summary(fit2)
```

```
##
## Call:
## lm(formula = DC ~ poly(velocity, degree = 2), data = windmill)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -0.26347 -0.02537  0.01264  0.03908  0.19903
##
## Coefficients:
##                             Estimate Std. Error t value Pr(>|t|)
## (Intercept)                  1.60960    0.02453  65.605  < 2e-16 ***
## poly(velocity, degree = 2)1  2.98825    0.12267  24.359  < 2e-16 ***
## poly(velocity, degree = 2)2 -0.97493    0.12267  -7.947 6.59e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1227 on 22 degrees of freedom
## Multiple R-squared: 0.9676, Adjusted R-squared: 0.9646
## F-statistic: 328.3 on 2 and 22 DF, p-value: < 2.2e-16
```
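Since `poly(velocity, degree = 2)` builds orthogonal polynomial terms, the coefficients above are not the \(\beta_1, \beta_2\) of \(y = \beta_0 + \beta_1 x + \beta_2 x^2\); the fitted curve, residuals and \(R^2\) are the same either way. A sketch on simulated data (the `sim` data frame is hypothetical) showing that `raw = TRUE`, or an explicit `I(velocity^2)` term, gives coefficients on the original scale with identical fitted values:

```r
# Simulated data with curvature (hypothetical values, not the windmill data)
set.seed(2)
sim <- data.frame(velocity = runif(25, 2, 10))
sim$DC <- -1 + 0.7 * sim$velocity - 0.03 * sim$velocity^2 + rnorm(25, sd = 0.1)

# Orthogonal polynomial basis (the default, as in fit2)
fit_orth <- lm(DC ~ poly(velocity, degree = 2), data = sim)
# Raw polynomial basis: coefficients refer to x and x^2 directly
fit_raw <- lm(DC ~ poly(velocity, degree = 2, raw = TRUE), data = sim)
# Equivalent explicit form of the raw model
fit_I <- lm(DC ~ velocity + I(velocity^2), data = sim)

coef(fit_raw)                                         # intercept, x, x^2
all.equal(fitted(fit_orth), fitted(fit_raw))          # same fitted curve
all.equal(unname(coef(fit_raw)), unname(coef(fit_I))) # same coefficients
```

The orthogonal basis is numerically better conditioned; the raw form is easier to interpret.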

```
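# Scatter diagram with the fitted quadratic curve; sorting by velocity lets lines() draw a smooth curve
plot(windmill$velocity, windmill$DC, xlab = "wind velocity", ylab = "DC current")
lines(sort(windmill$velocity), fitted(fit2)[order(windmill$velocity)], col = "red")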
```

```
# Reciprocal transformation of the regressor: x' = 1/velocity
velRep <- 1 / windmill$velocity
DC <- windmill$DC
plot(velRep, DC, xlab = "1/velocity", ylab = "DC current")
# Straight-line fit in the transformed regressor
fit3 <- lm(DC ~ velRep)
abline(fit3, col = "red")
```

`summary(fit3)`

```
##
## Call:
## lm(formula = DC ~ velRep)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -0.20547 -0.04940  0.01100  0.08352  0.12204
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   2.9789     0.0449   66.34   <2e-16 ***
## velRep       -6.9345     0.2064  -33.59   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.09417 on 23 degrees of freedom
## Multiple R-squared: 0.98, Adjusted R-squared: 0.9792
## F-statistic: 1128 on 1 and 23 DF, p-value: < 2.2e-16
```

```
# R-student residuals versus fitted values for the reciprocal model
plot(fitted.values(fit3), rstudent(fit3), xlab = "fitted values", ylab = "R-Student residuals", main = "Residuals - reciprocal model")
abline(h = 0, col = "red")
```
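When comparing the straight-line, quadratic and reciprocal models, \(MS_{res} = SS_{res}/(n-p)\) is the square of the "Residual standard error" line in each summary: roughly \(0.2361^2 \approx 0.0557\) for `fit`, \(0.1227^2 \approx 0.0151\) for `fit2` and \(0.09417^2 \approx 0.0089\) for `fit3` (approximate, since the printed values are rounded). A small helper sketch, assuming simulated data; `model_stats` and `sim` are hypothetical names:

```r
# Return MS_res and R^2 for a fitted lm object (hypothetical helper)
model_stats <- function(model) {
  ss_res <- sum(residuals(model)^2)       # SS_res
  ms_res <- ss_res / df.residual(model)   # MS_res = SS_res / (n - p)
  c(MS_res = ms_res, R2 = summary(model)$r.squared)
}

# Simulated data following a reciprocal trend (hypothetical values)
set.seed(3)
sim <- data.frame(velocity = runif(25, 2, 10))
sim$DC <- 3 - 7 / sim$velocity + rnorm(25, sd = 0.1)

fit_line <- lm(DC ~ velocity, data = sim)
# I() is needed so that 1/velocity is arithmetic, not formula nesting
fit_recip <- lm(DC ~ I(1 / velocity), data = sim)

stats <- rbind(line = model_stats(fit_line), reciprocal = model_stats(fit_recip))
stats

# MS_res equals the squared residual standard error reported by summary()
all.equal(stats["line", "MS_res"], sigma(fit_line)^2)
```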

Recall that the phytoplankton population data are available at: https://math.dartmouth.edu/~m50f17/phytoplankton.csv

where the headers are

- pop : population of phytoplankton (\(y\))
- subs2 : concentration of substance-2 (\(x\))

(a) Plot the scatter diagram for pop ~ subs2. Do you think a straight-line model is adequate? Fit a straight-line model and support your argument with summary statistics.

(b) Would you suggest using the Box-Cox method? If not, explain why; if so, apply the method and demonstrate the improvement.
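The Box-Cox method can be explored with `MASS::boxcox()`, which profiles the log-likelihood over a grid of \(\lambda\) values for the power family \(y^{(\lambda)} = (y^\lambda - 1)/\lambda\) (with \(\log y\) at \(\lambda = 0\)); the maximizing \(\lambda\), rounded to a convenient value, suggests the transformation. A minimal sketch on simulated, strictly positive data (the `sim` data frame is hypothetical; this is not the phytoplankton data):

```r
library(MASS)  # provides boxcox()

# Simulated response generated on a log scale, so lambda near 0 is expected
set.seed(4)
sim <- data.frame(x = runif(100, 1, 10))
sim$y <- exp(0.5 + 0.3 * sim$x + rnorm(100, sd = 0.1))

# y = TRUE and qr = TRUE store what boxcox() needs inside the lm object
fit_y <- lm(y ~ x, data = sim, y = TRUE, qr = TRUE)

# Profile log-likelihood over a fine lambda grid, without plotting
bc <- boxcox(fit_y, lambda = seq(-2, 2, by = 0.01), plotit = FALSE)
lambda_hat <- bc$x[which.max(bc$y)]
lambda_hat

# Refit on the transformed scale (log, since lambda_hat is close to 0)
fit_log <- lm(log(y) ~ x, data = sim)
summary(fit_log)$r.squared
```

With `plotit = TRUE` the same call draws the profile together with an approximate 95% confidence interval for \(\lambda\); an interval containing \(\lambda = 1\) suggests no transformation is needed. \(R^2\) for `log(y)` is on a different scale from \(R^2\) for `y`, so residual plots are a fairer way to judge the improvement.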

(c) An analyst suggests using the following model \[ y = \beta_0 + \beta_1 (x-4.5)^2 \] Using a transformation, fit it as a simple linear regression model. Plot the scatter diagram and the fitted curve (note: it is not a straight line in this case). Compare \(MS_{res}\), \(R^2\), and the R-student residual plots with those of the model in part (a).

(d) Construct the normal probability plot for part (c). Is there a problem with the normality assumption? If so, determine the problem (heavy-tailed, light-tailed, or something else).
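The analyst's model becomes a simple linear regression once the regressor is transformed to \(z = (x - 4.5)^2\), giving \(y = \beta_0 + \beta_1 z\). Because the fitted curve is a parabola in \(x\), it can be drawn by evaluating the model on a sorted grid of \(x\) values. A sketch on simulated data (the `sim` data frame and its values are hypothetical, not the phytoplankton data; with the real data one would substitute `subs2` and `pop`):

```r
# Simulated data with a parabolic trend centred at x = 4.5 (hypothetical values)
set.seed(5)
sim <- data.frame(x = runif(60, 0, 9))
sim$y <- 2 + 0.8 * (sim$x - 4.5)^2 + rnorm(60, sd = 0.3)

# Transformed regressor: the model is linear in z = (x - 4.5)^2
sim$z <- (sim$x - 4.5)^2
fit_c <- lm(y ~ z, data = sim)
coef(fit_c)

# Scatter diagram with the fitted curve, evaluated on a grid of x values
plot(sim$x, sim$y, xlab = "x", ylab = "y")
x_grid <- seq(min(sim$x), max(sim$x), length.out = 200)
y_grid <- predict(fit_c, newdata = data.frame(z = (x_grid - 4.5)^2))
lines(x_grid, y_grid, col = "red")

# MS_res and R^2, compared with a straight-line fit in x
fit_a <- lm(y ~ x, data = sim)
c(MS_res_c = sigma(fit_c)^2, R2_c = summary(fit_c)$r.squared,
  MS_res_a = sigma(fit_a)^2, R2_a = summary(fit_a)$r.squared)

# Normal probability plot of the R-student residuals of the transformed model
qqnorm(rstudent(fit_c), main = "Normal probability plot, transformed model")
qqline(rstudent(fit_c), col = "red")
```

`qqnorm()` puts the theoretical normal quantiles on the horizontal axis. In that orientation a heavy-tailed sample bends below the reference line at the left end and above it at the right end, while a light-tailed sample bends the opposite way.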

```
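# Read the phytoplankton data (comma-separated, with a header row)
pData <- read.table("https://math.dartmouth.edu/~m50f17/phytoplankton.csv", header = TRUE, sep = ",")
pop <- pData$pop
subs1 <- pData$subs1
subs2 <- pData$subs2
# Straight-line fit of pop on subs2, drawn over the scatter diagram
fitted <- lm(pop ~ subs2)
plot(subs2, pop, xlab = "subs2", ylab = "pop")
abline(fitted$coefficients)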