__NOTE: For your homework, download and use the template__ (https://math.dartmouth.edu/~m50f17/HW7.Rmd). __Read the green comments in the Rmd file to see where your answers should go.__

#### An example from *Regression Diagnostics: Identifying Influential Data and Sources of Collinearity* (Belsley, Kuh and Welsch)

| Column | Name  | Type    | Description                         |
|--------|-------|---------|-------------------------------------|
| [,1]   | sr    | numeric | aggregate personal savings          |
| [,2]   | pop15 | numeric | % of population under 15            |
| [,3]   | pop75 | numeric | % of population over 75             |
| [,4]   | dpi   | numeric | real per-capita disposable income   |
| [,5]   | ddpi  | numeric | % growth rate of dpi                |

```{r}
data(LifeCycleSavings)
lm.SR <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings)
summary(inflm.SR <- influence.measures(lm.SR))
inflm.SR
which(apply(inflm.SR$is.inf, 1, any))
rstandard(lm.SR)
rstudent(lm.SR)
# dfbetas(lm.SR)
dffits(lm.SR)
covratio(lm.SR)
```
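The influence measures in this example can also be screened against numeric cutoffs using base R. The chunk below is a minimal sketch on the same `LifeCycleSavings` fit. The thresholds `2 * p / n` for leverage and `4 / n` for Cook's distance are common rules of thumb assumed here for illustration, not values prescribed by the textbook, and the names `h`, `d`, `high_leverage`, and `high_cook` are introduced only for this sketch.

```{r}
# Refit the example model so this chunk is self-contained.
data(LifeCycleSavings)
lm.SR <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings)

n <- nrow(LifeCycleSavings)   # number of observations
p <- length(coef(lm.SR))      # number of coefficients, including the intercept

h <- hatvalues(lm.SR)         # leverage: diagonal entries of the hat matrix
d <- cooks.distance(lm.SR)    # Cook's distance for each observation

# Assumed rule-of-thumb cutoffs; other references use different thresholds.
high_leverage <- names(h)[h > 2 * p / n]
high_cook     <- names(d)[d > 4 / n]

high_leverage
high_cook
```

Observations that appear in both vectors are natural candidates for a closer look, but the cutoffs are only screening aids, so the plots should still be inspected.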

## Question-1

Chapter 6, Problem 15.

First check the following page from the R project documentation (for various plots to visualize the influence measures): https://cran.r-project.org/web/packages/olsrr/vignettes/influence_measures.html

Note: You might need libraries such as olsrr for some of the plots below.

(a) Plot the Cook's D chart, DFBETAs panel, DFFITS plot, and standardized residual chart shown in the above link.

(b) Find the points with high leverage and high Cook's distance.

(c) Plot the "Studentized Residuals vs Leverage Plot" that you see in the above link. Which regions in this plot correspond to leverage points, pure leverage points, and influential points? Detect the points in each region.

(d) What do you think are the most influential points? (You can use the statistics shown above or the plots from previous parts.)

(e) Comment on the normality assumption using a probability plot. Remove the most influential points (the ones you suggested in part (d)) and discuss the change or improvement in the normality assumption by comparing the probability plots.

### Answer:

```{r}

```