# Confidence band for Regression Line

The Working-Hotelling $1-\alpha$ confidence band for the regression line is given by:

$$\hat{Y_h} \pm Ws\{\hat{Y_h}\}$$

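where $W^2=2F(1-\alpha;2,n-2)$, with $F(1-\alpha;2,n-2)$ the $(1-\alpha)$ quantile of the $F$ distribution with $2$ and $n-2$ degrees of freedom, and $s\{\hat{Y_h}\}$ is the estimated standard error of the fitted mean response at $X_h$.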
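
To get a feel for the size of the multiplier, the sketch below compares $W$ with the $t$ multiplier used for a pointwise confidence interval at a single $X_h$. The values `n = 50` and `alpha = 0.05` are assumptions chosen for illustration, and the name `t_crit` is not from the original notebook.

```python
import numpy as np
from scipy import stats

n = 50        # assumed sample size (illustrative)
alpha = 0.05  # assumed significance level (illustrative)

# Working-Hotelling multiplier: W^2 = 2 F(1 - alpha; 2, n - 2)
W = np.sqrt(2.0 * stats.f.ppf(1 - alpha, 2, n - 2))

# multiplier for a pointwise interval at one X_h: t(1 - alpha/2; n - 2)
t_crit = stats.t.ppf(1 - alpha / 2, n - 2)

print(f"W = {W:.3f}, t = {t_crit:.3f}")
```

Because the band has to cover the entire line rather than a single point, $W$ exceeds the $t$ multiplier, so the band is wider than the pointwise interval at every $X_h$.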

In [1]:
# import the required packages
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt
# load the `cars` data set (speed, stopping distance) through statsmodels
cars_speed=sm.datasets.get_rdataset("cars", "datasets")
X=np.array(cars_speed.data['speed'])
Y=np.array(cars_speed.data['dist'])
# sample means and deviations from the means
barX=np.mean(X); barY=np.mean(Y)
XminusbarX=X-barX; YminusbarY=Y-barY
# least-squares slope and intercept
b1=np.sum(XminusbarX*YminusbarY)/np.sum(XminusbarX**2)
b0=barY-b1*barX
Yhat=b0+b1*X # fitted values
e_i=Y-Yhat # residuals
sum_of_residuals=np.sum(e_i) # zero up to rounding error
sum_of_squares_of_residuals=np.sum(e_i**2)
# error sum of squares, mean squared error and residual standard error
n=len(X)
SSE=sum_of_squares_of_residuals
MSE=SSE/(n-2)
s=np.sqrt(MSE)


The estimated standard error of the fitted mean response at $X_h$ is $s\{\hat{Y_h}\}=\sqrt{MSE \Big[\frac{1}{n}+\frac{(X_h-\overline{X})^2}{\sum{ {(X_i-\overline{X})}^2 } } \Big]}$, where $MSE=SSE/(n-2)$.

In [2]:
# standard error of the fitted mean response at each observed X
s_of_yh_hat=np.sqrt(MSE*(1.0/n+(X-barX)**2/np.sum(XminusbarX**2)))

In [3]:
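from scipy import stats
alpha=0.05 # 95% confidence band
# Working-Hotelling multiplier W
W=np.sqrt(2.0*stats.f.ppf(1-alpha,2,n-2))
# upper and lower limits of the band
cb_upper=Yhat+W*s_of_yh_hat
cb_lower=Yhat-W*s_of_yh_hat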
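
The steps so far can be collected into one function that evaluates the band on an arbitrary grid of $X_h$ values rather than only at the observed points. This is a minimal sketch, not part of the original notebook: the function name `working_hotelling_band` is hypothetical, and the data are synthetic, generated only so that the block runs on its own.

```python
import numpy as np
from scipy import stats

def working_hotelling_band(x, y, x_new, alpha=0.05):
    """Return (fit, lower, upper) of the Working-Hotelling band at x_new."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_new = np.asarray(x_new, dtype=float)
    n = len(x)
    x_bar = x.mean()
    sxx = np.sum((x - x_bar) ** 2)
    # least-squares slope and intercept
    b1 = np.sum((x - x_bar) * (y - y.mean())) / sxx
    b0 = y.mean() - b1 * x_bar
    # mean squared error with n - 2 degrees of freedom
    mse = np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2)
    # fitted mean response and its standard error at each x_new
    fit = b0 + b1 * x_new
    se = np.sqrt(mse * (1.0 / n + (x_new - x_bar) ** 2 / sxx))
    # Working-Hotelling multiplier
    W = np.sqrt(2.0 * stats.f.ppf(1 - alpha, 2, n - 2))
    return fit, fit - W * se, fit + W * se

# synthetic data, made up for illustration only
rng = np.random.default_rng(0)
x = np.linspace(4, 25, 30)
y = -15.0 + 4.0 * x + rng.normal(scale=10.0, size=x.size)

# evaluate the band on an evenly spaced grid
grid = np.linspace(x.min(), x.max(), 101)
fit, lower, upper = working_hotelling_band(x, y, grid)
width = upper - lower
```

The band is narrowest at $X_h=\overline{X}$ and widens toward both ends of the range, which follows from the $(X_h-\overline{X})^2$ term in the standard error. Evaluating on a grid also gives a smooth curve when the observed $X$ values are unsorted or unevenly spaced.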

In [4]:
%matplotlib inline
plt.figure(figsize=(20,10))
# scatter plot of the observations
plt.scatter(X,Y,c='b',s=160)
# x-axis label
plt.xlabel('Speed (mph)',fontsize=20)
# y-axis label
plt.ylabel('Stopping distance (ft)',fontsize=20)
# title
plt.title('Distance cars took to stop in the 1920s',fontsize=20)
# fitted regression line
plt.plot(X,Yhat,'r-',linewidth=2)

# x-axis limits
plt.xlim(3,26)

# y-axis limits
plt.ylim(-20,125)

# lower and upper limits of the 95% confidence band
plt.plot(X,cb_lower,'k--',linewidth=3)
plt.plot(X,cb_upper,'k--',linewidth=3);