Solutions for Homework sheet3

Problem 1

Part (a)

In [1]:
# read the data 
from read_data_fx import read_tb_data
# Age is X
# Muscle mass is Y

# import the packages required 
import numpy as np
import scipy as sc
import statsmodels.api as sm
import matplotlib.pyplot as plt 

# Carry out the linear regression 
barX=np.mean(X); barY=np.mean(Y)
XminusbarX=X-barX; YminusbarY=Y-barY
e_i=Y-Yhat # Residuals 
n=len(X); SSE=np.sum(e_i**2)

Test hypotheses/alternatives are:

$H_0 : \beta_1 \geq 0$ there is no negative relationship between amount of muscle mass and age.

$H_a : \beta_1 \lt 0$ there is a negative relationship between amount of muscle mass and age.

Test statistics is given by


Where $s\{b_1\}=\sqrt{ \frac{MSE}{\sum{ {(X_i-\overline{X})}^2 } }}$

Decision rules are:

If $t^* \geq t(\alpha;n-2)$ then conclude $H_0$

If $t^* \lt t(\alpha;n-2)$ then conclude $H_a$

In [2]:
from scipy import stats
# calculating t value
In [3]:
print 'tstar=',tstar, 'and', 't=', t1 
tstar= -13.1932568116 and t= -1.67155276215

We observe above that $t^* \lt t(1-\alpha;n-2)$, hence we conclude $H_a$

The P-value = $P\{t(n-2) \lt t^*\}$

In [4]:
pval=stats.t.cdf(tstar,n-2);  print ' P-value = %4.4f' % pval 
 P-value = 0.0000

We observe P-value $=0.000$, we can conclude $H_a$ from this as well because P-value $< \alpha$

Part (b)

This conclusion in not appropriate for the following reasons:

  1. $\beta_0$ corresponds to age 0 which is way outside the range of the predictor variable (40-79). Hence any conclusion for age 0 based on $\beta_0$ would not be very accurate.

  2. The slope of the fitted line is negative, which will mean that $\beta_0$ would be large positive number (its estimate is $b_0= 156.3$). In other words female infants will have substantially more muscle mass than adult women. Which is physically unrealistic

Part (c)

In [5]:
ll=b1-ttsb; ul=b1+ttsb
print '95 percent confidence interval for beta_1'
print '%4.4f <= beta_1'% ll, '<= %4.4f' % ul
95 percent confidence interval for beta_1
-1.3705 <= beta_1 <= -1.0094

Problem 2

Here we are given that P-value is $0.91$. Two sided P-value=$2P\{t(n-2) \gt t^*\}$, it is the probability of obtaining $\beta_1$ equal to or more extreme than what was estimated, assuming that the null hypothesis is true. Which implies that the estimated slope of $-.18$ in this example is not statistically significant i.e., we will fail to reject the null hypothesis $H_0:\beta_1=0$. Hence, we can not make any worthwhile conclusion from the estimated parameters. Therefor the students message "The message I get here is that the more we spend on advertising this product, the fewer units we sell!" is wrong.

Problem 3

For normal error regression model, the sampling distribution of $b_1$ is normal with mean and variance:



We know that $b_1$ can be written as linear combination of $Y_i$ i.e.,


Where $k_i=\frac{X_i-\overline{X}}{\sum{(X_i-\overline{X})^2}}$.


$\sum{k_i}=0 \tag{1}$

$\sum{k_iX_i}=1 \tag{2}$

$\sum{k_i^2}=\frac{1}{\sum{(X_i-\overline{X})^2} } \tag{3}$

  • Derivation for $E\{b_1\}=\beta_1$:

$E\{b_1\}=E\{\sum{k_iY_i}\}=\sum{k_i E\{Y_i\}}=\sum{k_i(\beta_0+\beta_1X_i)}=\beta_0\sum{k_i}+\beta_1\sum{k_i}X_i \tag{4}$

Using (2) and (3) in the R.H.S of (4) we will get $E\{b_1\}=\beta_1$

  • Derivation for $\sigma^2\{b_1\}=\frac{\sigma^2}{\sum{(X_i-\overline{X})^2}}$:

$\sigma^2\{b_1\}=\sigma^2\{\sum{k_iY_i}\}= \sum{k_i^2 \sigma^2 \{Y_i\} }= \sigma^2\sum{k_i^2} \tag{5} $

Substituting (3) in R.H.S of (5) we will get $\sigma^2\{b_1\}=\frac{\sigma^2}{\sum{(X_i-\overline{X})^2}}$

  • Derivation for $s^2\{b_1\}$ (estimated variance of the sampling distribution of $b_1$):

We know $\sigma^2\{b_1\}=\frac{\sigma^2}{\sum{(X_i-\overline{X})^2}}$, to obtain $s^2\{b_1\}$ we replace $\sigma^2$ with MSE the unbiased estimator of $\sigma^2$. Therefore, we get

$s^2\{b_1\} =\frac{MSE}{\sum{(X_i-\overline{X})^2}}$