We discussed
- Properties of variance
- Variance of \(\beta\)s
- Estimated variance
Since we generally do not know the population variance \(\sigma^2\), we substitute our estimate \(\hat{\sigma}^2\), yielding
\[ \widehat{\text{Var}} \left( \hat{\beta}_1 | X \right) = \hat{\sigma}^2 \frac{1}{\texttt{SXX}}, \qquad \widehat{\text{Var}} \left( \hat{\beta}_0 | X \right) = \hat{\sigma}^2 \left( \frac{1}{n} + \frac{\bar{x}^2}{\texttt{SXX}}\right) \]
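and the standard error (se) of each estimate is the square root of its estimated variance: \[ \text{se}\left( \hat{\beta}_0 | X \right) = \sqrt{ \widehat{\text{Var}} \left( \hat{\beta}_0 | X \right) }, \qquad \text{se}\left( \hat{\beta}_1 | X \right) = \sqrt{ \widehat{\text{Var}} \left( \hat{\beta}_1 | X \right) } = \frac{\hat{\sigma}}{\sqrt{\texttt{SXX}}}. \]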
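A minimal sketch of these formulas in plain Python (standard library only), assuming the usual simple-regression estimate \(\hat{\sigma}^2 = \text{RSS}/(n-2)\). The function name `ols_fit`, its dictionary keys, and the toy data are hypothetical illustrations, not part of the course materials.

```python
import math

def ols_fit(xs, ys):
    """Fit y = b0 + b1*x by OLS; return estimates and their standard errors."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need equal-length data with at least 3 points")
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)                       # SXX
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))   # SXY
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    rss = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    sigma2_hat = rss / (n - 2)                  # estimate of sigma^2
    var_b1 = sigma2_hat / sxx                   # Var-hat(beta1_hat | X)
    var_b0 = sigma2_hat * (1 / n + xbar ** 2 / sxx)  # Var-hat(beta0_hat | X)
    return {
        "n": n, "xbar": xbar, "sxx": sxx,
        "b0": b0, "b1": b1, "sigma_hat": math.sqrt(sigma2_hat),
        "se_b0": math.sqrt(var_b0), "se_b1": math.sqrt(var_b1),
    }

# Hypothetical toy data, used only to exercise the formulas.
fit = ols_fit([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(fit["b0"], fit["b1"], fit["se_b0"], fit["se_b1"])
```

For this toy data, \(\hat{\beta}_1 = 0.6\), \(\hat{\beta}_0 = 2.2\), \(\hat{\sigma}^2 = 0.8\), so \(\text{se}(\hat{\beta}_1|X) = \sqrt{0.08} \approx 0.283\) and \(\text{se}(\hat{\beta}_0|X) = \sqrt{0.88} \approx 0.938\).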
Q: How do you create a \((1-\alpha) \times 100 \%\) confidence interval?
Let \(t(\alpha/2, n-2)\) denote the critical value that cuts off an upper-tail probability of \(\alpha/2\) in the \(t\)-distribution with \(df = n-2\) degrees of freedom. Then the \((1-\alpha) \times 100 \%\) confidence interval for \(\beta_0\) is \[ \hat{\beta}_0 - t\left(\tfrac{\alpha}{2}, n-2\right) \text{se}\left(\hat{\beta}_0 | X \right) \le \beta_0 \le \hat{\beta}_0 + t\left(\tfrac{\alpha}{2}, n-2\right) \text{se}\left(\hat{\beta}_0 | X \right) \]
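The interval for the slope \(\beta_1\) has the same form, with \(\hat{\beta}_1\) and \(\text{se}(\hat{\beta}_1 | X)\) in place of \(\hat{\beta}_0\) and \(\text{se}(\hat{\beta}_0 | X)\).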
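A minimal Python sketch of the interval "estimate \(\pm\, t(\alpha/2, n-2) \times\) se", which applies to either coefficient. To stay within the standard library, the critical value `t_crit` is assumed to be supplied by the caller (from a \(t\) table, or e.g. `scipy.stats.t.ppf(1 - alpha / 2, n - 2)` if SciPy is available). The function name and the numbers below are hypothetical.

```python
import math

def t_interval(estimate, se, t_crit):
    """(1 - alpha)100% interval: estimate -/+ t(alpha/2, n-2) * se.

    t_crit is the upper alpha/2 critical value of the t distribution with
    n - 2 degrees of freedom, supplied by the caller.
    """
    if se < 0 or t_crit <= 0:
        raise ValueError("se must be non-negative and t_crit positive")
    half_width = t_crit * se
    return estimate - half_width, estimate + half_width

# Hypothetical slope estimate: beta1_hat = 0.6, se = sqrt(0.08), df = n - 2 = 3.
# t(0.025, 3) is about 3.182 from a standard t table.
lower, upper = t_interval(0.6, math.sqrt(0.08), 3.182)
print(f"95% CI for beta_1: ({lower:.3f}, {upper:.3f})")
```

With these hypothetical values the 95% interval for \(\beta_1\) contains 0, which is what the small sample size (3 degrees of freedom) and the resulting large critical value lead to.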
For a hypothesis test on the intercept,
\[ \begin{aligned} H_0: \quad \beta_0 &= \beta_0^*,\; \beta_1 \text{ arbitrary}\\ H_a: \quad \beta_0 &\neq \beta_0^*,\; \beta_1 \text{ arbitrary}, \end{aligned} \]
the \(t\)-statistic is computed as before and compared with the \(t\)-distribution with \(n-2\) degrees of freedom:
\[ t = \frac{\hat{\beta}_0 - \beta_0^*}{\text{se}\left(\hat{\beta}_0 | X \right)}. \]
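A minimal Python sketch of this two-sided test, assuming the critical value `t_crit` \(= t(\alpha/2, n-2)\) is supplied by the caller (a \(p\)-value would instead come from the \(t\)-distribution, e.g. `2 * scipy.stats.t.sf(abs(t), n - 2)` if SciPy is available). The function name and the numbers are hypothetical.

```python
import math

def t_test(estimate, null_value, se, t_crit):
    """Two-sided t-test of H0: beta = null_value vs Ha: beta != null_value.

    Returns the t-statistic and whether H0 is rejected at level alpha,
    where t_crit = t(alpha/2, n - 2) is supplied by the caller.
    """
    if se <= 0:
        raise ValueError("standard error must be positive")
    t_stat = (estimate - null_value) / se
    return t_stat, abs(t_stat) > t_crit

# Hypothetical intercept estimate: beta0_hat = 2.2, se = sqrt(0.88), df = 3,
# testing beta0* = 0 at alpha = 0.05 (t(0.025, 3) is about 3.182).
t_stat, reject = t_test(2.2, 0.0, math.sqrt(0.88), 3.182)
print(f"t = {t_stat:.3f}, reject H0: {reject}")
```

Rejecting when \(|t| > t(\alpha/2, n-2)\) is equivalent to checking whether \(\beta_0^*\) falls outside the \((1-\alpha)\times 100\%\) confidence interval for \(\beta_0\).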
Q: How would you set up (and interpret) the hypothesis for the slope?
Q: When can you use your OLS model to predict future data?
Suppose we have a new observation \((x_*, y_*)\) that was not used to fit the model, where \(x_*\) is known but \(y_*\) has not yet been observed. The point prediction of \(y_*\) from our model is
\[ \tilde{y}_* = \hat{\beta}_0 + \hat{\beta}_1 x_* \]
The prediction \(\tilde{y}_*\) estimates the as-yet-unobserved \(y_*\). If the model is correct, the true value of \(y_*\) is
\[ {y}_* = \beta_0 + \beta_1 x_* + e_*, \]
where \(e_*\) is the random error associated with \({y}_*\).
For a prediction interval for \(y_*\), the standard error is \[ \text{sepred}(y_* | x_*) = \hat{\sigma} \left( 1 + \frac{1}{n} + \frac{ (x_* - \bar{x} )^2}{\texttt{SXX}} \right)^{1/2}. \]
For a confidence interval for the mean response \(E(Y|X = x_*)\), the standard error is \[ \text{sefit}(y_* | x_*) = \hat{\sigma} \left(\frac{1}{n} + \frac{ (x_* - \bar{x} )^2}{\texttt{SXX}} \right)^{1/2}. \]
Q: What do you think accounts for this difference?
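A minimal, self-contained Python sketch that fits the line, computes the point prediction \(\tilde{y}_*\), and builds both intervals as \(\tilde{y}_* \pm t(\alpha/2, n-2)\) times the corresponding standard error, with \(\hat{\sigma}\) in place of \(\sigma\). The function name `intervals_at`, the toy data, and the caller-supplied `t_crit` are assumptions for illustration; \(x_* = 4.5\) is chosen inside the range of the observed \(x\) values.

```python
import math

def intervals_at(xs, ys, x_star, t_crit):
    """Point prediction at x_star, a confidence interval for E(Y | X = x_star),
    and a prediction interval for a new y_star (sigma_hat replaces sigma)."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    b0 = ybar - b1 * xbar
    rss = sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))
    sigma_hat = math.sqrt(rss / (n - 2))
    y_tilde = b0 + b1 * x_star                    # point prediction
    h = 1 / n + (x_star - xbar) ** 2 / sxx        # shared term in both formulas
    sefit = sigma_hat * math.sqrt(h)              # for E(Y | X = x_star)
    sepred = sigma_hat * math.sqrt(1 + h)         # for a new y_star
    return {
        "y_tilde": y_tilde,
        "confidence": (y_tilde - t_crit * sefit, y_tilde + t_crit * sefit),
        "prediction": (y_tilde - t_crit * sepred, y_tilde + t_crit * sepred),
    }

# Hypothetical toy data: n = 5, so df = 3 and t(0.025, 3) is about 3.182.
res = intervals_at([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], x_star=4.5, t_crit=3.182)
print(res)
```

For the same \(x_*\), the prediction interval is always wider than the confidence interval, since \(\text{sepred}\) has the extra \(1\) under the square root.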
Article on confidence and prediction intervals
Read Sections 2.7 and 2.8 and come with questions to discuss on Friday.
Reminder: Quiz on Friday!