Simple Linear Regression

  • The simple linear regression model consists of the mean and variance function, \[ \begin{align*} \text{E} (Y \; | \; X = x) &= \beta_0 + \beta_1 x\\ \text{Var} ( Y \; | \; X = x) &= \sigma^2 \end{align*} \]
  • Parameters are unknown quantities that characterize a model.
  • Estimates of parameters are computable functions of data and are therefore statistics.
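As a minimal sketch of the model, we can simulate data whose conditional mean is \(\beta_0 + \beta_1 x\) and whose conditional variance is \(\sigma^2\) (the parameter values and sample size below are illustrative assumptions):

```r
# Simulate from E(Y | X = x) = beta0 + beta1 * x, Var(Y | X = x) = sigma^2
set.seed(1)
n <- 100
beta0 <- 2; beta1 <- 0.5; sigma <- 1        # assumed "true" parameters
x <- runif(n, 0, 10)
y <- beta0 + beta1 * x + rnorm(n, sd = sigma)  # mean function plus error
head(cbind(x, y))
```

Here \(\beta_0\), \(\beta_1\), and \(\sigma^2\) are parameters; any quantity computed from `x` and `y` alone is a statistic.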

Least Squares Estimates

Recall that the least squares estimates — the values that minimize the RSS — are \[ \begin{align*} \hat{\beta}_1 &= \frac{\texttt{SXY}}{\texttt{SXX}} = r_{xy} \frac{SD_y}{SD_x} = r_{xy} \left(\frac{\texttt{SYY}}{\texttt{SXX}}\right)^{1/2}\\ \hat{\beta}_0 &= \bar{y} - \hat{\beta}_1 \bar{x}. \end{align*} \] In R, we will use lm( response ~ predictor ) to estimate \(\beta_0\) and \(\beta_1\).
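The formulas above can be checked directly against `lm()`. A sketch, using simulated data (the data-generating values are assumptions for illustration):

```r
# Compute the least squares estimates by hand and compare with lm()
set.seed(1)
x <- runif(50, 0, 10)
y <- 2 + 0.5 * x + rnorm(50)

SXX <- sum((x - mean(x))^2)
SXY <- sum((x - mean(x)) * (y - mean(y)))
b1  <- SXY / SXX                # slope: SXY / SXX
b0  <- mean(y) - b1 * mean(x)   # intercept: ybar - b1 * xbar

fit <- lm(y ~ x)
coef(fit)                       # matches c(b0, b1)
```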

Estimating the Variance

  • \(\sigma^2\) is approximately the average squared size of the errors \(e_i\).
  • To estimate the variance \(\sigma^2\), we divide the RSS by the degrees of freedom, \(df\) (number of cases/observations minus number of parameters).
  • For simple linear regression, we have \(df = n - 2\), and \[ \hat{\sigma}^2 = \frac{\texttt{RSS}}{n-2}, \] known as the residual mean square.
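The residual mean square can be computed by hand and compared with the value `summary()` reports; a sketch with simulated data (parameter values assumed for illustration):

```r
# Estimate sigma^2 as RSS / (n - 2) and compare with summary(lm)
set.seed(1)
x <- runif(50, 0, 10)
y <- 2 + 0.5 * x + rnorm(50)
fit <- lm(y ~ x)

RSS    <- sum(resid(fit)^2)          # residual sum of squares
sigma2 <- RSS / (length(y) - 2)      # residual mean square, df = n - 2
c(by_hand = sigma2, from_summary = summary(fit)$sigma^2)
```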

The Standard Error of Regression

If we assume \[ Y = \beta_0 + \beta_1 X + \epsilon, \] where \(\epsilon\) – the error term – has mean zero, then the regression standard error is \(\sigma\), which has the same units as the response variable.

Q: What part of the output of lm tells you this information?
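One way to check your answer in R (data simulated with assumed parameter values):

```r
# hat(sigma) is the "Residual standard error" line in summary(lm);
# it can also be extracted directly.
set.seed(1)
x <- runif(50, 0, 10)
y <- 2 + 0.5 * x + rnorm(50)
fit <- lm(y ~ x)
summary(fit)$sigma   # hat(sigma), in the same units as y
```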

Variance (continued)

If we assume the errors \(e_i\) are drawn from a normal distribution, then \[ \hat{\sigma}^2 \sim \frac{\sigma^2}{n-2} \chi^2 (n - 2). \]

Q: What is the mean of a \(\chi^2\) random variable with \(n\) degrees of freedom?

Then, \[ \text{E}( \hat{\sigma}^2 \; | \; X ) = \frac{\sigma^2}{n-2} \text{E} \left[\chi^2 (n - 2)\right] = \frac{\sigma^2}{n-2} (n-2) = \sigma^2 \]
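The unbiasedness result above can be checked by simulation; a sketch in which the sample size, number of replications, and parameter values are assumptions for illustration:

```r
# Monte Carlo check that E(hat(sigma)^2) = sigma^2 under normal errors
set.seed(1)
n <- 30; sigma <- 2
x <- runif(n, 0, 10)                       # fixed design, as in E(. | X)
sigma2_hats <- replicate(5000, {
  y <- 1 + 0.5 * x + rnorm(n, sd = sigma)
  sum(resid(lm(y ~ x))^2) / (n - 2)        # RSS / df for this sample
})
mean(sigma2_hats)                          # close to sigma^2 = 4
```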