MATH 106 - Applied Linear Statistical Models

mbanuelos22@csufresno.edu

Goals for Today

Quiz 2 Solutions
Introduction to Multiple Linear Regression
Added-Variable Plots

Ch. 3 Adding a Regressor

We start with a response \(Y\) and the simple linear regression mean function

\[ \text{E}(Y | X_1 = x_1) = \beta_0 + \beta_1 x_1 \]

Now suppose we have a second variable \(X_2\). Then the mean function becomes

\[ \text{E}(Y | X_1 = x_1 , X_2 = x_2)= \beta_0 + \beta_1 x_1 + \beta_2 x_2 \]

Goal: Adding \(X_2\) attempts to explain the part of \(Y\) not explained by \(X_1\).

Multiple Linear Regression

The general multiple linear regression model with response \(Y\) and regressors \(X_1, X_2, \ldots, X_p\) takes the form

\[ \text{E}(Y | X) = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p \]

When we condition on \(X\), we collect \(x_1, x_2, \ldots, x_p\) into \({\bf x}\) (the predictors) and have

\[ \text{E}(Y | X = {\bf x}) = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p \]
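A minimal sketch of evaluating this mean function at a given \({\bf x}\). The helper name `mean_function` and the coefficient values are hypothetical, chosen only for illustration.

```python
def mean_function(beta, x):
    """E(Y | X = x) = beta_0 + beta_1 x_1 + ... + beta_p x_p.

    beta: [beta_0, beta_1, ..., beta_p]; x: [x_1, ..., x_p].
    """
    if len(beta) != len(x) + 1:
        raise ValueError("need one more coefficient than regressors (the intercept)")
    return beta[0] + sum(b * xj for b, xj in zip(beta[1:], x))

# Hypothetical coefficients and regressor values
beta = [1.0, 2.0, -0.5]   # beta_0, beta_1, beta_2
x = [3.0, 4.0]            # x_1, x_2
print(mean_function(beta, x))  # 1 + 2*3 - 0.5*4 = 5.0
```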

United Nations Data

You may have previously predicted life expectancy with log(ppgdp), but what happens if you also add fertility?


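Note: (a) \(R^2 = 0.60\), (b) \(R^2 = 0.64\)

Q: Why don't the two predictors together explain 60% + 64% = 124% of the variance?

Correlation! The predictors are correlated with each other, so much of the variation in \(Y\) that one explains is the same variation the other explains.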
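A small simulation can illustrate the point. This sketch uses synthetic data, not the UN data: `x2` is constructed to be correlated with `x1`, and `ols` and `r_squared` are hypothetical helpers that solve the normal equations directly.

```python
import random

def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y.

    Each row of X must start with 1.0 for the intercept. Solved by
    Gauss-Jordan elimination with partial pivoting (fine for small p).
    """
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))]
         for i in range(k)]
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

def r_squared(X, y):
    """R^2 = 1 - RSS / SYY for the least-squares fit of y on the columns of X."""
    b = ols(X, y)
    fitted = [sum(bj * xj for bj, xj in zip(b, row)) for row in X]
    ybar = sum(y) / len(y)
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    syy = sum((yi - ybar) ** 2 for yi in y)
    return 1 - rss / syy

# Synthetic data (NOT the UN data): x2 is built to be correlated with x1
random.seed(1)
n = 200
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [0.8 * a + 0.6 * random.gauss(0, 1) for a in x1]
y = [a + b + random.gauss(0, 1) for a, b in zip(x1, x2)]

r2_x1 = r_squared([[1.0, a] for a in x1], y)
r2_x2 = r_squared([[1.0, b] for b in x2], y)
r2_both = r_squared([[1.0, a, b] for a, b in zip(x1, x2)], y)
print(f"x1 alone: {r2_x1:.2f}  x2 alone: {r2_x2:.2f}  both: {r2_both:.2f}")
```

In the population these three values are about 0.70, 0.70 and 0.78, so the single-predictor \(R^2\) values add up to well over 1, while the fit using both predictors gains only a little over either one alone.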

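Predictors vs Regressors

Predictors are the variables we measure; regressors are the terms that actually appear in the mean function, and they are built from the predictors. Regressors can include: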

The Intercept
Predictors
Polynomials
Interaction terms
Dummy variables and factors
Regression splines
Principal components
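To make the distinction concrete, here is a hypothetical sketch that turns two predictors, a numeric `x` and a three-level factor `group`, into seven regressors: the intercept, `x`, a quadratic term, two dummy variables (with level "a" assumed as the baseline), and two interactions. Splines and principal components are omitted. The function name, the levels, and the baseline choice are assumptions for illustration.

```python
def regressors(x, group, levels=("a", "b", "c")):
    """Turn two predictors into one row of regressors (hypothetical example).

    x: a numeric predictor; group: a factor whose first level is the baseline.
    """
    row = [1.0]                 # the intercept
    row.append(x)               # the predictor itself
    row.append(x ** 2)          # a polynomial term
    dummies = [1.0 if group == lev else 0.0 for lev in levels[1:]]
    row.extend(dummies)         # dummy variables for the non-baseline levels
    row.extend(x * d for d in dummies)  # interactions between x and the factor
    return row

print(regressors(2.0, "b"))  # [1.0, 2.0, 4.0, 1.0, 0.0, 2.0, 0.0]
```

Two predictors thus yield seven regressors, each with its own \(\beta\) in the mean function.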

Multiple Linear Regression

The multiple linear regression model with response \(Y\) and regressors \(X_1, X_2\) takes the form

\[ \text{E}( Y | X ) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 \]

3.1.2 Added-Variable Plots

To see the effect of adding \(X_2\) to a model that already includes \(X_1\), we need to examine the part of \(Y\) not explained by \(X_1\) and the part of \(X_2\) not explained by \(X_1\). We do the following:

1. Compute the regression of the response \(Y\) on \(X_1\) and keep the residuals. These are the part of \(Y\) not explained by \(X_1\).
2. Compute the regression of \(X_2\) on \(X_1\) and keep the residuals. These are the part of \(X_2\) not explained by \(X_1\).
3. The added-variable plot is the plot of the residuals from step 1 (vertical axis) against the residuals from step 2 (horizontal axis).
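The three steps can be sketched as follows, using synthetic data and hypothetical helper names. The sketch also checks a standard property of added-variable plots: the slope of the least-squares line through the plot equals \(\hat\beta_2\) from the fit of \(Y\) on both \(X_1\) and \(X_2\).

```python
import random

def simple_fit(x, y):
    """Intercept and slope of the least-squares line of y on x."""
    xbar, ybar = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    slope = sxy / sxx
    return ybar - slope * xbar, slope

def residuals(x, y):
    """Residuals from the simple regression of y on x."""
    b0, b1 = simple_fit(x, y)
    return [b - (b0 + b1 * a) for a, b in zip(x, y)]

def added_variable_plot_data(x1, x2, y):
    """(horizontal, vertical) coordinates of the added-variable plot for x2 after x1."""
    e_y = residuals(x1, y)     # step 1: part of Y not explained by X1
    e_x2 = residuals(x1, x2)   # step 2: part of X2 not explained by X1
    return e_x2, e_y           # step 3: plot e_y against e_x2

def beta2_full(x1, x2, y):
    """beta_2-hat from the least-squares fit of y on x1 and x2 with an intercept,
    solving the 2x2 normal equations for the centered data."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (v - my) for a, v in zip(x1, y))
    s2y = sum((b - m2) * (v - my) for b, v in zip(x2, y))
    return (s11 * s2y - s12 * s1y) / (s11 * s22 - s12 ** 2)

# Synthetic data, for illustration only
random.seed(2)
x1 = [random.gauss(0, 1) for _ in range(100)]
x2 = [0.5 * a + random.gauss(0, 1) for a in x1]
y = [1 + 2 * a - 3 * b + random.gauss(0, 1) for a, b in zip(x1, x2)]

h, v = added_variable_plot_data(x1, x2, y)
_, av_slope = simple_fit(h, v)
print(av_slope, beta2_full(x1, x2, y))  # the two numbers agree up to rounding
```

Plotting `v` against `h` gives the added-variable plot itself; any plotting tool will do.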

\(F\)-Statistic

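We define the \(F\)-statistic as

\[ F = \frac{\text{explained variance}}{\text{unexplained variance}}, \]

which, under the null hypothesis with normally distributed errors, is a ratio of two independent \(\chi^2\) random variables, each divided by its degrees of freedom.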

Q: How many parameters does the \(F\) distribution have?

We can also implement this in R.
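The R code is not reproduced in these notes. As a language-neutral sketch, here is the overall \(F\)-test in Python, assuming the explained and unexplained variances are the mean squares \(\text{SSreg}/p\) and \(\text{RSS}/(n - p - 1)\) for a model with \(p\) regressors plus an intercept. The helper names and the five data points are made up for illustration.

```python
def simple_fit(x, y):
    """Intercept and slope of the least-squares line of y on x."""
    xbar, ybar = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    slope = sxy / sxx
    return ybar - slope * xbar, slope

def overall_f(y, fitted, p):
    """Overall F-statistic for a least-squares fit with p regressors plus an intercept.

    Returns F and its (numerator, denominator) degrees of freedom.
    """
    n = len(y)
    ybar = sum(y) / n
    ssreg = sum((f - ybar) ** 2 for f in fitted)          # explained sum of squares
    rss = sum((yi - f) ** 2 for yi, f in zip(y, fitted))  # residual (unexplained) sum of squares
    df_num, df_den = p, n - p - 1
    return (ssreg / df_num) / (rss / df_den), (df_num, df_den)

# Tiny made-up data set, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = simple_fit(x, y)
fitted = [b0 + b1 * xi for xi in x]
F, df = overall_f(y, fitted, p=1)
print(round(F, 6), df)  # 4.5 (1, 3)
```

By hand: the fitted line is \(\hat y = 2.2 + 0.6x\), so SSreg = 3.6 and RSS = 2.4, giving \(F = (3.6/1)/(2.4/3) = 4.5\) on \((1, 3)\) degrees of freedom.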