Goals for Today

  • Quiz 2 Solutions
  • Introduction to Multiple Linear Regression
  • Added-Variable Plots

Ch. 3 Adding a Regressor

We start with a response \(Y\) and the simple linear regression mean function

\[ \text{E}(Y | X_1 = x_1) = \beta_0 + \beta_1 x_1 \]
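The least-squares estimates for this mean function have the closed form \(\hat\beta_1 = \text{SXY}/\text{SXX}\) and \(\hat\beta_0 = \bar y - \hat\beta_1 \bar x_1\). The course works in R. Below is a minimal Python sketch of the same computation. The function name `fit_simple` and the data are made up for illustration.

```python
def fit_simple(x, y):
    """Least-squares (beta0_hat, beta1_hat) for E(Y | X_1 = x_1) = beta_0 + beta_1 x_1."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((a - xbar) ** 2 for a in x)                     # SXX
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))  # SXY
    beta1 = sxy / sxx
    beta0 = ybar - beta1 * xbar
    return beta0, beta1


# Made-up data, for illustration only.
b0, b1 = fit_simple([1, 2, 3, 4, 5], [3.1, 4.9, 7.2, 8.8, 11.0])
print(f"beta0_hat = {b0:.2f}, beta1_hat = {b1:.2f}")
# prints: beta0_hat = 1.09, beta1_hat = 1.97
```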

Now suppose we have a second variable \(X_2\). Then the mean function becomes

\[ \text{E}(Y | X_1 = x_1 , X_2 = x_2)= \beta_0 + \beta_1 x_1 + \beta_2 x_2 \]

Goal: Adding \(X_2\) attempts to explain the part of \(Y\) not explained by \(X_1\).

Multiple Linear Regression

The general multiple linear regression model with response \(Y\) and regressors \(X_1, X_2, \ldots, X_p\) takes the form

\[ \text{E}( Y | X ) = \beta_0 + \beta_1 X_1 + \ldots + \beta_p X_p \]

When we condition on \(X\), we collect the values \(x_1, x_2, \ldots, x_p\) into the vector \({\bf x}\) (the observed values of the regressors) and have

\[ \text{E}(Y | X = {\bf x}) = \beta_0 + \beta_1 x_1 + \ldots + \beta_p x_p \]
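Numerically, this mean function is the intercept plus the dot product of the slopes \(\beta_1, \ldots, \beta_p\) with \({\bf x}\). Here is a small Python sketch. The function name `mean_function` and the coefficient values are hypothetical.

```python
def mean_function(beta, x):
    """E(Y | X = x) for beta = [beta_0, beta_1, ..., beta_p] and x = [x_1, ..., x_p]."""
    if len(beta) != len(x) + 1:
        raise ValueError("need one coefficient per regressor plus an intercept")
    # beta_0 + beta_1 x_1 + ... + beta_p x_p
    return beta[0] + sum(b * xj for b, xj in zip(beta[1:], x))


# Hypothetical coefficients, chosen only for illustration.
print(mean_function([1.0, 2.0, -0.5], [3.0, 4.0]))  # 1 + 6 - 2, prints 5.0
```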

United Nations Data

You may have previously predicted life expectancy from log(ppgdp). What happens if you also add fertility as a regressor?


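Note: the two separate simple regressions of life expectancy give (a) \(R^2 = 0.60\) on log(ppgdp) alone and (b) \(R^2 = 0.64\) on fertility alone.

Q: Why don't the two regressors together explain 0.60 + 0.64 = 124% of the variance?

Correlation! log(ppgdp) and fertility are themselves correlated, so much of the variation in life expectancy explained by one is also explained by the other.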
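The following is a toy numerical illustration of this point. It uses synthetic data, not the UN data, and the helper names are hypothetical. Here \(x_1\) and \(x_2\) have correlation 0.8 and \(y = x_1 + x_2\) exactly. Each simple regression has \(R^2 = 0.9\), so the two values sum to 1.8. The model with both regressors can explain at most all of the variance.

```python
# Synthetic toy data (not the UN data) with correlated regressors.

def cross_sum(u, v):
    """Sum over i of (u_i - mean(u)) * (v_i - mean(v))."""
    ubar, vbar = sum(u) / len(u), sum(v) / len(v)
    return sum((a - ubar) * (b - vbar) for a, b in zip(u, v))


def r2_simple(x, y):
    """R^2 for the simple regression of y on x (the squared correlation)."""
    return cross_sum(x, y) ** 2 / (cross_sum(x, x) * cross_sum(y, y))


def r2_two(x1, x2, y):
    """R^2 for the least-squares fit of y on x1 and x2, with an intercept."""
    s11, s22, s12 = cross_sum(x1, x1), cross_sum(x2, x2), cross_sum(x1, x2)
    s1y, s2y, syy = cross_sum(x1, y), cross_sum(x2, y), cross_sum(y, y)
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    ss_reg = b1 * s1y + b2 * s2y  # regression (explained) sum of squares
    return ss_reg / syy


x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]                  # correlation with x1 is 0.8
y = [a + b for a, b in zip(x1, x2)]   # y = x1 + x2 exactly

r2_a, r2_b, r2_ab = r2_simple(x1, y), r2_simple(x2, y), r2_two(x1, x2, y)
print(f"x1 alone: {r2_a:.2f}, x2 alone: {r2_b:.2f}, "
      f"sum: {r2_a + r2_b:.2f}, both: {r2_ab:.2f}")
# prints: x1 alone: 0.90, x2 alone: 0.90, sum: 1.80, both: 1.00
```

The overlap between the two simple-regression \(R^2\) values is exactly the shared, correlated part of \(x_1\) and \(x_2\).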

Predictors vs Regressors

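Predictors are the variables we observe. Regressors (also called terms) are the quantities that actually appear in the mean function, and several regressors can be built from the same predictors. Regressors include:

  • The intercept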
  • Predictors
  • Polynomials
  • Interaction terms
  • Dummy variables and factors
  • Regression splines
  • Principal components
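The Python function below sketches how several regressors can be built from a few predictors. It turns one observation into a row of regressors: an intercept, the predictors themselves, a polynomial term, an interaction, and dummy variables for a factor. The predictor names, factor levels, and column labels are all hypothetical. Splines and principal components are omitted.

```python
# Sketch: turn one observation's predictors into a row of regressors.
# Predictor names (x, z, group), levels, and labels are hypothetical.

def regressor_row(x, z, group, levels=("A", "B", "C")):
    """Return a dict mapping regressor labels to values for one observation."""
    row = {
        "(Intercept)": 1.0,  # the intercept: a regressor that is always 1
        "x": x,              # predictors used directly as regressors
        "z": z,
        "x^2": x ** 2,       # polynomial term in x
        "x:z": x * z,        # interaction between x and z
    }
    # Factor "group": one 0/1 dummy per level except the first (baseline) level.
    for level in levels[1:]:
        row[f"group{level}"] = 1.0 if group == level else 0.0
    return row


print(regressor_row(2.0, 3.0, "B"))
# prints: {'(Intercept)': 1.0, 'x': 2.0, 'z': 3.0, 'x^2': 4.0, 'x:z': 6.0, 'groupB': 1.0, 'groupC': 0.0}
```

Two predictors and one three-level factor already produce seven regressors.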

Multiple Linear Regression

The multiple linear regression model with response \(Y\) and regressors \(X_1, X_2\) takes the form

\[ \text{E}( Y | X ) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 \]

3.1.2 Added-Variable Plots

To get the effect of adding \(X_2\) to the model that already includes \(X_1\), we need to examine the part of \(Y\) not explained by \(X_1\) and the part of \(X_2\) not explained by \(X_1\). We will do the following:

  1. Regress the response \(Y\) on \(X_1\) and keep the residuals. They are the part of \(Y\) not explained by \(X_1\).
  2. Regress \(X_2\) on \(X_1\), treating \(X_2\) as the response, and keep the residuals. They are the part of \(X_2\) not explained by \(X_1\).
  3. The added-variable plot is a scatterplot of the residuals from (1) (vertical axis) against the residuals from (2) (horizontal axis). Its least-squares slope equals \(\hat\beta_2\), the coefficient of \(X_2\) in the regression of \(Y\) on both \(X_1\) and \(X_2\).
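The three steps can be sketched in Python on synthetic toy data. The function names are hypothetical, and the sketch computes the plot's coordinates rather than drawing it. It also checks numerically that the slope of the added-variable plot matches the coefficient of \(X_2\) from the fit with both regressors.

```python
# Added-variable computation for regressors X1 and X2 (synthetic toy data).

def fit_simple(x, y):
    """Least-squares (intercept, slope) for the simple regression of y on x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((a - xbar) ** 2 for a in x)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def residuals(x, y):
    """Residuals from the simple regression of y on x."""
    b0, b1 = fit_simple(x, y)
    return [b - (b0 + b1 * a) for a, b in zip(x, y)]


def beta2_full(x1, x2, y):
    """Coefficient of x2 in the least-squares fit of y on x1 and x2 (with intercept)."""
    def cross(u, v):
        ubar, vbar = sum(u) / len(u), sum(v) / len(v)
        return sum((a - ubar) * (b - vbar) for a, b in zip(u, v))
    s11, s22, s12 = cross(x1, x1), cross(x2, x2), cross(x1, x2)
    s1y, s2y = cross(x1, y), cross(x2, y)
    return (s11 * s2y - s12 * s1y) / (s11 * s22 - s12 ** 2)


x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]
y = [4, 1, 7, 8, 10]

e_y = residuals(x1, y)    # step 1: part of Y not explained by X1
e_2 = residuals(x1, x2)   # step 2: part of X2 not explained by X1
# Step 3: the added-variable plot graphs e_y (vertical) against e_2 (horizontal).
_, av_slope = fit_simple(e_2, e_y)

print(f"AV-plot slope: {av_slope:.4f}, beta2 from full fit: {beta2_full(x1, x2, y):.4f}")
# prints: AV-plot slope: 1.6111, beta2 from full fit: 1.6111
```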

\(F\)-Statistic

We define the \(F\) statistic as

\[ F = \frac{\text{explained variance}}{\text{unexplained variance}}. \]

Under the null hypothesis (with normal errors), it is a ratio of two independent \(\chi^2\) random variables, each divided by its degrees of freedom.

Q: How many parameters does the distribution of the \(F\)-statistic have?

We can also implement this in R.
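Outside R, the overall \(F\)-statistic for a model with an intercept and \(p\) regressors can be sketched from the sums of squares as \(F = (\text{SSreg}/p)/(\text{RSS}/(n - p - 1))\). Here is a minimal Python version for \(p = 2\). The data are synthetic and the names `cross_sum` and `overall_f` are hypothetical. It is not a reproduction of R's output.

```python
# Overall F-statistic, F = (SSreg / p) / (RSS / (n - p - 1)), for a fit with an
# intercept and p = 2 regressors. Toy data; function names are hypothetical.

def cross_sum(u, v):
    """Sum over i of (u_i - mean(u)) * (v_i - mean(v))."""
    ubar, vbar = sum(u) / len(u), sum(v) / len(v)
    return sum((a - ubar) * (b - vbar) for a, b in zip(u, v))


def overall_f(x1, x2, y):
    """Return (F, (df1, df2)) for the least-squares fit of y on x1 and x2."""
    n, p = len(y), 2
    s11, s22, s12 = cross_sum(x1, x1), cross_sum(x2, x2), cross_sum(x1, x2)
    s1y, s2y, syy = cross_sum(x1, y), cross_sum(x2, y), cross_sum(y, y)
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    ss_reg = b1 * s1y + b2 * s2y   # explained sum of squares
    rss = syy - ss_reg             # residual (unexplained) sum of squares
    return (ss_reg / p) / (rss / (n - p - 1)), (p, n - p - 1)


f_stat, (df1, df2) = overall_f([1, 2, 3, 4, 5], [2, 1, 4, 3, 5], [4, 1, 7, 8, 10])
print(f"F = {f_stat:.3f} on {df1} and {df2} degrees of freedom")
# prints: F = 9.976 on 2 and 2 degrees of freedom
```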