- Quiz 2 Solutions
- Introduction to Multiple Linear Regression
- Added-Variable Plots
We start with a response \(Y\) and the simple linear regression mean function
\[ \text{E}(Y | X_1 = x_1) = \beta_0 + \beta_1 x_1 \]
Now suppose we have a second variable \(X_2\). Then the mean function becomes
\[ \text{E}(Y | X_1 = x_1 , X_2 = x_2)= \beta_0 + \beta_1 x_1 + \beta_2 x_2 \]
Goal: Adding \(X_2\) attempts to explain the part of \(Y\) not explained by \(X_1\).
The general multiple linear regression model with response \(Y\) and regressors \(X_1, X_2, \ldots, X_p\) takes the form
\[ \text{E}( Y | X ) = \beta_0 + \beta_1 X_1 + \ldots + \beta_p X_p \]
When we condition on \(X\), we collect the observed values \(x_1, x_2, \ldots, x_p\) of the predictors into the vector \({\bf x}\) and write
\[ \text{E}(Y | X = {\bf x}) = \beta_0 + \beta_1 x_1 + \ldots + \beta_p x_p \]
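As a small illustration, the mean function above is just an intercept plus a weighted sum of the predictor values. The sketch below is in Python for illustration only; the function name `mean_function` and the coefficient values are hypothetical and do not come from any fitted model.

```python
def mean_function(beta, x):
    """Evaluate beta_0 + beta_1*x_1 + ... + beta_p*x_p.

    beta: [beta_0, beta_1, ..., beta_p] (intercept first)
    x:    [x_1, ..., x_p] (one value per regressor)
    """
    if len(beta) != len(x) + 1:
        raise ValueError("beta needs one more entry than x (the intercept)")
    return beta[0] + sum(b * xj for b, xj in zip(beta[1:], x))


# Hypothetical coefficients and predictor values, for illustration only
beta = [2.0, 0.5, -1.0]
x = [4.0, 3.0]
fitted = mean_function(beta, x)  # 2.0 + 0.5*4.0 - 1.0*3.0
print(fitted)
```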
You may have previously predicted life expectancy with log(ppgdp), but what happens if you add fertility?
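Note: a) \(R^2 = 0.60\), b) \(R^2 = 0.64\)
Q: Why don’t the two predictors together explain 124% of the variance?
Correlation! The two predictors are correlated with each other, so part of the variation in the response that one explains is also explained by the other, and the two \(R^2\) values cannot simply be added.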
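A minimal sketch of this effect, using a small synthetic data set (hypothetical numbers, not the life expectancy data) in which `x1` and `x2` are strongly correlated. Python is used only for illustration. Each predictor alone gives a high \(R^2\), yet the \(R^2\) from using both stays below 1.

```python
# Synthetic data, invented for illustration: x1 and x2 move together.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [2, 1, 4, 3, 6, 5, 8, 7]
y = [3.1, 2.9, 7.2, 6.8, 11.1, 10.9, 15.2, 14.8]


def centered(v):
    """Subtract the sample mean from every entry."""
    m = sum(v) / len(v)
    return [vi - m for vi in v]


def cross(u, v):
    """Sum of products of two (already centered) vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


c1, c2, cy = centered(x1), centered(x2), centered(y)
s11, s22, s12 = cross(c1, c1), cross(c2, c2), cross(c1, c2)
s1y, s2y, syy = cross(c1, cy), cross(c2, cy), cross(cy, cy)

# For a simple regression, R^2 is the squared sample correlation.
r2_x1 = s1y ** 2 / (s11 * syy)
r2_x2 = s2y ** 2 / (s22 * syy)

# Two-regressor fit: solve the 2x2 normal equations on centered data.
det = s11 * s22 - s12 ** 2
b1 = (s22 * s1y - s12 * s2y) / det
b2 = (s11 * s2y - s12 * s1y) / det
r2_both = (b1 * s1y + b2 * s2y) / syy

print(f"x1 alone: {r2_x1:.3f}  x2 alone: {r2_x2:.3f}  both: {r2_both:.3f}")
```

The separate \(R^2\) values sum to more than 1 here, while the joint \(R^2\) cannot exceed 1: the overlap between the predictors is counted twice when the separate values are added.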
The multiple linear regression model with response \(Y\) and regressors \(X_1, X_2\) takes the form
\[ \text{E}( Y | X ) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 \]
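To get the effect of adding \(X_2\) to a model that already includes \(X_1\), we need to examine the part of \(Y\) not explained by \(X_1\) and the part of \(X_2\) not explained by \(X_1\). We will do the following:
- Regress \(Y\) on \(X_1\) and keep the residuals (the part of \(Y\) not explained by \(X_1\)).
- Regress \(X_2\) on \(X_1\) and keep the residuals (the part of \(X_2\) not explained by \(X_1\)).
- Plot the first set of residuals against the second (the added-variable plot) and regress the first on the second; the slope equals the estimate of \(\beta_2\) from the full model.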
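The residual-on-residual idea described above can be sketched as follows. This is a Python illustration on synthetic data (hypothetical numbers); the helper names `simple_fit` and `full_fit_b2` are invented for this example. It checks numerically that the slope in the added-variable regression matches the coefficient of \(X_2\) in the two-regressor fit.

```python
def simple_fit(x, y):
    """Least-squares fit of y on x; returns (intercept, slope, residuals)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    return intercept, slope, resid


def full_fit_b2(x1, x2, y):
    """Coefficient of x2 when y is regressed on x1 and x2 together."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    c1 = [v - m1 for v in x1]
    c2 = [v - m2 for v in x2]
    cy = [v - my for v in y]
    s11 = sum(a * a for a in c1)
    s22 = sum(a * a for a in c2)
    s12 = sum(a * b for a, b in zip(c1, c2))
    s1y = sum(a * b for a, b in zip(c1, cy))
    s2y = sum(a * b for a, b in zip(c2, cy))
    # Second component of the solution of the 2x2 normal equations
    return (s11 * s2y - s12 * s1y) / (s11 * s22 - s12 ** 2)


# Synthetic data, invented for illustration
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [2, 1, 4, 3, 6, 5, 8, 7]
y = [3.1, 2.9, 7.2, 6.8, 11.1, 10.9, 15.2, 14.8]

_, _, e_y = simple_fit(x1, y)            # part of Y not explained by X1
_, _, e_x2 = simple_fit(x1, x2)          # part of X2 not explained by X1
_, av_slope, _ = simple_fit(e_x2, e_y)   # line through the added-variable plot

b2_full = full_fit_b2(x1, x2, y)
print(av_slope, b2_full)  # the two values agree up to rounding error
```

Plotting `e_y` against `e_x2` would give the added-variable plot itself; the sketch only checks the slope.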
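We define the \(F\) statistic as
\[ F = \frac{\text{explained variance}}{\text{unexplained variance}}, \]
which, under the null hypothesis with normally distributed errors, is a ratio of two independent \(\chi^2\) random variables, each divided by its degrees of freedom.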
Q: How many parameters does the \(F\)-statistic have?
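As a sketch, the overall \(F\) statistic for a mean function with \(p\) regressors plus an intercept can be computed from \(R^2\) and the sample size. The numerator and denominator each carry their own degrees of freedom, so the reference \(F\) distribution is indexed by two values. The function name `overall_f` and the inputs below are hypothetical, and Python is used only for illustration.

```python
def overall_f(r2, n, p):
    """Overall F statistic for a regression with p regressors and an intercept.

    Explained variance   = SSreg / p
    Unexplained variance = RSS / (n - p - 1)
    Since R^2 = SSreg / SYY and 1 - R^2 = RSS / SYY, SYY cancels in the ratio.
    """
    df_num = p
    df_den = n - p - 1
    if df_den <= 0:
        raise ValueError("need n > p + 1 observations")
    f_stat = (r2 / df_num) / ((1.0 - r2) / df_den)
    return f_stat, df_num, df_den


# Hypothetical inputs, for illustration only
f_stat, df_num, df_den = overall_f(r2=0.75, n=23, p=2)
print(f"F = {f_stat:.2f} on {df_num} and {df_den} degrees of freedom")
```

Turning the statistic into a p-value requires the \(F\) distribution function, which is not in the Python standard library, so the sketch stops at the statistic and its degrees of freedom.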
We can also implement this in R.