- Scatterplots
- Residuals, Mean, and Variance Functions
- Introduction to R
Another part of the distribution of \(Y\) is described by the variance function,
\[ \text{Var}(Y \;|\; X = x) \]
A frequent assumption in fitting linear regression models is that the variance function is the same for every value of \(x\),
\[ \text{Var}(Y \;|\; X = x) = \sigma^2 \]
This assumption is usually made for convenience, but we will discuss more general variance models in Ch. 7.
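The difference between a constant and a non-constant variance function can be seen in a small simulation. This is only a sketch under assumed values: the mean function \(2 + 3x\), the value \(\sigma^2 = 4\), the growing standard deviation \(0.5x\), and all variable names are illustrative choices, not part of the course data.

```r
set.seed(1)                        # arbitrary seed, for reproducibility only
n <- 2000
x <- runif(n, min = 1, max = 10)
mean_fn <- 2 + 3 * x               # assumed mean function E(Y | X = x)

# Constant variance: Var(Y | X = x) = sigma^2 = 4 for every x
y_const <- mean_fn + rnorm(n, sd = 2)

# Non-constant variance (illustrative): the standard deviation grows with x
y_grow <- mean_fn + rnorm(n, sd = 0.5 * x)

# Rough estimate of the variance function: bin x, then compute the
# variance of the deviations from the mean function within each bin
bins <- cut(x, breaks = c(1, 4, 7, 10), include.lowest = TRUE)
var_const <- tapply(y_const - mean_fn, bins, var)
var_grow  <- tapply(y_grow  - mean_fn, bins, var)

round(rbind(constant = var_const, growing = var_grow), 2)
```

Under the constant-variance model the three binned estimates should all be close to 4, while under the second model they should increase from the first bin to the last.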
A summary graph is a scatterplot of \(Y\) versus \(X\).
Q: Why should you take time to explore these graphs?
anscombe
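One way to explore this question is with the anscombe data frame that ships with base R (columns x1 to x4 and y1 to y4). The sketch below compares a few numerical summaries across the four (x, y) pairs and then draws the four scatterplots; the helper names xs, ys, x_means, y_means, and cors are introduced here for illustration only.

```r
# anscombe is part of the datasets package that loads with base R
xs <- anscombe[, paste0("x", 1:4)]
ys <- anscombe[, paste0("y", 1:4)]

# Numerical summaries for each of the four (x, y) pairs
x_means <- colMeans(xs)
y_means <- colMeans(ys)
cors <- mapply(cor, xs, ys)        # correlation of x1 with y1, x2 with y2, ...
round(rbind(x_mean = x_means, y_mean = y_means, correlation = cors), 2)

# The four scatterplots, drawn in a 2 x 2 grid
op <- par(mfrow = c(2, 2))
for (i in 1:4) {
  plot(xs[[i]], ys[[i]],
       xlab = paste0("x", i), ylab = paste0("y", i),
       main = paste("Anscombe pair", i))
}
par(op)                            # restore the previous plotting layout
```

The summaries agree almost exactly across the four pairs, so any differences between the data sets only become visible in the plots.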
Q: How do we determine outliers?
Q:
This data is from a cross-sectional study, as opposed to a longitudinal study, where one would keep track of the age and length of the same fish over time.
For uncorrelated data, that is, variables that do not show a positive or negative association, we will need to conduct appropriate statistical tests to check for a difference.
Let’s look at the fuel2001 data in R:
library(alr4)  # fuel2001 is provided by the alr4 package
myfuel <- fuel2001
# generate a summary of all columns in data
summary(myfuel)
##     Drivers             FuelC              Income          Miles       
##  Min.   :  328094   Min.   :  148769   Min.   :20993   Min.   :  1534  
##  1st Qu.: 1087128   1st Qu.:  737361   1st Qu.:25323   1st Qu.: 36586  
##  Median : 2718209   Median : 2048664   Median :27871   Median : 78914  
##  Mean   : 3750504   Mean   : 2542786   Mean   :28404   Mean   : 77419  
##  3rd Qu.: 4424256   3rd Qu.: 3039932   3rd Qu.:31208   3rd Qu.:112828  
##  Max.   :21623793   Max.   :14691753   Max.   :40640   Max.   :300767  
##       MPC             Pop                Tax       
##  Min.   : 6556   Min.   :  381882   Min.   : 7.50  
##  1st Qu.: 9391   1st Qu.: 1162624   1st Qu.:18.00  
##  Median :10458   Median : 3115130   Median :20.00  
##  Mean   :10448   Mean   : 4257046   Mean   :20.15  
##  3rd Qu.:11311   3rd Qu.: 4845200   3rd Qu.:23.25  
##  Max.   :17495   Max.   :25599275   Max.   :29.00
Q: Why should Fuel and Dlic be included in this data?
# create a scatterplot matrix of fuel data
plot(myfuel)
Q: What is the expected value of rolling a die?
The expected value is a linear operator, which means
\[
\begin{align*}
\text{E}(a_0 +a_1u_1) &= a_0 +a_1 \text{E}(u_1)\\
\text{E}\left( a_0 +\sum a_i u_i\right) &= a_0 +\sum a_i \text{E}(u_i),
\end{align*}
\] where \(a_0, a_1\) are constants and \(u_i\) are random variables.
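The linearity property can be checked numerically with a fair six-sided die. This is a minimal sketch: the constants a0 = 2 and a1 = 10 are arbitrary choices made for illustration, and the expectation is computed exactly as a probability-weighted sum rather than by simulation.

```r
faces <- 1:6
p <- rep(1 / 6, 6)                 # a fair die: each face has probability 1/6

# E(u) = sum over faces of (value * probability)
E_u <- sum(faces * p)

# Arbitrary constants chosen for illustration
a0 <- 2
a1 <- 10

# Left side: expectation of the transformed variable a0 + a1 * u
lhs <- sum((a0 + a1 * faces) * p)

# Right side: apply the same transformation to E(u)
rhs <- a0 + a1 * E_u

c(E_u = E_u, lhs = lhs, rhs = rhs)
all.equal(lhs, rhs)
```

Both sides agree, as the first identity requires; replacing a0 and a1 with other constants should not change that.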
Class Activity: Using this information, show that the expected value of the sample mean is equal to the population mean.
The variance is defined by the equation \[ \text{Var}(u_i) = \text{E}\left[\left(u_i - \text{E}(u_i)\right)^2\right],\] the expected squared difference between an observed value for \(u_i\) and its mean value. For uncorrelated random variables, \[ \text{Var}\left(a_0 + \sum a_i u_i \right)= \sum a_i^2 \text{Var} (u_i). \]
Note: The variance of a constant is zero.
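The variance rule for uncorrelated random variables can be verified exactly for two independent fair dice by enumerating all 36 equally likely outcomes. This is a sketch under assumptions: the constants a0, a1, a2 are arbitrary, and pop_var is a hypothetical helper defined here because R's built-in var() divides by n - 1 and so gives the sample variance, not the population variance.

```r
faces <- 1:6

# All 36 equally likely outcomes of two independent dice
outcomes <- expand.grid(u1 = faces, u2 = faces)

# Population variance: the mean squared deviation from the mean
# (hypothetical helper; var() would divide by n - 1 instead)
pop_var <- function(v) mean((v - mean(v))^2)

# Arbitrary constants chosen for illustration
a0 <- 5
a1 <- 2
a2 <- -3

# Left side: variance of the linear combination over all outcomes
lhs <- pop_var(a0 + a1 * outcomes$u1 + a2 * outcomes$u2)

# Right side: sum of squared coefficients times the individual variances
rhs <- a1^2 * pop_var(faces) + a2^2 * pop_var(faces)

c(lhs = lhs, rhs = rhs)
all.equal(lhs, rhs)
```

The constant a0 does not appear on the right side, which matches the note that the variance of a constant is zero.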