[R] Testing for normality of residuals in a regression model
Liaw, Andy
andy_liaw at merck.com
Fri Oct 15 18:55:03 CEST 2004
Let's see if I can get my stat 101 straight:
We learned that linear regression has a set of assumptions:
1. Linearity of the relationship between X and y.
2. Independence of errors.
3. Homoscedasticity (equal error variance).
4. Normality of errors.
Now, we should ask: Why are they needed? Can we get away with less? What
if some of them are not met?
It should be clear why we need #1.
Without #2, I believe the least squares estimator is still unbiased, but the
usual estimates of the coefficients' SEs are wrong, so the t-tests are
wrong.
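To make the point about #2 concrete, here is a small simulation sketch. It is in Python only so that it is self-contained; in R one would simply use lm(). The setup is an assumption for illustration: a time trend is the only predictor, and the errors follow an AR(1) process with autocorrelation rho. The names ols_fit, ar1_errors and simulate are hypothetical helpers. With rho = 0, the actual spread of the slope across replicates should roughly match the usual SE. With rho = 0.8, the slope should stay centred on the true value, but its actual spread should be much larger than the usual SE suggests.

```python
import math
import random
import statistics


def ols_fit(x, y):
    """Fit y = a + b*x by least squares; return (b, usual SE of b)."""
    n = len(x)
    xbar, ybar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    a = ybar - b * xbar
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    return b, math.sqrt(rss / (n - 2) / sxx)  # this SE formula assumes independent errors


def ar1_errors(n, rho, rng):
    """Stationary AR(1) errors: e[t] = rho * e[t-1] + N(0, 1) noise."""
    e = [rng.gauss(0.0, 1.0 / math.sqrt(1.0 - rho ** 2))]
    for _ in range(n - 1):
        e.append(rho * e[-1] + rng.gauss(0.0, 1.0))
    return e


def simulate(rho, n=50, reps=2000, beta=0.5, seed=1):
    """Return (mean slope, actual SD of slope, average usual SE) over reps."""
    rng = random.Random(seed)
    x = [float(t) for t in range(n)]  # hypothetical predictor: a time trend
    slopes, ses = [], []
    for _ in range(reps):
        y = [1.0 + beta * xi + ei for xi, ei in zip(x, ar1_errors(n, rho, rng))]
        b, se = ols_fit(x, y)
        slopes.append(b)
        ses.append(se)
    return statistics.fmean(slopes), statistics.stdev(slopes), statistics.fmean(ses)


for rho in (0.0, 0.8):
    mean_b, sd_b, mean_se = simulate(rho)
    print(f"rho={rho}: mean slope {mean_b:.3f}, actual SD {sd_b:.4f}, usual SE {mean_se:.4f}")
```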
Without #3, the coefficients are, again, still unbiased, but not as
efficient as they could be. Interval estimates for predictions will surely
be wrong.
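A similar sketch for #3 uses another toy setup. The error SD is assumed proportional to x, and it is compared with a weighted fit that knows the true variance function (weights 1/x^2), which real data rarely give you. The helpers weighted_slope and simulate are hypothetical. Both slopes should average close to the true 0.5. The unweighted (OLS) slope, however, should vary noticeably more from replicate to replicate, and that extra spread is the loss of efficiency.

```python
import random
import statistics


def weighted_slope(x, y, w):
    """Weighted least-squares slope; w = [1.0] * n gives the ordinary LS slope."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    return sxy / sxx


def simulate(n=40, reps=2000, beta=0.5, seed=3):
    """Return (mean, SD) of the OLS slope and of the WLS slope over reps."""
    rng = random.Random(seed)
    x = [float(i) for i in range(1, n + 1)]
    ols_w = [1.0] * n
    wls_w = [1.0 / xi ** 2 for xi in x]  # assumes Var(e_i) is known to be proportional to x_i^2
    ols, wls = [], []
    for _ in range(reps):
        y = [2.0 + beta * xi + rng.gauss(0.0, xi) for xi in x]  # error SD grows with x
        ols.append(weighted_slope(x, y, ols_w))
        wls.append(weighted_slope(x, y, wls_w))
    return (statistics.fmean(ols), statistics.stdev(ols),
            statistics.fmean(wls), statistics.stdev(wls))


m_ols, sd_ols, m_wls, sd_wls = simulate()
print(f"OLS: mean {m_ols:.3f}, SD {sd_ols:.3f}")
print(f"WLS: mean {m_wls:.3f}, SD {sd_wls:.3f}")
```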
Without #4, well, it depends. If the residual DF is sufficiently large, the
t-tests are still approximately valid because of the CLT. You do need
normality if you have small residual DF.
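Here is a rough check of the CLT argument for #4, under assumed settings. There are n = 200 observations (so 198 residual DF) and strongly skewed, centred-exponential errors. A normal quantile stands in for the t quantile, which is reasonable at that many DF. The helper coverage is hypothetical. If the argument holds, the nominal 95% interval for the slope should cover the true value roughly 95% of the time despite the skewed errors.

```python
import math
import random
import statistics


def ols_fit(x, y):
    """Fit y = a + b*x by least squares; return (b, usual SE of b)."""
    n = len(x)
    xbar, ybar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    a = ybar - b * xbar
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    return b, math.sqrt(rss / (n - 2) / sxx)


def coverage(n, reps=2000, beta=0.5, seed=4):
    """Fraction of nominal 95% slope intervals that contain the true slope."""
    rng = random.Random(seed)
    z = statistics.NormalDist().inv_cdf(0.975)  # about 1.96; close to the t quantile at large DF
    x = [rng.uniform(0.0, 10.0) for _ in range(n)]  # design held fixed across replicates
    hits = 0
    for _ in range(reps):
        # centred exponential errors: mean 0, variance 1, strongly right-skewed
        y = [1.0 + beta * xi + (rng.expovariate(1.0) - 1.0) for xi in x]
        b, se = ols_fit(x, y)
        hits += abs(b - beta) <= z * se
    return hits / reps


cov200 = coverage(200)
print(f"coverage with n=200: {cov200:.3f}")
```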
The problem with normality tests, I believe, is that they usually have
fairly low power at small sample sizes, which is exactly where normality
matters most, so they don't help much. There's no free lunch: a normality
test with good power will usually have good power only against a fairly
narrow class of alternatives, and almost no power against others (a
directional test). How do you decide which one to use?
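The sketch below illustrates both the small-sample point and the directional point with the Jarque-Bera test. That test is swapped in only because its asymptotic p-value (chi-square with 2 DF, i.e. exp(-JB/2)) needs nothing beyond the standard library; in R the more usual choice would be shapiro.test(). To keep the example short, it tests raw samples rather than regression residuals. The helper names are hypothetical. At n = 15 the test should reject the skewed exponential noticeably more often than the symmetric, short-tailed uniform, which it should almost never reject. At n = 500 it should reject both almost always. The exact rates depend on the seed and settings.

```python
import math
import random
import statistics


def jarque_bera(sample):
    """Jarque-Bera statistic and its asymptotic chi-square(2) p-value."""
    n = len(sample)
    mean = statistics.fmean(sample)
    d = [v - mean for v in sample]
    m2 = sum(v ** 2 for v in d) / n
    m3 = sum(v ** 3 for v in d) / n
    m4 = sum(v ** 4 for v in d) / n
    skew, kurt = m3 / m2 ** 1.5, m4 / m2 ** 2
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    return jb, math.exp(-jb / 2.0)  # chi-square(2) survival function is exp(-x/2)


def exponential_draw(rng):
    return rng.expovariate(1.0)  # skewed, long right tail


def uniform_draw(rng):
    return rng.uniform(0.0, 1.0)  # symmetric, short tails


def rejection_rate(draw, n, reps=500, alpha=0.05, seed=5):
    """Estimated power: share of size-n samples for which the test rejects."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        _, p = jarque_bera([draw(rng) for _ in range(n)])
        hits += p < alpha
    return hits / reps


for n in (15, 500):
    print(f"n={n}: exponential {rejection_rate(exponential_draw, n):.2f}, "
          f"uniform {rejection_rate(uniform_draw, n):.2f}")
```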
Has anyone seen a data set where a normality test on the residuals was
crucial in coming up with an appropriate analysis?
Cheers,
Andy
> From: Federico Gherardini
>
> Berton Gunter wrote:
>
> >>>Exactly! My point is that normality tests are useless for this purpose,
> >>>for reasons that are beyond what I can take up here.
> >>>
> Thanks for your suggestions; I understand that! Could you possibly give
> me some (not too complicated!) links so that I can investigate this
> matter further?
>
> Cheers,
>
> Federico
>
> >>>Hints: Balanced designs are robust to non-normality; independence
> >>>(especially "clustering" of subjects due to systematic effects), not
> >>>normality, is usually the biggest real statistical problem; hypothesis
> >>>tests will always reject when samples are large -- so what!; "trust"
> >>>refers to prediction validity, which has to do with study design and the
> >>>validity/representativeness of the current data to the future.
> >>>
> >>>I know that all the stats 101 texts say to test for normality, but
> >>>they're full of baloney!
> >>>
> >>>Of course, this is "free" advice -- so caveat emptor!
> >>>
> >>>Cheers,
> >>>Bert
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html