[R] what to do if residuals produced by lm() have long tails?

Fri Nov 30 07:54:47 CET 2007

On Thu, 29 Nov 2007, tom soyer wrote:

> Hi,
>
> I am using lm() for regression analysis of my data set. My regression
> results look pretty good, i.e., the coefficient is significant and the p
> value is much less than 0.05. But when I checked the residuals, both using
> qqnorm() and hist(), the distribution does not look normal. It looks like
> the residuals have long tails. I assume that lm() uses OLS, and since one of
> the assumptions of OLS is that the residuals have to be normally distributed,
> I am wondering if this means I should reject my regression results
> altogether. If so, then what should I use instead? Are there ways to deal with
> distributions with long tails using lm() or OLS, or are entirely different
> models needed instead?

The main point is that least squares is rather inefficient with 
long-tailed error distributions.  Robust methods are designed to be 
efficient for a wide class of long-tailed distributions, and so are 
preferable.  Use e.g. rlm (package MASS) or lmRob (package robust) in 
place of lm.  If this makes a difference to your 'regression results', then 
yes, you need to reject the least-squares results.
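
As a rough illustration of that comparison, here is a minimal sketch that 
fits the same model with lm and with rlm and puts the results side by side.  
The data are simulated purely for illustration (t-distributed errors with 
2 degrees of freedom, to give long tails); the variable names, sample size 
and seed are assumptions and have nothing to do with the poster's data set.

```r
library(MASS)  # provides rlm()

## Hypothetical simulated data with long-tailed errors
set.seed(1)
n <- 200
x <- runif(n, 0, 10)
y <- 1 + 2 * x + rt(n, df = 2)   # true intercept 1, true slope 2

fit_ls  <- lm(y ~ x)    # ordinary least squares
fit_rob <- rlm(y ~ x)   # M-estimation, Huber psi by default

## Coefficients from the two fits, side by side
coefs <- cbind(lm = coef(fit_ls), rlm = coef(fit_rob))
print(coefs)

## Standard errors from the two fits, side by side
se <- cbind(lm  = summary(fit_ls)$coefficients[, "Std. Error"],
            rlm = summary(fit_rob)$coefficients[, "Std. Error"])
print(se)

## The normal Q-Q plot of the least-squares residuals shows the long tails
qqnorm(resid(fit_ls)); qqline(resid(fit_ls))
```

If the two columns of coefficients tell materially different stories for 
your own data, that is the situation in which the least-squares fit should 
not be relied on.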

This is discussed in good texts on doing statistics with R, e.g. MASS 
(Venables and Ripley, Modern Applied Statistics with S).

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595