[R] Huber location estimate

Thu Dec 22 09:42:58 CET 2005

On Thu, 22 Dec 2005, Murray Jorgensen wrote:

> We have a choice when calculating the Huber location estimate:
> > set.seed(221205)
> > y <- 7 + 3*rt(30,1)

That's Cauchy, BTW, a very extreme case.

> > library(MASS)
> > huber(y)$mu
> [1] 5.9117
> > coefficients(rlm(y~1))
> (Intercept)
>      5.9204
>
> I was surprised to get two different results. The function huber() works
>  directly with the definition whereas rlm() uses iteratively reweighted
> least squares.
>
> My surprise is because I vaguely remember
>
> @ARTICLE{hw77,
>   author  = {Holland, P. W. and Welsch, R. E.},
>   title   = {Robust Regression using Iteratively Reweighted Least-Squares},
>   journal = {Communications in Statistics: Theory and Methods},
>   volume  = {A6(9)},
>   number  = {},
>   pages   = {813-827},
>   year    = {1977}
> }
> as saying that the two methods were equivalent. Obviously they aren't
> quite. Comments welcome.

Scale estimation differs.  You have (unfairly to the uncredited author) 
not included all the output:

> huber(y)
$mu
[1] 5.911719

$s
[1] 4.096697

> rlm(y~1)
Call:
rlm(formula = y ~ 1)
Converged in 5 iterations

Coefficients:
(Intercept)
    5.920354

Degrees of freedom: 30 total; 29 residual
Scale estimate: 3.75

Note that the huber() scale estimate is the initial MAD, whereas rlm 
iterates.  Because during iteration the 'center' for MAD is known to be 
zero, the results differ.  For symmetric distributions there is little 
difference, but your sample is not close to symmetric.

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595