[R] Huber location estimate

Thu Dec 22 11:12:02 CET 2005

>>>>> "Murray" == Murray Jorgensen <maj at stats.waikato.ac.nz>
>>>>>     on Thu, 22 Dec 2005 22:13:45 +1300 writes:

    Murray> Prof Brian Ripley wrote:
    >> On Thu, 22 Dec 2005, Murray Jorgensen wrote:
    >> 
    >>> We have a choice when calculating the Huber location estimate:
    >>> > set.seed(221205)
    >>> > y <- 7 + 3*rt(30,1)
    >> 
    >> 
    >> That's Cauchy, BTW, a very extreme case.

    Murray> Sure, the sort of situation where one might want a robust estimator.

    Murray> [...]

    >> Note that the huber() scale estimate is the initial MAD, whereas rlm 
    >> iterates.  Because during iteration the 'center' for MAD is known to be 
    >> zero, the results differ.  For symmetric distributions there is little 
    >> difference, but your sample is not close to symmetric.

    Murray> [Blush] yes I knew that and somehow I forgot. But leave rlm() alone for 
    Murray> a while and do IRLS with fixed scale:

    Murray> th <- median(y)
    Murray> s <- mad(y)
    Murray> # paste this in a few times:
    Murray> w <- ifelse((y-th< 1.345*s & y-th>-1.345*s), 1, 1.345*s/abs(y-th))
    Murray> th <- weighted.mean(y,w)
    Murray> th

    Murray> We converge to
    >> th
    Murray> [1] 5.9203
    Murray> close to the answer given by rlm() different from
    >> huber(y)$mu
    Murray> [1] 5.9117

    Murray> So the variable scale does not account for the difference.

No, the main difference is the different default:  
    huber() has  k=1.5
and rlm()   has  k=1.345 :

Try this

set.seed(221205)
y <- 7 + 3*rt(30,1)

str(huber(y, k = 1.345), digits = 5)
## List of 2
##  $ mu: num 5.9203
##  $ s : num 4.0967

str(rlm(y ~ 1)[c("coefficients", "s")], digits = 5) #
## (edited to)
##  $ coefficients: num 5.9204
##  $ s           : num 3.7463

which gives 'mu' very close, even for the iterated
vs. non-iterated scales.

Martin Maechler, ETH Zurich