[R] how to get "lsmeans"?

Wed Mar 21 17:02:41 CET 2007

On Wed, 21 Mar 2007, Chuck Cleland wrote:

> Liaw, Andy wrote:
>> I verified the result from the following with output from JMP 6 on the
>> same data (don't have SAS: don't need it):
>>
>> set.seed(631)
>> n <- 100
>> dat <- data.frame(y=rnorm(n), A=factor(sample(1:2, n, replace=TRUE)),
>>                   B=factor(sample(1:2, n, replace=TRUE)),
>>                   C=factor(sample(1:2, n, replace=TRUE)),
>>                   d=rnorm(n))
>> fm <- lm(y ~ A + B + C + d, dat)
>> ## Form a data frame of points to predict: all combinations of the
>> ## three factors and the mean of the covariate.
>> p <- data.frame(expand.grid(A=1:2, B=1:2, C=1:2))
>> p[] <- lapply(p, factor)
>> p <- cbind(p, d=mean(dat$d))
>> p <- cbind(yhat=predict(fm, p), p)
>> ## lsmeans for the three factors:
>> with(p, tapply(yhat, A, mean))
>> with(p, tapply(yhat, B, mean))
>> with(p, tapply(yhat, C, mean))
>
>  Using Andy's example data, these are the LSMEANS and intervals I get
> from SAS:
>
> A        y LSMEAN      95% Confidence Limits
> 1       -0.071847       -0.387507     0.243813
> 2       -0.029621       -0.342358     0.283117
>
> B        y LSMEAN      95% Confidence Limits
> 1       -0.104859       -0.397935     0.188216
> 2        0.003391       -0.333476     0.340258
>
> C        y LSMEAN      95% Confidence Limits
> 1       -0.084679       -0.392343     0.222986
> 2       -0.016789       -0.336374     0.302795
>
>  One way of reproducing the LSMEANS and intervals from SAS using
> predict() seems to be the following:
>
>> dat.lm <- lm(y ~ A + as.numeric(B) + as.numeric(C) + d, data = dat)
>> newdat <- expand.grid(A=factor(c(1,2)),B=1.5,C=1.5,d=mean(dat$d))
>> cbind(newdat, predict(dat.lm, newdat, interval="confidence"))
>  A   B   C          d         fit        lwr       upr
> 1 1 1.5 1.5 0.09838595 -0.07184709 -0.3875070 0.2438128
> 2 2 1.5 1.5 0.09838595 -0.02962086 -0.3423582 0.2831165
>
>  However, another possibility seems to be:
>
>> dat.lm <- lm(y ~ A + as.numeric(B) + as.numeric(C) + d, data = dat)
>> newdat <-
> expand.grid(A=factor(c(1,2)),B=mean(as.numeric(dat$B)),C=mean(as.numeric(dat$C)),d=mean(dat$d))
>> cbind(newdat, predict(dat.lm, newdat, interval="confidence"))
>  A    B    C          d         fit        lwr       upr
> 1 1 1.43 1.48 0.09838595 -0.08078243 -0.3964661 0.2349012
> 2 2 1.43 1.48 0.09838595 -0.03855619 -0.3479589 0.2708465
>
>  The predictions directly above match what effect() in the effects
> package by John Fox returns:
>
> library(effects)
>
>> effect("A", fm, xlevels=list(d = mean(dat$D)))
>
> A effect
> A
>          1           2
> -0.08078243 -0.03855619
>
>  But for some reason the predict() and effect() intervals are a little
> different:
>
>> effect("A", fm, xlevels=list(d = mean(dat$D)))$lower
>          [,1]
> 101 -0.3924451
> 102 -0.3440179
>
>> effect("A", fm, xlevels=list(d = mean(dat$D)))$upper
>         [,1]
> 101 0.2308802
> 102 0.2669055
>
>  I would be interested in any comments on these different approaches
> and on the difference in intervals returned by predict() and effect().

AFAIR, the effects packages uses normal-based confidence intervals 
and predict.lm uses t-based ones, and I have suggested to John Fox that 
t-based intervals would be preferable, at least as an option.

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595